Open argman opened 1 year ago
I'm pretty sure this is slow because the elements of the column-major xtensor are strided in memory, so consecutive accesses are not contiguous. That will always be slow. I haven't traced the assignment loop for this case, but it may well be using the stepper assignment method, which OpenMP won't speed up.
Could we quantify the performance by writing an explicit assignment in a loop (making use of the strides)?
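To make that suggestion concrete, here is a minimal sketch of an explicit strided assignment with a timer around it. It deliberately uses plain `std::vector` buffers and hand-computed strides rather than any xtensor API, so it only approximates what the library does internally; `strided_copy` and `time_strided_copy_ms` are hypothetical helper names, not part of xtensor.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an xtensor function): copy a rows x cols matrix
// from `src`, addressed through explicit element strides, into a contiguous
// row-major `dst`.
// For a column-major source: src_row_stride = 1,    src_col_stride = rows.
// For a row-major source:    src_row_stride = cols, src_col_stride = 1.
void strided_copy(const std::vector<double>& src, std::vector<double>& dst,
                  std::size_t rows, std::size_t cols,
                  std::size_t src_row_stride, std::size_t src_col_stride)
{
    assert(src.size() >= rows * cols);
    assert(dst.size() >= rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
    {
        for (std::size_t j = 0; j < cols; ++j)
        {
            // The destination is written contiguously; the source is read
            // with a stride of src_col_stride between consecutive j.
            dst[i * cols + j] = src[i * src_row_stride + j * src_col_stride];
        }
    }
}

// Hypothetical helper: run one strided copy and return the elapsed wall-clock
// time in milliseconds.
double time_strided_copy_ms(const std::vector<double>& src, std::vector<double>& dst,
                            std::size_t rows, std::size_t cols,
                            std::size_t src_row_stride, std::size_t src_col_stride)
{
    const auto start = std::chrono::steady_clock::now();
    strided_copy(src, dst, rows, cols, src_row_stride, src_col_stride);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_strided_copy_ms` once with column-major strides (`1, rows`) and once with row-major strides (`cols, 1`) on the same large buffer should give a baseline to compare against the xtensor assignment. The numbers will depend on matrix size, compiler flags, and cache behaviour, so a single run is only a rough indication.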
This demo code takes about 400 ms if I use row_major, and about 100 ms with column major.
How can I speed this up?
It's even slower than using a for loop with OpenMP enabled.