row major xtensor assignment is so slow

xtensor-stack / xtensor

C++ tensors with broadcasting and lazy computing

BSD 3-Clause "New" or "Revised" License


row major xtensor assignment is so slow #2749

Open argman opened 1 year ago

argman commented 1 year ago

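```cpp
#include <xtensor/xtensor.hpp>
#include <xtensor/xview.hpp>
using tensor_type_row = xt::xtensor<float, 4, xt::layout_type::row_major>;
using tensor_type_col = xt::xtensor<float, 4, xt::layout_type::column_major>;
int main() {
    // Swap tensor_type_col for tensor_type_row to reproduce the slow (row-major) case.
    auto tmp_a = tensor_type_col({200, 512, 512, 5});
    tensor_type_col tmp_b;
    tmp_b = xt::view(tmp_a, xt::all(), xt::all(), xt::all(), xt::range(1, 2));
}
```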

This demo code takes about 400 ms when I use row_major and about 100 ms with column_major.

How can I speed this up?

It's even slower than a plain for loop with OpenMP enabled.
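For reference, the kind of hand-written loop being compared against might look like the following. This is only a sketch: it assumes the data sits in a plain contiguous row-major `(N, H, W, C)` float buffer, and `copy_channel_row_major` is a hypothetical helper, not part of xtensor. The OpenMP pragma is ignored unless the code is compiled with OpenMP enabled (e.g. `-fopenmp`), in which case the loop still produces the same result serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy channel `c` of a contiguous row-major
// (n, h, w, c_total) float buffer into a contiguous (n, h, w, 1) buffer.
// In row-major order the channel axis is innermost, so successive reads
// are c_total floats apart while successive writes are contiguous.
std::vector<float> copy_channel_row_major(const std::vector<float>& src,
                                          std::size_t n, std::size_t h,
                                          std::size_t w, std::size_t c_total,
                                          std::size_t c) {
    assert(c < c_total && src.size() == n * h * w * c_total);
    const std::size_t pixels = n * h * w;  // number of output elements
    std::vector<float> dst(pixels);
    // Iterations are independent, so the loop parallelises trivially.
#pragma omp parallel for
    for (long long i = 0; i < static_cast<long long>(pixels); ++i) {
        const std::size_t p = static_cast<std::size_t>(i);
        dst[p] = src[p * c_total + c];
    }
    return dst;
}
```

With an xtensor container, the raw buffer is available through `data()`, so the same loop can be run directly on the tensor's memory for a like-for-like timing.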

spectre-ns commented 12 months ago

I'm pretty sure this is slow because, in the row-major tensor, the selected elements are strided in memory: along the last axis the view picks one float out of every five. A strided copy like this will always be slower than a contiguous one. I haven't stepped through the assignment loop for this case, but it could very well be using the stepper-based assignment, which won't be sped up by OpenMP.
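To make the layout argument concrete, the element strides of both layouts can be worked out for the shape in the issue. This is a standalone sketch in plain C++; `row_major_strides` and `col_major_strides` are illustrative helpers, not xtensor functions (xtensor containers report their own strides through `strides()`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Shape4 = std::array<std::size_t, 4>;

// Element strides of a contiguous row-major (C-order) array:
// the last axis varies fastest.
Shape4 row_major_strides(const Shape4& shape) {
    Shape4 s{};
    std::size_t acc = 1;
    for (std::size_t k = 4; k-- > 0;) {
        s[k] = acc;
        acc *= shape[k];
    }
    return s;
}

// Element strides of a contiguous column-major (Fortran-order) array:
// the first axis varies fastest.
Shape4 col_major_strides(const Shape4& shape) {
    Shape4 s{};
    std::size_t acc = 1;
    for (std::size_t k = 0; k < 4; ++k) {
        s[k] = acc;
        acc *= shape[k];
    }
    return s;
}
```

For `{200, 512, 512, 5}` this gives row-major strides `{1310720, 2560, 5, 1}` and column-major strides `{1, 200, 102400, 52428800}`. Fixing index 1 on the last axis leaves the row-major view with an innermost stride of 5, i.e. a strided gather. In the column-major case the remaining axes keep strides `{1, 200, 102400}`, which describe a dense `(200, 512, 512)` block starting at element offset 52428800, so the slice is a single contiguous run that can be copied linearly.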

tdegeus commented 12 months ago

Could we quantify the performance by writing an explicit assignment in a loop (making use of the strides)?
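One possible shape for such a baseline is sketched here in plain C++, with no xtensor dependency. `strided_copy_4d` and `time_ms` are hypothetical helpers; the sketch assumes the source is reachable through a raw pointer (for an xtensor container, `data()` provides one) and that strides and offsets are given in elements. The destination is filled densely in the order of the loop nest.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Index4 = std::array<std::size_t, 4>;
using Stride4 = std::array<std::ptrdiff_t, 4>;

// Copy a 4-D strided view of `src` into the dense buffer `dst`.
// `offset` is the element offset of the view's first element and
// `strides` are the source strides in elements. The last axis is the
// innermost loop, so it should be the one with the smallest stride.
void strided_copy_4d(const float* src, std::ptrdiff_t offset,
                     const Stride4& strides, const Index4& shape, float* dst) {
    assert(src != nullptr && dst != nullptr);
    std::size_t out = 0;
    for (std::size_t i = 0; i < shape[0]; ++i)
        for (std::size_t j = 0; j < shape[1]; ++j)
            for (std::size_t k = 0; k < shape[2]; ++k) {
                const float* p = src + offset
                               + static_cast<std::ptrdiff_t>(i) * strides[0]
                               + static_cast<std::ptrdiff_t>(j) * strides[1]
                               + static_cast<std::ptrdiff_t>(k) * strides[2];
                for (std::size_t l = 0; l < shape[3]; ++l)
                    dst[out++] = p[static_cast<std::ptrdiff_t>(l) * strides[3]];
            }
}

// Wall-clock time of a single call to f, in milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For the view in the issue, the row-major source would be copied with offset `1`, strides `{1310720, 2560, 5, 1}` and shape `{200, 512, 512, 1}`. For the column-major source (and a column-major destination), passing the axes reversed keeps the unit-stride axis innermost: offset `52428800`, strides `{52428800, 102400, 200, 1}` and shape `{1, 512, 512, 200}`. Wrapping each call in `time_ms` then gives numbers that can be set against the 400 ms / 100 ms reported for the xtensor assignment.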