xtensor-stack / xtensor

C++ tensors with broadcasting and lazy computing
BSD 3-Clause "New" or "Revised" License
3.36k stars 398 forks source link

Adding the ability to enable memory overlap check in assignment to avoid unneeded temporary memory allocation #2768

Closed vakokako closed 8 months ago

vakokako commented 10 months ago

Currently the default behaviour of xsemantic_base<D>::operator=(const xexpression<E>& e) is to create a temporary copy of e to avoid any problems with overlapping memory. However, if the user doesn't use some obscure strides for the containers, a check if the pointers from begin and end elements don't overlap is sufficient. Based on our benchmarks, depending on the size of the containers brings huge speedups for some use cases (up to 90% in case of views). This is also achievable with noalias, but it's hard to enforce its usage and adding a compilation flag XTENSOR_FORCE_TEMPORARY_MEMORY_IN_ASSIGNMENTS that allows changing this behaviour is highly beneficial.

I'm also attaching the benchmark results we obtained: xtensor_assign_benchmark_results.ods

Explanation behind benchmark names. Benchmark was written as single templated function, so based on the template parameters in the benchmark name, the benchmarked operation looked as following, where both containers are 2d, with last number parameter being the size of single dimension: ```cpp DstOperation(DstContainer) = SrcOperation(xtensor<..., 2>) ```
vakokako commented 9 months ago

Thanks for your feedback, I made the changes.

Also, can you come up with use cases when this check would not work correctly? i.e. some elements in the container aren't located between the first and last element in memory?

vakokako commented 8 months ago

Hey, @JohanMabille, just reminding you that this exists))

JohanMabille commented 8 months ago

Thanks for doing it, I had totally forgotten it Oo

vakokako commented 8 months ago

Thanks!