xtensor-stack / xtensor

C++ tensors with broadcasting and lazy computing
BSD 3-Clause "New" or "Revised" License

Performance issue possibly related to returning C-style adaptor from function #2780

Open foolnotion opened 7 months ago

foolnotion commented 7 months ago

Hi,

I believe my issue is similar to https://github.com/xtensor-stack/xtensor/issues/600, but that issue is 7 years old and the solution no longer applies.

I have a thin abstraction layer in my library which allows me to use different math backends (Xtensor, Eigen, Armadillo, etc.). This depends on being able to return views/maps from raw pointers. As per the documentation, the following function maps a C-style 1D array to a tensor:

template<typename T, std::size_t S>
inline auto Map(T* res) {
    auto a = xt::adapt(res, S, xt::no_ownership(), std::array{S});
    return a;
}

This is later used like this:

template<typename T, std::size_t S>
auto Add(T* res, auto const*... args) {
    Map<T, S>(res) = (Map<T const, S>(args) + ...);
}

However, this code is ~2.5 times slower than the Eigen equivalent, and I suspect that the data is actually being copied.
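One way to narrow this down is to time Add against a hand-written loop over the same raw pointers. The sketch below is only a possible baseline: the name AddBaseline is hypothetical and appears neither in the code above nor in xtensor, and it assumes at least one input array, just like the fold expression in Add.

```cpp
#include <cstddef>

// Hypothetical baseline (not part of the issue's code or of xtensor):
// element-wise sum of one or more input arrays of length S, written as a
// plain loop over raw pointers, with no adaptor objects involved.
template <typename T, std::size_t S, typename... Args>
void AddBaseline(T* res, Args const*... args) {
    for (std::size_t i = 0; i < S; ++i) {
        // The fold expression expands to args0[i] + args1[i] + ...
        res[i] = (args[i] + ...);
    }
}
```

If this loop runs at roughly Eigen's speed on the same buffers, the gap is more likely to come from how the adaptor expression is evaluated than from the compiler flags.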

I've also investigated other causes like a lack of optimizations but:

Any help fixing the performance issue would be greatly appreciated, thanks.

Arktische commented 3 months ago

You can try removing the unnecessary assignment in Map and returning xt::adapt(…) directly. That returns an xexpression as a temporary, which makes it easy for the compiler to apply RVO. NRVO sometimes can't be applied.
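The difference between the two return styles can be observed without xtensor. The following is a minimal sketch using a hypothetical CountingView type (a stand-in for an adaptor, not an xtensor class) that counts its own copies and moves. The functions map_named and map_direct are likewise made up for illustration, and the comments assume C++17 or later.

```cpp
#include <cstddef>

// Hypothetical stand-in for an adaptor: a non-owning view over a raw
// pointer that records how many times it is copied or moved.
struct CountingView {
    static int copies;
    static int moves;
    double* data;
    std::size_t size;

    CountingView(double* d, std::size_t n) : data(d), size(n) {}
    CountingView(CountingView const& o) : data(o.data), size(o.size) { ++copies; }
    CountingView(CountingView&& o) noexcept : data(o.data), size(o.size) { ++moves; }
};
int CountingView::copies = 0;
int CountingView::moves = 0;

// Named local, as in the original Map: the compiler may apply NRVO; if it
// does not, the local is moved into the return value once.
CountingView map_named(double* p, std::size_t n) {
    CountingView v(p, n);
    return v;
}

// Direct return of a temporary: since C++17 the object is constructed in
// the caller's storage, so no copy or move constructor runs.
CountingView map_direct(double* p, std::size_t n) {
    return CountingView(p, n);
}
```

In this sketch, even the worst case for map_named is a single move that transfers a pointer and a size, never the underlying buffer. Whether the type returned by xt::adapt behaves the same way would need to be checked separately.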