Performance issue possibly related to returning C-style adaptor from function

Hi,

I believe my issue is similar to https://github.com/xtensor-stack/xtensor/issues/600, but that issue is 7 years old and the solution no longer applies.

I have a thin abstraction layer in my library that lets me switch between math backends (Xtensor, Eigen, Armadillo, etc.). It relies on being able to return views/maps over raw pointers. Following the documentation, the function below maps a C-style 1D array of `S` elements to a tensor without taking ownership:

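```cpp
#include <array>
#include <cstddef>
#include <xtensor/xadapt.hpp>  // header path may differ between xtensor versions

// Wraps the S elements starting at res in a 1D adaptor, without taking ownership.
template<typename T, std::size_t S>
inline auto Map(T* res) {
    auto a = xt::adapt(res, S, xt::no_ownership(), std::array{S});
    return a;
}
```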

This is later used like this:

```cpp
// Element-wise res = args[0] + args[1] + ..., evaluated through the adaptors.
template<typename T, std::size_t S>
void Add(T* res, auto const*... args) {
    Map<T, S>(res) = (Map<T const, S>(args) + ...);
}
```

The problem is that this code is about 2.5 times slower than the Eigen equivalent, and I suspect the data is actually being copied.

I've also checked other possible causes, such as missing optimizations:

- the code is compiled with `-O3 -march=x86-64-v3` (which includes AVX2);
- xsimd is installed and `XTENSOR_USE_XSIMD` is defined.
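One way to confirm that both settings actually reach the translation unit that includes xtensor is a small compile-time report like the sketch below. `BuildFlags` and `CurrentBuildFlags` are hypothetical helpers, not part of xtensor or xsimd; the sketch relies only on GCC and Clang defining `__AVX2__` when AVX2 code generation is enabled, which `-march=x86-64-v3` does. It should be compiled with the same flags and include order as the code being benchmarked.

```cpp
// Hypothetical helper (not part of xtensor or xsimd): records whether the
// settings below are visible in the translation unit that includes it.
struct BuildFlags {
    bool avx2;   // GCC/Clang define __AVX2__ when AVX2 code generation is on
    bool xsimd;  // true if XTENSOR_USE_XSIMD is defined at this point in the TU
};

constexpr BuildFlags CurrentBuildFlags() {
    return BuildFlags{
#if defined(__AVX2__)
        true,
#else
        false,
#endif
#if defined(XTENSOR_USE_XSIMD)
        true,
#else
        false,
#endif
    };
}
```

Placing `static_assert(CurrentBuildFlags().avx2 && CurrentBuildFlags().xsimd, "missing build flag");` in the benchmark file turns a missing setting into a build error. `XTENSOR_USE_XSIMD` only affects the xtensor headers if it is defined before the first of them is included, for example via `-DXTENSOR_USE_XSIMD` on the command line.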

Any help fixing the performance issue would be greatly appreciated, thanks.

xtensor-stack/xtensor, issue #2780