Use xexpression as input for fft functions

martinRenou commented 6 years ago

I don't know if it makes sense and if it's easy to achieve it, but it would be nice to not have to evaluate the xarrays for a call of fft function, meaning that the implementation of the functions would be:

template <class T>
inline xt::xarray<std::complex<float> > fft3 (const xt::xexpression<T> &input) {
      return ...;
    }

instead of

inline xt::xarray<std::complex<float> > fft3 (const xt::xarray<std::complex<float> > &input) {
      return ...;
    }

And maybe the return type could be an xexpression too..

marty1885 commented 6 years ago

I think I can implement that. But xtensor-fftw calls fftw to perform the FFT functionality. Which means the inputs has to be evaluated before calling fftw and the result is/has to be an xarray.

@martinRenou Are there any reason to accept an xexpression as input?

martinRenou commented 6 years ago

Ok I see your point. There would be some benefits of using an xexpression as input if there are other operations on the input before calling fftw. We just need to be sure that it's evaluated only once.

egpbos commented 6 years ago

It should be possible to postpone evaluation by using something derived from xfunction, if I understand correctly. I started reading up on this in the xtensor docs dev section, but got distracted by some other urgent projects. If you would like to take this up @marty1885, that would be great!

marty1885 commented 6 years ago

@egpbos I'll take on this.

May you shed some light on why accepting xexpressions are benificial? It will help me on getting it workign a lot. Evaluating xfunction and xexpressions multiple times leads to redundant evaluation. And xtensor-fftw needs to evaluate any input into an xarray before calling fftw. Thus the extra evaluation by accepting xexpression as input won't disappear. How is postponing the evaluation helping?

martinRenou commented 6 years ago

@egpbos an xfunction is an xexpression so taking an xexpression as input will allow you to use xfunctions as well.

And xtensor-fftw needs to evaluate any input into an xarray before calling fftw

I don't know the code base of xtensor-fftw, I just expected it to be faster taking an xexpression as input, it depends what you do with it. Basically, if all of the functions in xtensor-fftw are like that:

template <class T>
xarray<T> xtensor_fftw_function(const xarray<T>& arr)
{
    return fftw_function(arr.cbegin(), arr.cend());
}

Then you won't really benefit from taking an xexpression as input, because it will be evaluated once anyway. But in this case you would maybe benefit from taking an xtensor_fixed as input if you know exactly what shape is expected for fftw_function.

In the case of having functions like this one: (perform some operations using xtensor before calling the fftw_function)

template <class T>
xarray<T> xtensor_fftw_function(const xarray<T>& arr1, const xarray<T>& arr2)
{
    xarray<T> sum = arr1 + arr2;
    return fftw_function(sum.cbegin(), sum.cend());
}

Then you would benefit from using xexpressions as input:

template <class T1, class T2>
auto xtensor_fftw_function(const xexpression<T1>& e1, const xexpression<T2>& e2) // Inputs won't be evaluated
{
    auto sum = arr1 + arr2; // This won't be evaluated (here it is an xfunction)
    return fftw_function(sum.cbegin(), sum.cend()); // It will be evaluated just once here
}

Maybe it is not relevant in the case of xtensor-fftw, I'll take a deeper look in the code base. Thank you @marty1885 for taking a look at it! I can help if need be. Also @wolfv please correct me if I'm wrong here.

marty1885 commented 6 years ago

@martinRenou I have gone trough most of xtensor-fftw's code base. xtensor-fftw is only using xarray as containers and does not operate on them (Besides a few places scaling the IFFT result coming out of FFTW. There is no way around that). Replacing xarrays with xexpressions won't be hard. But in the same time, there's no runtime benefits doing so and introduces extra template initialization.

Could replacing xarray can be an anti-optimization?. I haven't measure the compile time and it's purely my guess. What do you think?

egpbos commented 6 years ago

Would be good to figure out... But there's also the third time dimension of programmer/maintenance time :) I had in my mind that switching to xexpressions would make things easier to maintain, since we won't have to duplicate code to also support xtensors (instead of just xarrays). Maybe I'm wrong though.

martinRenou commented 6 years ago

I think replacing xarray be an anti-optimization by increasing compile time without runtime benefits. Just my guesses tho. Haven't measure it. What do you think?

I think you are right.

since we won't have to duplicate code to also support xtensors

You can assign an xtensor to an xarray (if the dimension matches).

So, I guess our only way to optimize the code would be to use xtensor_fixed? Or at least xtensor? But the question is: can we know the dimension or the shape at compile time? Do we know what shape is expected as input for fftw? https://xtensor.readthedocs.io/en/latest/container.html#runtime-vs-compile-time-dimensionality

egpbos commented 6 years ago

I don't see problems with that. They all have a shape() method, so I think it should work almost out of the box, though you could do optimizations with specialized templates that make use of the compile time availability maybe.

martinRenou commented 6 years ago

Well, having a shape() method doesn't give you the knowledge of the shape at compile time. An xarray has a shape() method but you don't know the shape at compile time. It's like the difference between std::vector and std::array, std::array is faster because it's a fixed size, known at compile time (it is part of the type, e.g. std::array<int, 2>), whereas std::vector has a dynamic size and it's slower because it allocates memory dynamically.

So, the thing is that:

if you know what shape is expected as input for the fftw function (say (3, 3, 3)), we can use an xt::xtensor_fixed<double, xt::xshape<3, 3, 3>> as input
if you know what dimension is expected as input for the fftw function (say 3), we can use an xt::xtensor<double, 3> as input And it will be faster than xarray which has a dynamic shape.

egpbos commented 6 years ago

Right. What I was referring to is that the current implementation uses shape() to determine the output xarray shapes. So then should this be changed somehow, perhaps by putting the shape information in the template parameters?

martinRenou commented 6 years ago

Oh ok I see. Yeah, that would work I guess!

martinRenou commented 6 years ago

My bad, after discussing with wolf, this would not work. We can't have a function like:

template <class S>
void foo(const xtensor_fixed<double, S>& a)
{...}

It would only work with xtensor_fixed as input. If we pass as input, say an xfunction, it would not compile.

Wolf's idea is that we should take an xexpression as input, call xt::eval on it just before calling the fftw function. And the output value can be the output of xt::empty_like. That way the user could pass an xtensor_fixed if he cares about performances and knows the shape of his tensors.

egpbos commented 6 years ago

Well, xt::empty_like could work for the generic fft case, since then the output and input are exactly the same shape, but in the case of rfft and hfft, both the type (complex vs double) and the shape ((a, b, c) vs (a, b, c/2+1)) will differ. We can maybe make a constexpr shape calculation? It will probably be a horrible beast... especially because we need branching for odd-sized shapes and constexpr-if is C++17. But sounds like a fun exercise! :D

marty1885 commented 6 years ago

By the way. How effective is xtensor_fixed? Out of curiosity. I compiled a simple program on Compiler Explorer using clang trunk with -O3 -DNDEBUG -std=c++14 -fno-exceptions. And it gave me absolutely huge about of assembly instead of mov eax, 25; ret.

#include <xtensor/xfixed.hpp>
int main() {
    xt::xtensor_fixed<float, xt::xshape<5, 5>> v = xt::ones<int>({5,5});
    return xt::sum(v)[0];
}

Edit: Oh, never mind. Its the reducer's fault. Why is summing a 2D matrix so expensive?

martinRenou commented 6 years ago

I had to ask Wolf for that question :P xtensor_fixed should be faster, we didn't make benchmarks to compare it to xtensor and xarray, but it should be simple to do.

Concerning your code, maybe try not to mix float and int types, Wolf came up with this code which should be equivalent to yours but it generates a lot less assembly.

#include <xtensor/xfixed.hpp>
#include <xtensor/xtensor.hpp>

int main() {
    xt::xtensor_fixed<int, xt::xshape<5, 5>> v(xt::xshape<5, 5>{}, 1);
    return xt::sum<int>(v, xt::xshape<0, 1>{}, xt::evaluation_strategy::immediate{})(0);
}

xtensor-stack / xtensor-fftw

Use xexpression as input for fft functions #37