martinRenou opened 6 years ago
I think I can implement that. But xtensor-fftw calls fftw to perform the FFT functionality, which means the inputs have to be evaluated before calling fftw, and the result is/has to be an xarray.
@martinRenou Is there any reason to accept an xexpression as input?
Ok, I see your point. There would be some benefit to using an xexpression as input if there are other operations on the input before calling fftw. We just need to be sure that it's evaluated only once.
It should be possible to postpone evaluation by using something derived from xfunction, if I understand correctly. I started reading up on this in the xtensor docs dev section, but got distracted by some other urgent projects. If you would like to take this up @marty1885, that would be great!
@egpbos I'll take this on.
Could you shed some light on why accepting xexpressions is beneficial? It would help me a lot in getting it working. Evaluating an xfunction or xexpression multiple times leads to redundant evaluation, and xtensor-fftw needs to evaluate any input into an xarray before calling fftw. So the extra evaluation won't disappear by accepting an xexpression as input. How does postponing the evaluation help?
@egpbos an xfunction is an xexpression, so taking an xexpression as input will allow you to use xfunctions as well.
> And xtensor-fftw needs to evaluate any input into an xarray before calling fftw
I don't know the code base of xtensor-fftw; I just expected it to be faster taking an xexpression as input, but it depends on what you do with it. Basically, if all of the functions in xtensor-fftw look like this:
```cpp
template <class T>
xarray<T> xtensor_fftw_function(const xarray<T>& arr)
{
    return fftw_function(arr.cbegin(), arr.cend());
}
```
Then you won't really benefit from taking an xexpression as input, because it will be evaluated only once anyway. But in this case you might benefit from taking an xtensor_fixed as input, if you know exactly what shape is expected by fftw_function.
In the case of functions like this one (which perform some operations using xtensor before calling fftw_function):
```cpp
template <class T>
xarray<T> xtensor_fftw_function(const xarray<T>& arr1, const xarray<T>& arr2)
{
    xarray<T> sum = arr1 + arr2;
    return fftw_function(sum.cbegin(), sum.cend());
}
```
Then you would benefit from taking xexpressions as input:
```cpp
template <class T1, class T2>
auto xtensor_fftw_function(const xexpression<T1>& e1, const xexpression<T2>& e2) // inputs won't be evaluated
{
    auto sum = e1.derived_cast() + e2.derived_cast(); // not evaluated yet (this is an xfunction)
    return fftw_function(sum.cbegin(), sum.cend());   // evaluated just once, here
}
```
Maybe it is not relevant in the case of xtensor-fftw; I'll take a deeper look at the code base.
Thank you @marty1885 for taking a look at it! I can help if need be. Also @wolfv please correct me if I'm wrong here.
@martinRenou I have gone through most of xtensor-fftw's code base. xtensor-fftw only uses xarray as a container and does not operate on it (besides a few places scaling the IFFT result coming out of FFTW; there is no way around that). Replacing xarrays with xexpressions won't be hard, but at the same time there are no runtime benefits to doing so, and it introduces extra template instantiation.
Could replacing xarray be an anti-optimization? I haven't measured the compile time and it's purely my guess. What do you think?
Would be good to figure out... But there's also the third time dimension of programmer/maintenance time :) I had it in my mind that switching to xexpressions would make things easier to maintain, since we won't have to duplicate code to also support xtensors (instead of just xarrays). Maybe I'm wrong though.
> I think replacing xarray would be an anti-optimization, increasing compile time without runtime benefits. Just my guess though, haven't measured it. What do you think?
I think you are right.
> since we won't have to duplicate code to also support xtensors
You can assign an xtensor to an xarray (if the dimensions match).
So, I guess our only way to optimize the code would be to use xtensor_fixed? Or at least xtensor? But the question is: can we know the dimension or the shape at compile time? Do we know what shape is expected as input for fftw?
https://xtensor.readthedocs.io/en/latest/container.html#runtime-vs-compile-time-dimensionality
I don't see problems with that. They all have a shape() method, so I think it should work almost out of the box, though you could maybe do optimizations with specialized templates that make use of the compile-time availability.
Well, having a shape() method doesn't give you knowledge of the shape at compile time. An xarray has a shape() method, but you don't know its shape at compile time.
It's like the difference between std::vector and std::array: std::array is faster because it has a fixed size, known at compile time (it is part of the type, e.g. std::array<int, 2>), whereas std::vector has a dynamic size and is slower because it allocates memory dynamically.
So, the thing is that:

- if we know the shape at compile time (e.g. (3, 3, 3)), we can use an xt::xtensor_fixed<double, xt::xshape<3, 3, 3>> as input
- if we only know the number of dimensions at compile time (e.g. 3), we can use an xt::xtensor<double, 3> as input

And it will be faster than xarray, which has a dynamic shape.

Right. What I was referring to is that the current implementation uses shape() to determine the output xarray shapes. So then should this be changed somehow, perhaps by putting the shape information in the template parameters?
Oh ok I see. Yeah, that would work I guess!
My bad, after discussing with Wolf, this would not work. We can't have a function like:
```cpp
template <class S>
void foo(const xtensor_fixed<double, S>& a)
{...}
```
It would only work with xtensor_fixed as input. If we pass, say, an xfunction as input, it would not compile.
Wolf's idea is that we should take an xexpression as input and call xt::eval on it just before calling the fftw function. And the output value can be the output of xt::empty_like. That way the user can pass an xtensor_fixed if they care about performance and know the shape of their tensors.
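A rough sketch of that idea might look like this. This is hypothetical code, not the actual xtensor-fftw implementation: `fftw_run` is a made-up placeholder for the real FFTW call, and it assumes xtensor's xt::eval and xt::empty_like:

```cpp
// Hypothetical sketch only; fftw_run is a made-up placeholder.
template <class E>
auto fft(const xt::xexpression<E>& e)
{
    auto&& input = xt::eval(e.derived_cast()); // evaluate once, just before FFTW
    auto output = xt::empty_like(input);       // same shape and value type as input
    fftw_run(input.data(), output.data());     // placeholder for the actual FFTW call
    return output;
}
```

If the caller passes an already-evaluated container, xt::eval should be close to a no-op; if they pass an unevaluated expression, it is materialized exactly once here.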
Well, xt::empty_like could work for the generic fft case, since then the output and input have exactly the same shape, but in the case of rfft and hfft, both the type (complex vs double) and the shape ((a, b, c) vs (a, b, c/2+1)) will differ. We can maybe make a constexpr shape calculation? It will probably be a horrible beast... especially because we need branching for odd-sized shapes, and constexpr-if is C++17. But sounds like a fun exercise! :D
By the way, how effective is xtensor_fixed? Out of curiosity, I compiled a simple program on Compiler Explorer using clang trunk with -O3 -DNDEBUG -std=c++14 -fno-exceptions, and it gave me an absolutely huge amount of assembly instead of mov eax, 25; ret.
```cpp
#include <xtensor/xfixed.hpp>

int main() {
    xt::xtensor_fixed<float, xt::xshape<5, 5>> v = xt::ones<int>({5, 5});
    return xt::sum(v)[0];
}
```
Edit: Oh, never mind. It's the reducer's fault. Why is summing a 2D matrix so expensive? I'll have to ask Wolf that question :P
xtensor_fixed should be faster; we didn't make benchmarks to compare it to xtensor and xarray, but it should be simple to do.
Concerning your code, maybe try not to mix float and int types. Wolf came up with this code, which should be equivalent to yours but generates a lot less assembly.
```cpp
#include <xtensor/xfixed.hpp>
#include <xtensor/xtensor.hpp>

int main() {
    xt::xtensor_fixed<int, xt::xshape<5, 5>> v(xt::xshape<5, 5>{}, 1);
    return xt::sum<int>(v, xt::xshape<0, 1>{}, xt::evaluation_strategy::immediate{})(0);
}
```
I don't know if it makes sense or how easy it would be to achieve, but it would be nice to not have to evaluate the xarrays for a call to an fft function, meaning that the implementation of the functions would be:

instead of

And maybe the return type could be an xexpression too.