Fastor outperforms the other inner_product implementations except at small kernel sizes, and I'm sure I'm not using Fastor correctly. In the main processing loop (over the sample buffer), I can't call Fastor::inner directly on a subview like this:
where N is the templated FIR order, h is the tensor of FIR coefficients (the impulse response), z is a double-buffer state tensor reflecting the z^-N delay nature of the FIR equation, and buffer is the sample buffer that receives the discrete convolution result.
I need to cast the subview like this for the code to compile:
Even though this method outperforms the others for kernel sizes > 32 (benchmarking powers of two), I'm fairly sure the assignment operator in the main loop is a bottleneck for smaller kernel sizes.
Why can't I call Fastor::inner directly with the subview? What is wrong with my code?
For context, Roman (sorry if the question is naïve): I added a small class to the FIR benchmark from jatinchowdhury18's repo; you can find the issue here: New promising benchmark using Fastor C++

Thank you very much for your answer and your time!