mreineck / pocketfft

Fork of https://gitlab.mpcdf.mpg.de/mtr/pocketfft to simplify external contributions
BSD 3-Clause "New" or "Revised" License
71 stars 34 forks source link

Good aligned size #12

Closed dcmvdbekerom closed 3 months ago

dcmvdbekerom commented 3 months ago

Hi there,

Thanks for this very useful FFT library! For my application I need to find an FFT length that is both fast and SIMD aligned. I managed to do this by making the following modifications to the header file:

Line 405: static POCKETFFT_NOINLINE size_t good_size_cmplx(size_t n, int align=1)
Line 422:              if ((x<bestfac)&&(x%align==0)) bestfac=x;

If speed would be too much impacted by the modulo operation, one could also only check powers of 2, i.e.:

Line 405: static POCKETFFT_NOINLINE size_t good_size_cmplx(size_t n, int align=0)
Line 422:              if ((x<bestfac)&&(x&((1<<align)-1)) bestfac=x;

If you think this could be useful to others as well, feel free to add this to the header file.

mreineck commented 3 months ago

Thanks a lot, that is an interesting feature! I have come across the need for something like this in a slightly different context; I had the requirement "the result size must be even", not so much "the result size must not break an alignment of x", but de facto there isn't a real difference between these two.

I wonder if this can be implemented a little bit more efficiently, by doing

Line 405: static POCKETFFT_NOINLINE size_t good_size_cmplx(size_t n, size_t required_factor=1)
Line 410:   for (size_t f11=required_factor; f11<bestfac; f11*=11)

(plus some small adjustments in lines 407 and 409 to make sure we don't get into endless loops.)

Would this also work for you?

dcmvdbekerom commented 3 months ago

Thanks for the quick reply, yes this solution would work for me too!

mreineck commented 3 months ago

Great! Can you please give the enhanced_good_size branch a try?

mreineck commented 3 months ago

Ah, sorry! I think this implementation is still buggy. Hopefully I'll have something better soon!

mreineck commented 3 months ago

Now it should hopefully be correct.

dcmvdbekerom commented 3 months ago

It works! Thanks again for the very quick response 😄!

mreineck commented 3 months ago

Merged!