mreineck / pocketfft

Fork of https://gitlab.mpcdf.mpg.de/mtr/pocketfft to simplify external contributions
BSD 3-Clause "New" or "Revised" License

good_size_cmplx() and the like must handle overflow #15

Closed: crisluengo closed this issue 1 month ago

crisluengo commented 2 months ago

This is unlikely to be a problem on 64-bit architectures, because nobody is going to compute an FFT anywhere near that length limit. But I've run into this on a 32-bit machine.

If the input n is larger than SIZE_MAX / 2, then bestfac=2*n will overflow and the output will be a very small number.

If the input n is smaller than SIZE_MAX / 2 but sufficiently large that f11*=11 can overflow, then the loop for (size_t f11=1; f11<bestfac; f11*=11) will never end. Likewise for the other four loops in that function. The other functions, including the new prev_good_size_cmplx() and prev_good_size_real(), have a similar issue.
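One way to make such a loop terminate is to guard the multiplication with a division instead of comparing the (possibly wrapped) product. A minimal sketch of that idiom, with an illustrative helper name that is not part of pocketfft:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch (not the pocketfft source): enumerate 11^a below a
// bound without overflowing.  The original pattern
//   for (size_t f11 = 1; f11 < bestfac; f11 *= 11)
// misbehaves once f11*11 wraps around; checking f11 against bestfac/11
// before multiplying keeps every intermediate value in range.
std::vector<uint32_t> powers_of_11_below(uint32_t bestfac)
  {
  std::vector<uint32_t> res;
  for (uint32_t f11 = 1; f11 < bestfac; )
    {
    res.push_back(f11);
    if (f11 > (bestfac - 1) / 11)   // next multiply would wrap or reach bestfac
      break;
    f11 *= 11;
    }
  return res;
  }
```

The same guard applies to the loops over 2, 3, 5, and 7, each with its own divisor.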

I've been struggling to write a correct implementation. I still see a different output on 32-bit systems and 64-bit systems for an input such as n = 2109375001, which indicates there's still an overflow issue in the 32-bit case. You can watch my progress here: https://github.com/DIPlib/diplib/blob/master/src/transform/dft_optimal_size.cpp Once I figure out how to do this correctly, I will submit a PR to fix the code here too.

mreineck commented 2 months ago

I totally agree that we should guard against overflow here, but I wonder whether we can't do this by simply rejecting FFT lengths larger than, say, 2**19. I'd argue that for carrying out an FFT of length N, at least 8N bytes of memory are required: 4N for the input array (assuming 32bit floats), 4N for the internal copy, and then we also have twiddle factors etc. On a 32bit system a single process can see 4GB of memory at most (or is this a wrong assumption these days?), so any FFT longer than 2**19 will definitely crash by running out of memory. Am I missing something?
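The memory argument can be spelled out as a compile-time computation (assuming 32-bit floats and the 8N-byte estimate from the comment):

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit process can address at most 2**32 bytes, so with roughly
// 8 bytes needed per FFT point (4 for the input array, 4 for the
// internal copy, ignoring twiddle factors), the longest FFT that
// could even fit in memory is 2**32 / 8 points.
constexpr uint64_t address_space   = uint64_t(1) << 32;  // 4 GB
constexpr uint64_t bytes_per_point = 8;                  // input + internal copy
constexpr uint64_t max_fft_len     = address_space / bytes_per_point;
static_assert(max_fft_len == (uint64_t(1) << 29), "limit works out to 2**29");
```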

crisluengo commented 2 months ago

That is a really good point.

I guess you meant 2**29, which is 2**32 / 8.

The algorithm should never overflow, if I’m not mistaken, for numbers up to 2**32 / 11 / 2 - 1 (but I need to examine the code again to be sure; I’m doing this from memory right now). The real version doesn’t multiply by 11, so it should have a larger limit.
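That estimate can be sanity-checked with a small predicate (an illustration, not pocketfft code; the exact boundary depends on rounding, but it is roughly consistent with the 2**32 / 11 / 2 - 1 figure above):

```cpp
#include <cassert>
#include <cstdint>

// With bestfac = 2*n, the factor loops can form intermediate values just
// below 11*bestfac = 22*n, so 32-bit arithmetic stays safe as long as
// 22*n still fits in a uint32_t.  Computed in 64 bits to avoid the very
// overflow being tested for.
constexpr bool safe_for_32bit(uint64_t n)
  { return 22 * n <= UINT32_MAX; }
```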

mreineck commented 2 months ago

Sorry, must have slipped on the keyboard when typing the 2**19 :-)

How about a pragmatic solution: we turn the good_size functions into templates that work for uint32_t and uint64_t (should cover size_t for 32bit and 64bit systems). Whenever the 32bit version notices that it can't cope with the input, it will let the 64bit version do the job, and when the returned result is too large for a 32bit machine, an exception is thrown.
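The dispatch idea might look something like the following sketch. All names are illustrative rather than the pocketfft API, and the trivial round-up-to-power-of-two search is only a stand-in for the real search over 2/3/5/7/11-smooth sizes:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in for the real good_size search, templated on the integer width.
// (The actual algorithm enumerates 2/3/5/7/11-smooth numbers; rounding up
// to a power of two keeps this sketch short.)
template <typename T> T good_size_impl(T n)
  {
  T res = 1;
  while (res < n) res *= 2;
  return res;
  }

// 32-bit entry point: use 32-bit math when it is provably safe, fall back
// to 64-bit math otherwise, and throw if the result cannot be represented.
inline uint32_t good_size_checked(uint32_t n)
  {
  // conservative bound below which the 32-bit search cannot overflow
  const uint32_t safe_limit = std::numeric_limits<uint32_t>::max() / 22;
  if (n <= safe_limit)
    return good_size_impl<uint32_t>(n);
  uint64_t res = good_size_impl<uint64_t>(n);
  if (res > std::numeric_limits<uint32_t>::max())
    throw std::invalid_argument("FFT length too large for this platform");
  return static_cast<uint32_t>(res);
  }
```

On a 64-bit system, size_t simply maps onto the uint64_t instantiation and the checked wrapper is unnecessary.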

crisluengo commented 2 months ago

I think that’s the ticket.

I assume 64-bit math is more expensive than 32-bit math on a 32-bit architecture, so it’s worthwhile using the 32-bit version of the code when possible. This is a really good solution!

mreineck commented 1 month ago

fixed via #17