Closed fredRos closed 7 years ago
Oh and the disagreement is stronger with increasing image size. Either because prob. of mismatch is constant and we just have more trials, or it is really due to summing more terms in a different way
I am looking into this right now. I did not realize that sample() and chi2() tests were not passing already at the commit of 19 days ago. I am puzzled from the fact that all the steps pass if tested individually, but sample() and chi2() - which do all the steps - fail.
I think the difference is really due to shift FFT shift operation for single precision. Using atol just masks this problem because it allows tests to pass when numbers happen to be close to zero and if subsequent operations move values away from zero, our stricter rtol causes the test to fail.
numpy and FFTW vary quite a bit in single precision and there is nothing we can do about it. We should recommend users to use double for scientific work. Single precision at your own risk!
def test_shift_fft_shift(size, complex_type, rtol, atol, acc_lib):
reference_image = create_reference_image(size=size, kernel='gaussian', dtype=complex_type)
cpu_shift_fft_shift = np.fft.fftshift(np.fft.fft2(np.fft.fftshift(reference_image)))
# create a copy of reference_image because galario makes in-place FFT
ref_complex = reference_image.copy()
acc_lib.fftshift_fft2d_fftshift(ref_complex)
print()
print('%.15e' % cpu_shift_fft_shift.real[0,0])
print('%.15e' % ref_complex.real[0,0])
print('---')
Let's test for different image sizes
# single precision
-1.599041643203236e-03
-1.560211181640625e-03
---
.
# double
-1.599940267100308e-03
-1.599940267100308e-03
---
So in double precision, all digits agree but in single precision, only 2 digits match. This relative mismatch of ~2.5 % gets propagated. Numpy's single fft is good to 3 digits vs double. I know that shift just copies values so the question is why FFTW (and cuFFT) are so far off in single precision.
1.779161626470635e-03
1.781748607754707e-03
---
.
1.780879036946570e-03
1.780879036946813e-03
Now numpy a bit more off in single. Double results not identical.
3.309356397994316e-04
3.327933081891388e-04
---
F
3.327185188833359e-04
3.327185189048832e-04
And now numpy is far off in single.
I decided to look into the loss of precision again. Remember I told you about the summation algorithms I looked at this week. I compared single and double precision on CPU and GPU to the python reference implementation. Quite possibly we had those numbers before but I just did it again. here I printed the first 5 element of the interpolated values output by
acc_lib.sample
CPU
GPU
It is clear that there is a big loss in precision for galario single, the other three implementations agree better. That loss is the same for GPU or CPU. So there must be a problem in galario with the algorithm itself.
To find out where the precision is lost, I just checked after every operation in
test_loss()
intest_galario.py.
The output of fft2 and cuFFT and FFTW were only within 1e-3 relative precision for single, and this is about the first operation on the image, so I think this is where the root of the problem is. I know that pyfftw does a lot of comparisons again numpy FFT, so they should agree. But they seem to differ for results close to zero. If there are terms of varying signs and magnitudes, this can happen with naive summation. I don't know what's used in np.fftw but I know that FFTW uses pairwise summation which should do OK except in pathological cases that I hope/guess are not present in our test case.
I updated many unit tests to use both rtol and atol. This showed that the relative precision is OK unless numbers are close to zero, where
atol
comes into play.What totally puzzles me is that if I do all steps individually, the CPU/GPU results agree with pyvfit up until after the interpolation but if I call the GPU version that does everything combined, the results are at much lower precision. I played around and found that mismatch ratio is 55 % for the phases in
par1
but setting the phases to zero or multiplying by 10 gave mismatches close to zero. Does that mean the phase is not correctly taken into account in on of the implementations? Or do we just see come loss of precision for certain values of the phase? hmmm