mtazzari / galario

Gpu Accelerated Library for Analysing Radio Interferometer Observations
https://mtazzari.github.io/galario/
GNU Lesser General Public License v3.0

Loss of precision #28

Closed fredRos closed 7 years ago

fredRos commented 7 years ago

I decided to look into the loss of precision again. Remember I told you about the summation algorithms I looked at this week. I compared single and double precision on CPU and GPU to the python reference implementation. Quite possibly we had those numbers before, but I just did it again. Here I print the first 5 elements of the interpolated values output by acc_lib.sample:

CPU

[  -7.32114697  117.65641022    0.5806728     3.80655694   23.42615891] # py single
[  -6.9562273   111.83618164    0.72458202    3.68281412   24.37336922] # galario single
[  -7.32083263  117.65531322    0.58068566    3.80660492   23.42551657] # py double
[  -7.32083263  117.65531324    0.58068566    3.80660492   23.42551656] # galario double

GPU

[  -7.32114697  117.65641022    0.5806728     3.80655694   23.42615891] # py single
[  -6.95623207  111.83618164    0.72458595    3.68281269   24.37336731] # galario single
[  -7.32083263  117.65531322    0.58068566    3.80660492   23.42551657] # py double
[  -7.32083263  117.65531324    0.58068566    3.80660492   23.42551656] # galario double

It is clear that there is a big loss of precision for galario single; the other three implementations agree much better. That loss is the same on GPU and CPU, so there must be a problem in galario with the algorithm itself.

To find out where the precision is lost, I checked the values after every operation in test_loss() in test_galario.py.

The outputs of numpy's fft2, cuFFT, and FFTW agree only to within 1e-3 relative precision in single precision, and the FFT is about the first operation on the image, so I think this is where the root of the problem is. I know that pyfftw does a lot of comparisons against numpy's FFT, so they should agree. But they seem to differ for results close to zero. If there are terms of varying signs and magnitudes, this can happen with naive summation. I don't know what np.fft uses internally, but I know that FFTW uses pairwise summation, which should do OK except in pathological cases that I hope/guess are not present in our test case.
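For the record, the difference between naive and pairwise summation is easy to reproduce in a few lines of numpy (a standalone sketch, not galario code). The example below uses an extreme mix of magnitudes to make the effect deterministic: in float32 the spacing between representable numbers around 1e8 is 8, so a naive left-to-right loop rounds away every small term, while numpy's built-in pairwise sum groups the small terms together first and keeps most of them.

```python
import numpy as np

def naive_sum(values):
    """Accumulate left to right in the input precision, with no pairwise tree."""
    total = values.dtype.type(0)
    for v in values:
        total += v
    return total

# One large term followed by many small ones: in float32 the spacing around
# 1e8 is 8, so each individual `+ 1.0` is rounded away in a naive loop.
terms = np.full(10_001, 1.0, dtype=np.float32)
terms[0] = 1e8

reference = np.sum(terms.astype(np.float64))  # exact here: 100010000.0
print('naive:   ', naive_sum(terms))          # stuck at 1e8, all small terms lost
print('pairwise:', np.sum(terms))             # pairwise sum keeps most of them
```

An FFT bin that ends up near zero is the analogous situation with varying signs: large intermediate terms cancel, and the rounding errors they leave behind dominate the small result.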

I updated many unit tests to use both rtol and atol. This showed that the relative precision is OK unless the numbers are close to zero, where atol comes into play.
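As a hypothetical illustration (the numbers below are made up, not taken from the test suite), np.allclose with rtol alone rejects an array comparison whose only bad entry is near zero, while a small atol absorbs exactly that entry without loosening the check on the large ones:

```python
import numpy as np

# Made-up values: two large entries that agree well, one near-zero entry
# with a ~5 % relative mismatch, as single precision can produce.
single = np.array([117.65641, 3.80656, 5.8e-4], dtype=np.float32)
double = np.array([117.65531, 3.80660, 6.1e-4], dtype=np.float64)

# rtol alone fails because of the near-zero entry ...
assert not np.allclose(single, double, rtol=1e-2, atol=0.0)
# ... while a small atol absorbs it; the large entries are still held to rtol.
assert np.allclose(single, double, rtol=1e-2, atol=1e-3)
```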

What totally puzzles me is that if I do all steps individually, the CPU/GPU results agree with pyvfit up until after the interpolation, but if I call the GPU version that does everything combined, the results are at much lower precision. I played around and found that the mismatch ratio is 55 % for the phases in par1, but setting the phases to zero or multiplying them by 10 gave mismatches close to zero. Does that mean the phase is not correctly taken into account in one of the implementations? Or do we just see some loss of precision for certain values of the phase? Hmmm.

fredRos commented 7 years ago

Oh, and the disagreement grows with increasing image size. Either the probability of a mismatch per element is constant and we just have more trials, or it really is due to summing more terms in a different order.

mtazzari commented 7 years ago

I am looking into this right now. I did not realize that the sample() and chi2() tests were already failing at the commit from 19 days ago. I am puzzled by the fact that all the steps pass if tested individually, but sample() and chi2() - which do all the steps - fail.

fredRos commented 7 years ago

I think the difference really comes from the shift-FFT-shift operation in single precision. Using atol just masks the problem: it allows tests to pass when numbers happen to be close to zero, and if subsequent operations move those values away from zero, our stricter rtol makes the test fail.
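The masking effect can be demonstrated with two made-up single-precision numbers (not values from our tests): near zero they pass an atol comparison, but once a later operation shifts them away from zero, the same absolute error fails a pure rtol check.

```python
import numpy as np

# Two intermediates that are "equal" only thanks to atol (made-up values
# with a ~2.5 % relative mismatch, as seen in the single-precision FFT).
a = np.float32(1.560211e-3)
b = np.float32(1.599041e-3)
assert np.allclose(a, b, rtol=1e-5, atol=1e-4)   # atol masks the mismatch

# A subsequent operation moves both values away from zero; the absolute
# error of ~3.9e-5 survives, and a pure rtol check now fails.
a2 = a + np.float32(1.0)
b2 = b + np.float32(1.0)
assert not np.allclose(a2, b2, rtol=1e-5, atol=0.0)
```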

Conclusion

numpy and FFTW vary quite a bit in single precision, and there is nothing we can do about it. We should recommend that users use double precision for scientific work. Single precision at your own risk!

Investigation of discrepancies

def test_shift_fft_shift(size, complex_type, rtol, atol, acc_lib):

    reference_image = create_reference_image(size=size, kernel='gaussian', dtype=complex_type)
    cpu_shift_fft_shift = np.fft.fftshift(np.fft.fft2(np.fft.fftshift(reference_image)))

    # create a copy of reference_image because galario performs the FFT in place
    ref_complex = reference_image.copy()
    acc_lib.fftshift_fft2d_fftshift(ref_complex)

    print()
    print('%.15e' % cpu_shift_fft_shift.real[0, 0])
    print('%.15e' % ref_complex.real[0, 0])
    print('---')

Let's test for different image sizes

size = 1024

# single precision
-1.599041643203236e-03  # numpy
-1.560211181640625e-03  # galario
---
# double precision
-1.599940267100308e-03  # numpy
-1.599940267100308e-03  # galario
---

So in double precision all digits agree, but in single precision only 2 digits match. This relative mismatch of ~2.5 % gets propagated. Numpy's single-precision FFT is good to 3 digits vs. double. I know that the shifts just copy values, so the question is why FFTW (and cuFFT) are so far off in single precision.

size = 1000

# single precision
1.779161626470635e-03  # numpy
1.781748607754707e-03  # galario
---
# double precision
1.780879036946570e-03  # numpy
1.780879036946813e-03  # galario

Now numpy is a bit more off in single precision, and the double results are not bit-identical.

size = 1500

# single precision (this run failed the test)
3.309356397994316e-04  # numpy
3.327933081891388e-04  # galario
---
# double precision
3.327185188833359e-04  # numpy
3.327185189048832e-04  # galario

And now numpy is far off in single precision.