ratt-ru / bullseye

A GPU-based facet imager
GNU General Public License v2.0

Double or single precision #65

Open cyriltasse opened 9 years ago

cyriltasse commented 9 years ago

Raising this issue again: is the CASA gridder/degridder in single or double precision?

bennahugo commented 9 years ago

Hey @cyriltasse

https://github.com/pkgw/gcode-casacore/blob/trunk/scimath_f/convolvegridder.f

Some of it, like the convolution function, seems to be double precision. The grid isn't double precision. The visibilities and uvw coordinates should be single precision when read from the MS.

bennahugo commented 9 years ago

I think we should simulate a few MSs of varying size with system noise and maybe one or two fainter sources, and check the divergence when gridding larger and larger numbers of visibilities. Thoughts on this, @cyriltasse? There is an option in bullseye that switches all the datatypes to either single or double precision.
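For illustration, a minimal numpy sketch of the kind of divergence check being proposed (the visibility values, weights and sizes are made up, and this bypasses bullseye's actual datatype switch): accumulate the same simulated data in single and double precision and watch the relative difference grow with the number of visibilities.

```python
import numpy as np

rng = np.random.default_rng(42)

def accumulation_divergence(n_vis):
    # Toy noise-like visibilities around 1 Jy, with non-unity weights.
    vis = (1.0 + 0.1 * rng.standard_normal(n_vis)
           + 1j * 0.1 * rng.standard_normal(n_vis))
    weights = rng.uniform(0.5, 2.0, n_vis)

    ref = np.sum(vis * weights)                      # double-precision reference
    acc = np.complex64(0)
    for v, w in zip(vis.astype(np.complex64), weights.astype(np.float32)):
        acc += v * w                                 # sequential single-precision sum
    return abs(acc - ref) / abs(ref)

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, accumulation_divergence(n))
```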

bmerry commented 9 years ago

I'd also be interested in trying out the spectral imager I'm developing on such simulations. It also has a command-line option to use single or double for certain portions of the pipeline (although they might not be the right portions - something worth finding out).

bennahugo commented 9 years ago

@bmerry Yup I was thinking of putting this in my thesis anyway. Will be in tomorrow. Maybe @o-smirnov can help out with simulation parameters for Meqtrees and simms with MeerKAT/EVLA layout?

bennahugo commented 9 years ago

@jfunction Something to keep in mind for your gpu facet degridder.

bennahugo commented 9 years ago

I have seen some difference with a simple unity test: double precision comes much closer to recovering the 1 Jy source in the centre of the image, whereas single precision gives something like 0.98-0.99 for a 1.6 GiB MS with weights not all set equal to 1. I expect the difference may be more noticeable with more realistic data, since the noise will add more variation in magnitude between the numbers, and since positive and negative numbers of similar magnitude are accumulated together from the start.

bennahugo commented 9 years ago

@ddmusc Was wondering if you had any insights to this?

o-smirnov commented 9 years ago

Interesting question. CLEAN itself is limited to ~10,000 dynamic range at most due to pixellation effects etc. Higher DR can (at present) only be achieved by subtracting the brightest sources in the visibilities.

I suppose we should do a quick simulation of a realistic field (take e.g. the 3C147 model), and see if we can still get to 10,000 DR with single precision.

cyriltasse commented 9 years ago

I think we should identify where there is a many-element sum, since this is where rounding errors matter and create a DR-limiting noise. I think it's not about maximum and minimum values, but @bennahugo understands those aspects better.

As a first guess, I think such many-element sums appear in (i) the FFT/iFFT and (ii) gridding. They do not appear in degridding itself (interpolation errors aside). Since gridding accuracy is not so important, I'd think the CF can be in single precision but with large oversampling, with the grid kept in double precision only for the FFT/iFFT and then cast to single for gridding/degridding.
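For what it's worth, a minimal numpy sketch of that data flow (the parameters, the toy sinc CF and the loop are illustrative only, not bullseye's gridder): single-precision, heavily oversampled CF taps, accumulation onto a double-precision grid, the iFFT in double, and a cast to single only at the end.

```python
import numpy as np

# Toy separable CF in single precision with heavy oversampling (illustrative values).
support, oversample = 7, 63
cf = np.sinc(np.linspace(-support / 2, support / 2,
                         support * oversample)).astype(np.float32)

npix = 256
grid = np.zeros((npix, npix), dtype=np.complex128)     # keep the grid in double

def grid_one(vis, u_pix, v_pix):
    """Accumulate one visibility (u,v in pixel units, away from the grid edges)."""
    iu, iv = int(np.floor(u_pix)), int(np.floor(v_pix))
    fu = int((u_pix - iu) * oversample)                # nearest oversampled CF tap
    fv = int((v_pix - iv) * oversample)
    for dy in range(support):
        for dx in range(support):
            w = cf[dx * oversample + fu] * cf[dy * oversample + fv]  # float32 taps
            grid[iv + dy - support // 2,
                 iu + dx - support // 2] += np.float64(w) * vis      # double accumulation

grid_one(0.8 + 0.3j, 120.37, 131.62)                   # ... grid all visibilities ...

image = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid)))  # iFFT in double
image32 = image.real.astype(np.float32)                # cast to single afterwards
```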

ddmusc commented 9 years ago

Unfortunately I can only comment on lwimager... CASA have changed the code lately to use multi-threading. In lwimager small datasets are gridded using double precision while large datasets are gridded using double precision.

@bennahugo I am a bit surprised that you are getting such "high" errors in the PSF. Maybe you should share your MS with me so I can try it out on the latest mt-imager (not released) and see if I get similarly high errors.

bennahugo commented 9 years ago

@ddmusc I will generate some images tomorrow and put a dataset up somewhere accessible. Thanks! @cyriltasse Agreed those should be the two places where we have a problem of accuracy, specifically the normalization term and the accumulation to the grid points in the gridder. FFT probably depends on the size of the image. Will try and do some analysis tomorrow. @o-smirnov Can I grab that model from you tomorrow?

cyriltasse commented 9 years ago

@bennahugo How do you handle the normalization if you want to optimise accuracy, say in single precision? Do you pre-normalise by some other factor before the FFT? If so, what is the best value for that factor?

bennahugo commented 9 years ago

@cyriltasse I found the normalization term must be in double precision. I pre-normalize before taking the FFT - it doesn't really matter if you do it before or after. Alternatively we can always compute the PSF and normalize by its centre pixel, as Bill Cotton points out in his memo, but then we double the gridding time.
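On the before-or-after point, a quick numpy check (toy grid and an arbitrary weight sum, nothing taken from bullseye) that pre-normalizing the grid and normalizing the dirty image afterwards agree by linearity of the FFT, provided the normalization term itself is kept in double:

```python
import numpy as np

rng = np.random.default_rng(1)
grid = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
norm = np.float64(12345.678)               # accumulated weight sum, kept in double

img_pre = np.fft.ifft2(grid / norm)        # pre-normalize before the FFT
img_post = np.fft.ifft2(grid) / norm       # or normalize the image afterwards

print(np.max(np.abs(img_pre - img_post)))  # agree to double-precision rounding
```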

bennahugo commented 9 years ago

@o-smirnov I've created a pipeline to simulate 3C147 and add noise (although I can see squat because of the beams created by the bright sources in the centre of that field...). Just to confirm: I compute dynamic range as max_brightness / sigma_noise?
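That is the usual definition; a tiny sketch of it for reference (the function name and the off-source mask are placeholders, not part of the pipeline):

```python
import numpy as np

def dynamic_range(image, off_source_mask):
    """DR = peak brightness / noise rms, with the rms estimated in a source-free region."""
    return np.max(image) / np.std(image[off_source_mask])

# e.g. dynamic_range(dirty_image, mask) with mask selecting an empty corner of the image
```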

cyriltasse commented 9 years ago

@bennahugo When comparing dirty images to address the current topic, two things should matter (apart from the CFs themselves): the number of pixels in the image (for iFFT accuracy), and the number of visibilities (which stack onto each other during gridding). So I have one general theoretical question to which you may have the answer. If we compute a sum of N numbers (done serially, to simplify), the rounding is applied to each two-number sum, so (N-1) times. What is the final error on the sum, and how does it evolve with N?
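As a rough empirical answer to that question (plain numpy, serial float32 summation against a float64 reference; the sqrt(N)*u column is only the usual rule-of-thumb estimate):

```python
import numpy as np

rng = np.random.default_rng(7)
u = np.finfo(np.float32).eps / 2            # unit roundoff for single precision

for n in (10**2, 10**3, 10**4, 10**5, 10**6):
    x = rng.uniform(0.0, 1.0, n)
    exact = np.sum(x)                        # float64 reference
    s = np.float32(0)
    for xi in x.astype(np.float32):          # N - 1 rounded two-number additions
        s += xi
    print(f"N={n:>7}  rel. error={abs(s - exact) / exact:.2e}  sqrt(N)*u={np.sqrt(n) * u:.2e}")
```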

bmerry commented 9 years ago

@cyriltasse The accuracy of a sum depends not just on N, but also on the distribution of the numbers and the order in which they're added. For example, an FFT ends up computing the DC term by repeatedly pairing up the numbers and summing each pair, then repeating on the resulting N/2 partial sums. If the numbers have similar magnitudes, this is more robust than a sequential accumulation. Adding a mix of positive and negative values can also cause a much bigger (relative) error than adding only positive values. I suspect that partly contributes to @bennahugo's problems with computing the normalization factor, because his anti-aliasing filter has negative lobes.

For sequential addition, the error tends to grow with sqrt(N); with pairwise addition, it grows with log(N). You might find this page of interest: https://en.wikipedia.org/wiki/Kahan_summation_algorithm.
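A short compensated-summation sketch (plain numpy, not tied to either imager) showing how Kahan summation recovers most of the accuracy lost by a naive float32 accumulation:

```python
import numpy as np

def kahan_sum(values):
    """Compensated (Kahan) summation in float32, carrying a running correction term."""
    total = np.float32(0.0)
    comp = np.float32(0.0)
    for v in values:
        y = np.float32(v) - comp
        t = total + y
        comp = (t - total) - y               # the rounding error of total + y
        total = t
    return total

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 100_000).astype(np.float32)
exact = np.sum(x.astype(np.float64))

naive = np.float32(0.0)
for v in x:
    naive += v

print("naive error:", abs(naive - exact), "kahan error:", abs(kahan_sum(x) - exact))
```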

bennahugo commented 9 years ago

So if I get this right: since we have a mix of negative values in here (also considering the data is noise-like), this naive addition error should grow as n_vis_at_uv * epsilon, where epsilon = (B/2) * B^(-p), B = 2 is the floating-point radix, and p is the precision specified in IEEE 754: p = 24 bits for single precision and p = 53 bits for double precision. The error will be most apparent near the grid centre at the short baselines, but if we consider very good uv coverage for a moment and treat this rounding error as additional noise, then we will exceed the dynamic range requirements within a few hundred visibilities per grid point? This is not too far fetched, since the filter will overlap a lot near the centre of the grid.
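Plugging in numbers, with u the unit roundoff given by that formula (the linear-in-N growth is the pessimistic bound; the sqrt(N) estimate above is usually closer):

```python
import numpy as np

B = 2                                # floating-point radix
u32 = (B / 2) * B ** (-24.0)         # p = 24 bits -> ~6.0e-8  (= np.finfo(np.float32).eps / 2)
u64 = (B / 2) * B ** (-53.0)         # p = 53 bits -> ~1.1e-16 (= np.finfo(np.float64).eps / 2)

# Rough relative-error bound for naively accumulating n contributions of similar
# magnitude onto one grid point: ~ n * u (worst case), ~ sqrt(n) * u (typical).
for n in (100, 1_000, 10_000):
    print(f"n={n:>6}  n*u32={n * u32:.1e}  n*u64={n * u64:.1e}")
```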

bennahugo commented 9 years ago

It is probably safer to do the normalization by computing the psf then as well.

bmerry commented 9 years ago

So if I get this right: since we have a mix of negative values in here (also considering the data is noise-like), this naive addition error should grow as n_vis_at_uv * epsilon

Which sum are you considering here?

bennahugo commented 9 years ago

the additions onto the grid

cyriltasse commented 9 years ago

I cannot really participate in the discussion at that level, but I enjoy reading it :)

bmerry commented 9 years ago

Visibilities can't be completely noise-like, otherwise you'd get no information from them at all. The expected value at each grid point should be the "ideal" visibility predicted by the measurement equation, which I think means that the condition number for a grid-cell sum should converge to a constant rather than growing as O(sqrt(N)).
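A quick way to see that numerically (toy complex samples, not real visibilities): the condition number of the sum, sum|x_i| / |sum x_i|, settles to a constant when the samples share a non-zero expected value, but grows roughly as sqrt(N) for zero-mean noise:

```python
import numpy as np

rng = np.random.default_rng(4)

def condition_number(x):
    """kappa = sum(|x_i|) / |sum(x_i)|: how much cancellation amplifies rounding error."""
    return np.sum(np.abs(x)) / np.abs(np.sum(x))

true_vis = 1.0 + 0.5j                          # "ideal" visibility at one grid cell
for n in (10**3, 10**4, 10**5, 10**6):
    noise = 0.3 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    print(n,
          condition_number(true_vis + noise),  # settles near E|x| / |E[x]|
          condition_number(noise))             # zero-mean noise: grows ~ sqrt(n)
```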

However, as Cyril has pointed out to me, the accuracy of the gridding doesn't matter too much, because any errors can be corrected in the next major cycle. It's the accuracy of degridding that really matters.

bennahugo commented 9 years ago

Thanks @bmerry, your insight helped a lot. Will tinker around with this a bit more.