Closed bennahugo closed 9 years ago
@cyriltasse don't miss this discussion!
Yep, strangely, I was discussing these issues with @o-smirnov et al. last week.
For awimager I noticed we were only able to reach a DR of 10^4. Others then realised there was non-reproducibility at this level (each run gave different results), but only when multithreaded. With one thread, the result was wrong at the same level, but stable. We concluded it was due to the order of rounding and the limited precision of single floats. You'll see the same, it's normal...
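A toy illustration of the order effect described above, using only the Python standard library. This is a sketch and not awimager code: `f32`, `sum_f32` and `chunked_sum_f32` are made-up helper names, and the "threads" are simply contiguous slices summed separately, which is an assumption about how a multithreaded gridder might combine partial results.

```python
import math
import random
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate left to right, rounding to float32 after every addition."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


def chunked_sum_f32(values, n_chunks):
    """Sum n_chunks contiguous slices separately, then combine the partial sums."""
    size = (len(values) + n_chunks - 1) // n_chunks
    partials = [sum_f32(values[i:i + size]) for i in range(0, len(values), size)]
    return sum_f32(partials)


# Same three numbers, two orders. The float32 spacing near 1e8 is 8, so the 1.0
# is lost if it is added to 1e8 before the cancellation.
print(sum_f32([1e8, 1.0, -1e8]))  # 0.0
print(sum_f32([1e8, -1e8, 1.0]))  # 1.0

# Larger example: compare serial and chunked accumulation against math.fsum.
random.seed(0)
values = [random.uniform(0.0, 1.0) for _ in range(100_000)]
reference = math.fsum(values)
for label, total in [("serial", sum_f32(values)),
                     ("8 chunks", chunked_sum_f32(values, 8))]:
    print(label, total, "relative error", abs(total - reference) / reference)
```

A fixed chunking gives a fixed answer here. Real threads can finish in a different order on every run, which would be consistent with results being stable with one thread and varying with several.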
My personal conclusion is that single precision might be fine in many regimes (even when aiming at the 10^6 level), as we can subtract the 10-100 brightest sources, which are responsible for the first 10^2 of DR, in double precision and deconvolve the remaining 10^4 in single. But an external switch would be more than useful.
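A minimal sketch of why subtracting the bright sources in double precision first helps. The flux values are hypothetical and this is a single addition, not a gridder, so it only shows the mechanism: a faint term that falls below the float32 spacing of a bright term is lost, while the residual left after a double-precision subtraction fits comfortably in float32.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


bright = 1.0    # hypothetical bright source, normalised to unity
faint = 1.0e-8  # hypothetical faint source, 10^8 below the bright one

# Everything in single precision: the faint source is below half the float32
# spacing at 1.0 (about 6e-8), so it vanishes when added to the bright one.
all_single = f32(f32(f32(bright) + f32(faint)) - f32(bright))

# Subtract the bright source in double precision first, then store the
# residual in single precision: the faint source survives.
residual_single = f32((bright + faint) - bright)

print(all_single)       # 0.0
print(residual_single)  # close to 1e-8
```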
Yup Cyril... this indeed proves that floating point arithmetic is not associative... worth thinking about, it seems. I will try to add an external switch to bullseye... it just means we have to compile a few more libraries... luckily that's not a big deal
O.o the big precision toggle switch is now available as command line argument :-) Good bedtime reading: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
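For anyone wanting something similar elsewhere, a precision switch could look roughly like the sketch below. The flag name `--precision`, the `PRECISION_TYPES` table and `parse_args` are assumptions made for illustration and are not taken from bullseye's actual command line.

```python
import argparse
import ctypes

# Hypothetical mapping from the switch value to the C type used for grid data.
PRECISION_TYPES = {"single": ctypes.c_float, "double": ctypes.c_double}


def parse_args(argv=None):
    """Parse a hypothetical --precision switch (defaults to single)."""
    parser = argparse.ArgumentParser(description="precision switch sketch")
    parser.add_argument("--precision", choices=sorted(PRECISION_TYPES),
                        default="single",
                        help="floating point precision used for gridding")
    return parser.parse_args(argv)


args = parse_args(["--precision", "double"])
grid_type = PRECISION_TYPES[args.precision]
print(args.precision, ctypes.sizeof(grid_type))  # double 8
```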
There is an error in the proof of Theorem 12. :)
For future reference: gridding in double precision mode on the GPU is more than 3x slower for my 1.6 GiB dataset
I'm surprised it is only 3x. What GPU are you using?
This was on my GT770
3x slower sounds pretty good then, considering the DP hardware is 24x slower than the SP hardware. Presumably this means you aren't flop-limited in the single precision case.
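A back-of-envelope version of that argument, under a deliberately crude model: assume a fraction f of the single-precision runtime is flop-bound and slows down by the DP/SP hardware ratio, while the rest is unchanged. The function name and the model itself are assumptions. Double precision also increases memory traffic and register pressure, so under this model the result is, if anything, an upper bound on f.

```python
def flop_bound_fraction(observed_slowdown, dp_to_sp_ratio):
    """Solve observed_slowdown = (1 - f) + f * dp_to_sp_ratio for f."""
    return (observed_slowdown - 1.0) / (dp_to_sp_ratio - 1.0)


# 3x observed slowdown with DP hardware 24x slower than SP hardware.
f = flop_bound_fraction(3.0, 24.0)
print(f"{f:.3f}")  # 0.087
```

So with these numbers less than a tenth of the single-precision runtime would be flop-bound.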
It's not flop-limited when doing one facet, as I understand. @bennahugo, did you do this test for multiple facets?
I suspect it may be because occupancy isn't very good to start off with (it's actually limited by register usage even in the single correlation case of the float32 implementation), so we're not using all the single precision units in any case. @o-smirnov this is for the single facet case... the GPU may do better if I schedule more work - there is always that possibility. I will do some detailed profiling once I've completed Cyril's use case
@o-smirnov Seems single precision only gives us accuracy up to 2 decimal places (you mentioned HDR imaging requires a lot more than this). When I switch the CPU code to double precision (via helpers/base_types.py and algorithms/gridding_parameters.h), things are a lot more accurate (the center pixel for the unity case is equal to 1 when I open the FITS file in ds9). I will write some wrappers for GPU atomic add (not available by default) and check how much rounding error the GPU accumulates with its massive out-of-order accumulation.
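A CPU-only sketch of the kind of check described above, using the standard library. It assumes the unity case amounts to accumulating n equal weights of 1/n into one pixel. The visibility count and the `f32` helper are made up for illustration, and a strictly sequential loop does not reproduce the GPU's accumulation order.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


n_vis = 1_000_000     # hypothetical number of visibilities
weight = 1.0 / n_vis  # unity case: the weights should sum to exactly 1

single = 0.0
double = 0.0
w32 = f32(weight)
for _ in range(n_vis):
    single = f32(single + w32)  # rounded to float32 after every add
    double += weight            # ordinary double-precision accumulation

print("single:", single, "error", abs(single - 1.0))
print("double:", double, "error", abs(double - 1.0))
```

On the GPU the additions land in whatever order the atomics complete, so the single-precision figure from a loop like this should be read as indicative only.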