ratt-ru / bullseye

A GPU-based facet imager
GNU General Public License v2.0

On the topic of High Dynamic Range imaging #49

Closed bennahugo closed 9 years ago

bennahugo commented 9 years ago

@o-smirnov It seems single precision only gives us accuracy to about 2 decimal places (you mentioned HDR imaging requires a lot more than this). When I switch the CPU code to double precision (via helpers/base_types.py and algorithms/gridding_parameters.h), things are a lot more accurate: the center pixel for the unity case is equal to 1 when I open the FITS file in ds9. I will write some wrappers for a GPU atomic add (not available by default) and check how much rounding error the GPU accumulates with its massively out-of-order accumulation.
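As a rough illustration of why the accumulator's precision matters, here is a minimal sketch in plain Python. It is not the bullseye gridding code: `f32` and `accumulate` are hypothetical helpers, and float32 is emulated by round-tripping through `struct`. It shows small contributions being silently dropped once a single-precision accumulator holds a large value.

```python
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single_precision):
    """Sum values left to right, optionally rounding the running
    total to float32 after every addition."""
    acc = 0.0
    for v in values:
        acc = acc + v
        if single_precision:
            acc = f32(acc)
    return acc


# One large contribution followed by ten small ones (arbitrary units).
contributions = [2.0 ** 24] + [1.0] * 10

single = accumulate(contributions, single_precision=True)
double = accumulate(contributions, single_precision=False)

print(single)  # 16777216.0: every +1.0 was rounded away
print(double)  # 16777226.0: all ten small contributions survive
```

float32 carries a 24-bit significand, so above 2^24 the spacing between representable values is 2 and adding 1.0 rounds back to the old total. The same absorption happens at smaller magnitudes whenever a faint contribution is added to a much brighter accumulated cell.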

o-smirnov commented 9 years ago

@cyriltasse don't miss this discussion!

cyriltasse commented 9 years ago

Yep, strangely, I was discussing these issues with @o-smirnov et al. last week.

For awimager I noticed we were only able to reach a DR of 10^4. Others then realised the results were non-reproducible at this level (each run gave different results), but only when multithreaded. With one thread, the result was wrong at the same level, but stable. We concluded it was due to the order of rounding combined with the limited precision of single floats. You'll see the same, it's normal...
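The order dependence described here can be reproduced in a few lines. This is a minimal sketch under stated assumptions, not awimager or bullseye code: `f32`, `sum_f32` and `threaded_sum_f32` are hypothetical helpers, float32 is emulated via `struct`, and "threads" are simulated by summing index subsets separately before combining the partial sums.

```python
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Left-to-right sum with a float32 accumulator."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


def threaded_sum_f32(values, assignment):
    """Simulate a multithreaded reduction: each inner list of indices
    is summed by one 'thread', then the partial sums are combined."""
    partials = [sum_f32([values[i] for i in idx]) for idx in assignment]
    return sum_f32(partials)


big = 2.0 ** 24
values = [big, 1.0, 1.0, -big]  # exact sum is 2.0

serial = sum_f32(values)                                # one thread
run_a = threaded_sum_f32(values, [[0, 3], [1, 2]])      # one schedule
run_b = threaded_sum_f32(values, [[0, 1], [2, 3]])      # another schedule

print(serial, run_a, run_b)  # 0.0 2.0 1.0
```

The single-threaded sum is wrong but identical on every run, while the answer from the simulated threads depends on which elements happen to be grouped together, which is the behaviour reported above.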

My personal conclusion is that single precision might be fine in many regimes (even when aiming at the 10^6 level), as we can subtract the 10-100 brightest sources, responsible for the first 10^2 of DR, in double precision, and deconvolve the remaining 10^4. But an external switch can be more than useful.
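A per-sample sketch of the idea of removing the bright contribution in double precision before continuing in single precision. The numbers and function names below are illustrative assumptions only, not part of any imager; float32 is emulated via `struct`.

```python
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def residual_cast_then_subtract(data, model):
    """Store data and model in float32 first, then subtract."""
    return f32(f32(data) - f32(model))


def residual_subtract_then_cast(data, model):
    """Subtract the bright model in float64, then store in float32."""
    return f32(data - model)


bright = 2.0 ** 24      # contribution of the brightest sources
faint = 0.25            # faint signal we want to keep
data = bright + faint   # exactly representable in float64

lost = residual_cast_then_subtract(data, bright)
kept = residual_subtract_then_cast(data, bright)

print(lost)  # 0.0: the faint signal vanished when data was cast
print(kept)  # 0.25: the residual fits comfortably in float32
```

Once the bright part is gone, the residual spans a much smaller dynamic range, so the 24-bit significand of float32 is spent on the faint signal instead of on the bright sources.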

bennahugo commented 9 years ago

Yup Cyril... this indeed proves that floating point arithmetic is not associative... worth thinking about, it seems - I will try to add an external switch to bullseye... it just means we have to compile a few more libraries... luckily that's not a big deal

bennahugo commented 9 years ago

O.o the big precision toggle switch is now available as a command line argument :-) Good bedtime reading: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

cyriltasse commented 9 years ago

There is an error in the proof of Theorem 12. :)

bennahugo commented 9 years ago

For future reference: gridding in double precision mode on the GPU is more than 3x slower for my 1.6 GiB dataset

bmerry commented 9 years ago

I'm surprised it is only 3x. What GPU are you using?

bennahugo commented 9 years ago

This was on my GT770

bmerry commented 9 years ago

3x slower sounds pretty good then, considering the DP hardware is 24x slower than the SP hardware. Presumably this means you aren't flop-limited in the single precision case.
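The reasoning can be made concrete with a deliberately crude toy model. Assume (and this is only an assumption) that a fraction f of the single-precision runtime scales with the DP penalty while the rest is unchanged, so that observed slowdown = (1 - f) + f * penalty. The helper name below is hypothetical, and the model ignores the doubled memory traffic and any occupancy changes in double precision.

```python
def flop_bound_fraction(observed_slowdown, dp_penalty):
    """Solve observed = (1 - f) + f * dp_penalty for f, the fraction
    of single-precision runtime that is limited by arithmetic."""
    return (observed_slowdown - 1.0) / (dp_penalty - 1.0)


# Figures quoted in this thread: 3x observed, 24x DP hardware penalty.
f = flop_bound_fraction(3.0, 24.0)
print(round(f, 3))  # 0.087
```

Under these assumptions less than a tenth of the single-precision runtime would be arithmetic-bound, which is consistent with the kernel not being flop-limited.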

o-smirnov commented 9 years ago

It's not flop-limited when doing one facet, as I understand. @bennahugo, did you do this test for multiple facets?

bennahugo commented 9 years ago

I suspect it may be because occupancy isn't very good to start off with (it's actually limited by register usage even in the single-correlation case of the float32 implementation), so we're not using all the single precision units in any case. @o-smirnov this is for the single facet case... the GPU may do better if I schedule more work - there is always that possibility. I will do some detailed profiling once I've completed Cyril's use case