transientskp / pyse

Python Source Extractor
BSD 2-Clause "Simplified" License

Compute background grid as fast as sep, but without sep #89

Open HannoSpreeuw opened 3 weeks ago

HannoSpreeuw commented 3 weeks ago

Currently, the fast way in PySE to compute the background grid is to use sep. The background grid holds the mean and standard deviation of the background pixels on a regular grid whose nodes are centered on subimages of size back_size_x * back_size_y.

It is considerably faster than PySE's default way to compute the background grid, which uses Dask to parallelise over rows of subimages fed into stats.sigma_clip. The default approach is more accurate though, since stats.sigma_clip uses a more intelligent algorithm than sep's, i.e. than SExtractor's.

The code for computing the two background characteristics of a single subimage is stats.sigma_clip. With some adjustments, stats.sigma_clip can be deployed with Numba's guvectorize decorator.

Passing a reshaped version of the image data to the decorated, slightly modified version of stats.sigma_clip would then be sufficient to compute the background characteristics on all background grid nodes in a vectorized way.
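A sketch of such a reshape in plain NumPy, assuming the image dimensions are exact multiples of the subimage size (the real code would have to handle edge subimages). The helper name to_subimages is hypothetical, and np.mean / np.std merely stand in for the clipped statistics.

```python
import numpy as np


def to_subimages(image, back_size_x, back_size_y):
    """Return an array of shape (rows, cols, back_size_y * back_size_x).

    Each entry along the last axis holds the flattened pixels of one
    subimage. Assumes image.shape is divisible by the subimage size.
    """
    ny, nx = image.shape
    rows, cols = ny // back_size_y, nx // back_size_x
    return (
        image.reshape(rows, back_size_y, cols, back_size_x)
        .swapaxes(1, 2)  # bring the two grid axes to the front
        .reshape(rows, cols, back_size_y * back_size_x)
    )


image = np.arange(24, dtype=np.float64).reshape(4, 6)
subimages = to_subimages(image, back_size_x=3, back_size_y=2)

# Placeholder statistics per grid node; a vectorized clipping routine
# operating on the last axis would take their place.
grid_mean = subimages.mean(axis=-1)
grid_std = subimages.std(axis=-1)
```

Here the top-left subimage is `[0, 1, 2, 6, 7, 8]`, i.e. the first three pixels of the first two image rows, and grid_mean has one value per background grid node.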

Tests reveal that this takes about the same time as deploying sep, while arriving at the same numbers as PySE's default approach (Dask's map_blocks plus stats.sigma_clip), i.e. while retaining the same accuracy. However, these tests exclude the time taken by image.ImageData._interpolate. If the latter is negligible, applying Numba's guvectorize decorator would offer a great improvement.

Both Dask and sep could be removed as dependencies.