transientskp / pyse

Python Source Extractor
BSD 2-Clause "Simplified" License
11 stars 5 forks source link

Flaw in background calculation #88

Closed HannoSpreeuw closed 2 weeks ago

HannoSpreeuw commented 3 weeks ago

Commit 3a7919a corrects for the image mask not being propagated, i.e. previously unmasked data was being used for the calculation of background characteristics grid. Note that this does not mean that no mask was applied at all, just not enough, in some cases, see below. Also, after interpolation the image mask was reapplied. Nevertheless, some background grid values could be off, since (part of) the pixel values they were based on were masked, which was not accounted for.

(The error had also been noted here)

However the commit gives a new problem, when using a circular "non-mask", e.g. with --radius=1000 in processing this AARTFAAC TBB image: python pyse.py --detection=5 --analysis=5 --grid=128 --bmaj=0.208 --bmin=0.136 --bpa=15.619 --radius=1000 FITS_files/image_206-215-t0002-image.fits This yields:

../../sourcefinder/image.py:320: RuntimeWarning: invalid value encountered in scalar divide
  if np.fabs(mean - median) / sigma >= 0.3:

What is happening is the following: We have useful_chunk = ndimage.find_objects(numpy.where(self.data.mask, 0, 1)) and useful_data = da.from_array(self.data[useful_chunk[0]], chunks=(self.back_size_x, y_dim)), which means that completely masked subimages (of size back_size_x * back_size_y) in the four corners of the rectangular slice encompassing the disc with a radius of 1000 pixels can be completely masked. Consequently, there is no data to process for stats.sigma_clip. Apparently, this returns a standard deviation=0.

To clarify: the error above did not occur prior to commit 3a7919a but that was also wrong, since masks in the four corners were not effectuated, e.g. in the case of --radius=1000, i.e. in the case where pixels outside a radius of 1000 pixels from the image center are masked.

HannoSpreeuw commented 3 weeks ago

If any background grid values would be missing, for grid nodes centered on subimages with masked pixels, that would make the next step, i.e. image.ImageData._interpolate more complex.

A way out would be to replace masked background grid nodes with nearest neighbour values, using something like scipy.spatial.KDtree.

HannoSpreeuw commented 3 weeks ago

This should also fix #67

HannoSpreeuw commented 3 weeks ago

After debugging it turns out that the RuntimeWarning is shown not because of completely masked subimages, but because of almost completely masked subimages, i.e. for at least one subimage only one element of chunk, which is the MaskedArray containing the subimage values, is unmasked.

The standard deviation of an array with only one element is zero.

HannoSpreeuw commented 3 weeks ago

It can be fixed by modifying line 299: if not chunk.any(): into if np.ma.is_masked(chunk) or not chunk.any():

In this way if a subimage has any masked value, it is not used. That makes sense since, if any pixel of a subimage is masked, one can no longer assume that the pixels used for $\kappa \times \sigma$-clipping are centered on a grid node.

HannoSpreeuw commented 3 weeks ago

Which brings me to another problem: grid.mask in scipy.interpolate.interp1d is not propagated in line 405: primary_interpolation = interp1d(y_initial, grid, kind='slinear', assume_sorted=True,

since in scipy.interpolate.interp1d there is a copy to a regular Numpy ndarray: y = array(y, copy=self.copy)

which means that instead of skipping masked grid nodes from the interpolation, zeroes are used - since all zeroes are masked, i.e. zeroes for both mode and rms were set at grid nodes as a way to mask such nodes at a next step in ImageData.__grids, at lines 279 and 284.