transientskp / pyse

Python Source Extractor

Do not insert moments-only source-fits into the database. #72

Closed: timstaley closed this issue 1 month ago

timstaley commented 9 years ago

The basic logic here is that if we cannot fit a Gaussian, probably there is something wrong with modelling these pixels as a typical source and we should not accept the fit at all. There's also the issue that different fitting methods provide different errors, corrections, etc., and it's nice to have a homogeneous dataset, even if that only means dropping a handful of (already poor) fits.

The long version is at https://redmine.astron.nl/lofar_issuetracker/issues/6041

timstaley commented 9 years ago

In case redmine goes socks-up: https://gist.github.com/timstaley/c95c2b965e076c84bc76

AntoniaR commented 2 months ago

@HannoSpreeuw is this relevant for the new PySE?

HannoSpreeuw commented 1 month ago

Interesting discussion which I would have wanted to join at the time.

I am puzzled by commit 68a2f49:

        if peak < 0:
            self['peak'].error = float('inf')
            self['flux'].error = float('inf')
            self['semimajor'].error = float('inf')
            self['semiminor'].error = float('inf')
            self['theta'].error = float('inf')
            return self

This has been added to the _error_bars_from_moments function.

In any Stokes I image, one will set a positive detection threshold, i.e. all the pixel values from any island will be positive. Deriving the peak spectral brightness from "moments" works exactly as Jess put it:

> The moments method does in fact just take the peak pixel, then correct it a bit for the fact that pixellation smears out the true peak value.

This means that the peak spectral brightness from "moments" is always positive.
So how can you have a negative peak spectral brightness from "moments"?
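
To make that concrete, here is a minimal sketch, not PySE's actual implementation, of why a moments-style peak stays positive: the function name, the island representation and the fudge_max_pix value below are all illustrative assumptions.

    import numpy as np

    def moments_peak(island_pixels, fudge_max_pix=1.06):
        # Toy moments-style peak estimate: the brightest island pixel times a
        # small correction for pixellation smearing out the true peak.
        # The correction factor is illustrative, not the value PySE derives.
        return fudge_max_pix * np.max(island_pixels)

    # With a positive detection threshold every island pixel is positive,
    # so the returned peak is necessarily positive as well.
    island = np.array([0.8, 1.2, 3.5, 1.9])  # all pixel values above threshold
    assert moments_peak(island) > 0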

HannoSpreeuw commented 1 month ago

> The basic logic here is that if we cannot fit a Gaussian, probably there is something wrong with modelling these pixels as a typical source and we should not accept the fit at all. There's also the issue that different fitting methods provide different errors, corrections, etc., and it's nice to have a homogeneous dataset, even if that only means dropping a handful of (already poor) fits.

Gaussian fitting can often derail, even for a well-defined island, unless you are detecting at, say, 100 times the rms noise. This is what I could infer from running a large number of tests. It even applies after extra robustness was added to the criterion of sufficient width of the island in both dimensions, as per this commit.

Wrt to "moments" I refer to that as "moments analysis" or "moments computation" instead of "moments fitting", since there is no fit procedure in the sense of finding a mimimum in the least-squares residuals. Moments computation is only inferior to Gaussian fitting at very high signal to noise (> 100), see paragraph 2.6 of my thesis, so there is no reason to discard moments. The calculated error bars should also be accurate. Moments computation is also more robust than Gaussian fitting, although we have recently stabilized the latter through the introduction of bounds as per this recent commit and a handful of subsequent commits; adding bounds to Gaussian fits enhances stability to the point that it can never derail, if it is applied to at least five Gaussian parameters (the p.a. can be excluded, I reckon).

But, to be fair, to derive anything sensible with moments computations, one needs the same width criterion as for Gaussian fitting. Perhaps we should apply something simpler than "moments" when only e.g. three pixel values of the island are above the analysis threshold. In that case, just take the peak pixel value (with the fudge_max_pix correction) for the peak spectral brightness and derive the position using the barycenter method, which can never derail. The Gaussian shape parameters could be copied from the clean beam, i.e. they would not be computed.
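
A minimal sketch of that fallback for "thin" detections (names, argument layout and the correction value are illustrative assumptions, not a proposed API):

    import numpy as np

    def thin_island_fallback(island, x, y, beam, fudge_max_pix=1.06):
        # Peak from the brightest pixel, with a pixellation correction.
        peak = fudge_max_pix * island.max()
        # Position from the barycentre, which can never derail.
        total = island.sum()
        x_bar = (island * x).sum() / total
        y_bar = (island * y).sum() / total
        # Shape parameters copied from the clean beam, not computed.
        semimajor, semiminor, theta = beam
        return peak, (x_bar, y_bar), (semimajor, semiminor, theta)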

From this discussion I can infer two improvements wrt computing source properties, so I will turn this into a PySE issue. One would be to remove the if peak < 0: conditional in _error_bars_from_moments, since a negative peak should not occur. The second would be to come up with something more sensible for "thin" detections, along the lines of the possible solution just described.

I am eager to reproduce the problems reported, so please provide a way to do this.

HannoSpreeuw commented 1 month ago

> I am puzzled by commit 68a2f49:

I will revert the part of that commit that affects sourcefinder/extract.py, since a negative peak spectral brightness from a moments analysis is impossible. It cannot occur when detectionthresholdmap is "sane", i.e. when all its values are positive.

The only circumstance I can think of is this: some values in the noise map can be negative, i.e. the map interpolated from the rms node values (which are always positive) can dip below zero if a higher-order interpolation scheme is used. The detection threshold (a positive float) times the rms noise map can in that case be negative for some part of detectionthresholdmap. But this cannot happen if INTERPOLATE_ORDER = 1, which had already been set as the default value prior to this issue, I guess starting from commit dabc8a0, from April 15, 2014.
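
A quick way to see the effect (illustrative only; scipy.ndimage.zoom merely stands in for however the rms grid is actually interpolated):

    import numpy as np
    from scipy.ndimage import zoom

    # Positive rms "node" values with a sharp jump, purely for illustration.
    nodes = np.array([[0.01, 0.01, 0.01],
                      [0.01, 0.01, 5.00],
                      [0.01, 5.00, 5.00]])

    cubic = zoom(nodes, 8, order=3)   # higher-order spline: can ring and undershoot
    linear = zoom(nodes, 8, order=1)  # bilinear: bounded by the input values

    print(cubic.min())    # may dip below zero near the jump
    print(linear.min())   # never smaller than nodes.min()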

Setting infinite error bars should probably be replaced with raising a ValueError:

    if peak < 0:
        raise ValueError(
            f"Peak from moments = {peak:.2e}, something is possibly "
            "wrong with detectionthresholdmap, since a negative "
            "peak from moments analysis should not occur."
        )

HannoSpreeuw commented 1 month ago

Fixed by commits 4b94f16, e81bbe5 and 9257eee.

But the merge with master is still pending.