oslocyclotronlab / ompy

A python implementation of the Oslo method
https://ompy.readthedocs.io
GNU General Public License v3.0

Remove or Improve background subtraction: Currently introducing a bias? #119

Open fzeiser opened 4 years ago

fzeiser commented 4 years ago

This is a suggestion by Anders, as an alternative to the "remove negatives" step (see #116) that we currently perform after the background subtraction:

Statistics, last paragraph in Section 4: I think we're introducing a slight bias by leaving out the negative-count bins in the background-subtracted spectra. That is, in our simulated spectra we accept statistical fluctuations in one direction (surprisingly low background count and/or surprisingly large total count), but we exclude fluctuations in the opposite direction (high background count and/or low total count).
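To make the suspected bias concrete, here is a minimal Monte Carlo sketch. It assumes a single bin whose total and background counts are independent Poisson draws with hypothetical means `lam_tot = 2.0` and `lam_bkg = 1.5` (true net signal 0.5). Dropping the negative net counts before averaging stands in for leaving those bins out of the fit. This is an illustration under those assumptions, not a reproduction of the ompy pipeline, and the function name is hypothetical.

```python
import numpy as np


def mean_net_counts(lam_tot, lam_bkg, n_sim=200_000, drop_negatives=False, seed=0):
    """Mean of simulated net counts (total - background) in one bin.

    lam_tot, lam_bkg: hypothetical true Poisson means of total and background.
    If drop_negatives is True, realizations with net < 0 are excluded before
    averaging, mimicking the "remove negatives" step.
    """
    rng = np.random.default_rng(seed)
    tot = rng.poisson(lam_tot, size=n_sim)
    bkg = rng.poisson(lam_bkg, size=n_sim)
    net = tot - bkg
    if drop_negatives:
        net = net[net >= 0]  # keep only the "physical" realizations
    return float(net.mean())


# Hypothetical bin: true total mean 2.0, true background mean 1.5 -> true net 0.5.
print(mean_net_counts(2.0, 1.5))                       # close to 0.5
print(mean_net_counts(2.0, 1.5, drop_negatives=True))  # well above 0.5
```

Keeping the negative realizations leaves the average at the true net signal, while discarding them pulls it upward, which is the one-sided acceptance of fluctuations described above.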

Would anything break in the math/code if we simply included the negative-count bins in the fit? To be clear, I don't expect the impact to be large (perhaps not even noticeable), so if it's technically challenging we may want to leave it as is.

Alternatively, I guess we could sample the total count (tot_i) first, and then sample the background count (bkg_i) repeatedly until we get a sample that satisfies bkg_i < tot_i, so that we effectively sample the background count from the conditional distribution p(bkg_i | lambda_bkg, bkg_i < tot_i). [I think we're encountering a classic statistics issue here: if the true value of some quantity X is close to zero, X < 0 is unphysical, and your individual estimates of X have a significant statistical uncertainty, then you should expect some of your X estimates to have a central value in the X < 0 region. If you force each individual estimate to be X >= 0 (e.g. by leaving out the X < 0 estimates) and later combine your X estimates, your combined estimator will be biased towards high X values.]
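A minimal sketch of the conditional-sampling alternative, using plain rejection sampling with NumPy. The function name, its arguments, and the example means are hypothetical. One deliberate deviation: it conditions on `bkg_i <= tot_i` rather than the strict `bkg_i < tot_i`, because for a bin with `tot_i = 0` no Poisson draw satisfies the strict inequality and the loop would never end. The sketch only shows the mechanics; it does not by itself show that the bias disappears.

```python
import numpy as np


def sample_conditional_background(tot, lam_bkg, rng, max_tries=10_000):
    """Draw bkg_i ~ Poisson(lam_bkg_i) conditioned on bkg_i <= tot_i.

    Rejection sampling: redraw only the bins that violate the condition
    until every bin satisfies it, or give up after max_tries rounds.
    """
    tot = np.atleast_1d(np.asarray(tot))
    lam_bkg = np.broadcast_to(np.asarray(lam_bkg, dtype=float), tot.shape)
    bkg = rng.poisson(lam_bkg)
    bad = bkg > tot
    tries = 0
    while bad.any():
        if tries >= max_tries:
            raise RuntimeError("rejection sampling did not converge (lam_bkg >> tot?)")
        bkg[bad] = rng.poisson(lam_bkg[bad])  # redraw only the rejected bins
        bad = bkg > tot
        tries += 1
    return bkg


rng = np.random.default_rng(42)
lam_tot = np.array([0.5, 2.0, 10.0])  # hypothetical true total means per bin
lam_bkg = np.array([0.4, 1.5, 3.0])   # hypothetical true background means per bin
tot = rng.poisson(lam_tot)            # 1) sample the total count first
bkg = sample_conditional_background(tot, lam_bkg, rng)  # 2) then bkg | bkg <= tot
net = tot - bkg                       # non-negative by construction
```

Rejection is simple but becomes slow when `lam_bkg` is much larger than `tot_i`; sampling directly from the truncated Poisson distribution via its inverse CDF would avoid the retries.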

fzeiser commented 4 years ago

Somewhat along the same lines are #28 and the following comment:

7) Question about the chi^2 in Section 5: We say that "[...] most bins of the first-generation matrices follow a normal distribution". I assume it's the low-count bins that deviate most strongly from a normal distribution? I wonder if this might improve a bit if we include the negative-count bins in the fit (point 5 above)? [For the future: it could be interesting to try to replace the chi^2 with a log-likelihood function that also tries to account for the deviations from normal distributions.]
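As a possible starting point for the log-likelihood idea, the sketch below compares the Pearson chi^2 (variance taken from the model) with the Poisson deviance, i.e. twice the negative log-likelihood ratio against the saturated model. It assumes the bins are raw Poisson counts; first-generation bins are not (they result from unfolding and subtraction and can be negative), so a likelihood for them would need a different noise model. The function names are hypothetical.

```python
import numpy as np


def pearson_chi2(counts, model):
    """Pearson chi^2 with the variance taken from the model: sum((n - m)^2 / m)."""
    counts = np.asarray(counts, dtype=float)
    model = np.asarray(model, dtype=float)
    return np.sum((counts - model) ** 2 / model)


def poisson_deviance(counts, model):
    """Poisson deviance 2 * sum(m - n + n * log(n / m)), using 0 * log(0) = 0.

    Assumes non-negative counts n and a strictly positive model m.
    """
    counts = np.asarray(counts, dtype=float)
    model = np.asarray(model, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        n_log_term = np.where(counts > 0, counts * np.log(counts / model), 0.0)
    return 2.0 * np.sum(model - counts + n_log_term)


# High-count bin: the two statistics nearly coincide.
print(pearson_chi2([1000], [1010]), poisson_deviance([1000], [1010]))
# Low-count bins (including an empty one): the two statistics differ.
print(pearson_chi2([0, 1, 3], [0.5, 2.0, 1.0]), poisson_deviance([0, 1, 3], [0.5, 2.0, 1.0]))
```

Expanding the deviance for n close to m gives (n - m)^2 / m per bin, so the two agree for large counts; they separate in the few-count bins, which is where the normal approximation behind chi^2 is least reliable.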

fzeiser commented 4 years ago

In line with the referee's comments, we might as well not cut away the negative counts etc. (by default). I'm now working on a branch to implement this.

If one still wishes to run a background subtraction in the Ensemble class, one could, for example, use the action_raw, action_unfolded, and action_firstgen attributes to apply it to the corresponding matrices.

fzeiser commented 4 years ago

See also https://github.com/oslocyclotronlab/ompy/pull/148#issuecomment-689588710 on another idea of how to avoid the bias.