spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

Refactor the Bayes Evidence Fringe fit #6352

Open stscijgbot-jp opened 2 years ago

stscijgbot-jp commented 2 years ago

Issue JP-2286 was created on JIRA by William Jamieson:

Currently, the residual fringe correction step relies on the external dependency BayesicFitting. This library is a mathematical modeling and data fitting library intended for astronomy applications, based on the Herschel Common Science System. It is largely incompatible with astropy and its modeling package, which is the common mathematical modeling and fitting package used throughout the rest of the JWST pipeline. Moreover, BayesicFitting is not as well supported or maintained as astropy. It is therefore undesirable to carry another external dependency on a less well-maintained library when much of the needed functionality can be derived directly from astropy.

The second major step of the residual fringe correction involves fitting a "Sine" model to the significant fringes using "Bayesian evidence". Aside from the identification of significant fringes, the rest of this step relies entirely on models and fitters included in the BayesicFitting package. Thus this step of the residual fringe correction should be refactored to rely on astropy derived or compatible models and fitters instead.

stscijgbot-jp commented 7 months ago

Comment by William Jamieson on JIRA:

I have a few comments and concerns surrounding the fit_1d_fringes_bayes_evidence method (which implements the fitting of a "Sine" model to the residual fringes using "Bayesian evidence"):

The "Sine" model being fit is not really a model involving purely sine terms; it is instead a finite Fourier series. To see this, one can look at how the model for the fit is created, which uses the multi_sine method. multi_sine creates the model by adding together many BayesicFitting SineModels, each initialized with its default values. If one examines this model, one finds that each component has the form $a \cos(2\pi\omega x) + b \sin(2\pi\omega x)$ ($a$ and $b$ are amplitudes, while $\omega$ is the frequency), which are clearly terms of a Fourier series. Indeed, the frequencies are then assigned and fixed here. Thus I think we should discuss this fit in terms of fitting a finite Fourier series to the residual fringes rather than a "Sine" model.
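
To make the finite-Fourier-series view concrete, here is a minimal numpy sketch of the kind of design matrix such a model implies (illustrative only, not the pipeline code; `freqs` stands in for frequencies found by an earlier peak-finding step):

```python
import numpy as np

def fourier_design_matrix(x, freqs):
    """Design matrix with a cos(2*pi*f*x) and sin(2*pi*f*x) column per frequency."""
    cols = []
    for f in freqs:
        cols.append(np.cos(2 * np.pi * f * x))
        cols.append(np.sin(2 * np.pi * f * x))
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 200)
freqs = [3.0, 7.0]                      # fixed fringe frequencies
A = fourier_design_matrix(x, freqs)     # one cos/sin amplitude pair per frequency
print(A.shape)                          # prints (200, 4)
```

Each pair of columns corresponds to one $a \cos(2\pi\omega x) + b \sin(2\pi\omega x)$ term, so the fit parameters are exactly the Fourier amplitudes.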

I believe I have found a major bug in how the "Bayesian evidence" calculation is performed by this step. The evidence calculation is performed for each fit iteration here; note that the calculation is wrapped in a try-except block. If one traces this call through the BayesicFitting package, one finds that it relies on the getLogZ method of the fitter. In getLogZ, the log of the determinant of the Hessian matrix for the model being fitted is calculated, see here. This is a major issue because the Hessian (calculated in terms of the parameters of the model being fit) always has determinant zero, at least if you work it out analytically: whether you take only the amplitudes being fit (in which case the Hessian is all zeros) or include the frequency parameters as well, the determinant must be zero analytically.

What we get in code execution (in the tests I have run) is that det(Hessian) is an extremely small value, around 10e-24 or smaller; in any case it certainly looks like it is approximating a value of zero (as it should). One would expect math.log to then throw an exception, and it eventually does; however, in many cases the first few iterations are not small enough (or the right values) for math.log to throw, so the iteration may continue a few times even though it really should have failed at the first step. In any case, the getEvidence method is clearly invalid for the model we are attempting to fit, which throws the whole "Bayesian evidence" fit into question. Note that since getEvidence is wrapped in this try-except block, the error is caught and the evidence is instead set to a very negative number (-1e9), which forces the iteration loop to break and return the last "good" iteration value. Since this error should always be thrown (but is not, for numerical reasons), the result should always be the initial guess prior to this iterative Bayesian process; however, this does not always occur, meaning the results of this process should not be trusted.
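
The control flow described above can be sketched schematically as follows (a hypothetical simplification; `get_evidence` stands in for the BayesicFitting `fitter.getEvidence()` call, and the shrinking determinants mimic what the tests observed):

```python
import math

def get_evidence(det_hessian):
    # stand-in for fitter.getEvidence(); fails once the determinant underflows to zero
    return math.log(det_hessian)

best = None
for det in [1e-24, 1e-30, 0.0]:         # det(Hessian) shrinking toward its analytic value of 0
    try:
        evidence = get_evidence(det)
    except ValueError:                   # math.log(0.0) raises a math domain error
        evidence = -1e9                  # sentinel forces the loop to stop
    if best is not None and evidence < best:
        break                            # keep the last "good" iteration
    best = evidence
print(best)
```

The point of the sketch: the first iteration "succeeds" even though the determinant is already numerically indistinguishable from zero, so the loop runs past iterations that were already mathematically suspect.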

stscijgbot-jp commented 7 months ago

Comment by Jane Morrison on JIRA:

Patrick Kavanagh Adding Patrick to this thread. He is the MIRI team developer of the residual_fringe correction and may have some insight into the details of the step code.


stscijgbot-jp commented 7 months ago

Comment by Patrick Kavanagh on JIRA:

Hi William Jamieson Jane Morrison

  1. This is a fair point. I think the model is named 'SineModel' in BayesicFitting because there is a switch to use the same model as astropy's Sine1D. The mode of the BayesicFitting SineModel used in the residual fringe code is the one without the phase term, to make the fitting easier.

  2. This is concerning. This Bayesian evidence part of the code is supposed to determine the optimum number of fringes to include in the final model, and it now appears we cannot trust that number. Thanks for looking so deeply into the code and uncovering this issue. Thinking back on development/testing, there was never any reason to suspect a deeply rooted issue with the Bayesian evidence estimates. The reason for including the try-except blocks was that the correction code needed to handle the low-signal/few-datapoint data thrown at it without crashing. For column data at the slice edges, noisy data, and/or a reduced number of datapoints (e.g. from feature flagging or off-slice pixels), BayesicFitting would sometimes raise an error, which, according to the user guide, indicated the fitted model was too complex for the data and a simpler model should be used. It looks like the try-except block was also catching the math.log error and hiding the issue.

William Jamieson Do you have any initial thoughts on the refactoring of the Bayesian evidence part of the code? Will it be possible to determine the Bayesian evidence for this type of model?


stscijgbot-jp commented 7 months ago

Comment by William Jamieson on JIRA:

Hi Patrick Kavanagh,

My initial thoughts are that we are attempting to fit a finite Fourier series to the data, so it is likely that some clever techniques may be able to solve that problem. I will have to do some research to figure out if anything might work on that front.

As far as the "Bayesian evidence" calculation, I am not even sure how "Bayesian" that calculation is in this instance. Incorporation of the model's priors happens here; however, notice that if the limits input is set, then the model priors are then not used. The limits input is indeed set during the evidence calculation. Thus I am not exactly sure what the getEvidence method is actually calculating. So I think I'll need to look at a reference to see if I can rebuild this calculation to do what it is intending to do in our context. If you happen to have a reference on hand please point me at it.

In the short term, I think the immediate way to sidestep this issue is to skip the evidence loop and instead set opt_nfringes to the length of freqs. This would force the fit to use all of the frequencies it has singled out.

I am also happy to set up a meeting with you and/or Jane Morrison (and anyone else you think should be involved) to discuss this issue.

stscijgbot-jp commented 7 months ago

Comment by William Jamieson on JIRA:

Also, in light of the fact that the model being fit is a finite Fourier series with fixed frequencies, the Levenberg-Marquardt least-squares algorithm is overkill for this fit. The model is linear in the fit parameters (the sine and cosine amplitudes only), so a linear least-squares fitter will work better: linear least squares is less sensitive to initial guesses and is generally faster. Thus I suggest that once we refactor to pure astropy (which is mostly done), we use the LinearLSQFitter instead of the LevMarLSQFitter.
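
Because the model is linear in the amplitudes once the frequencies are fixed, a single linear solve recovers them; a minimal numpy sketch of this point (illustrative only, not the actual astropy refactor; frequencies and amplitudes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)
freqs = [4.0, 11.0]                              # assumed known/fixed fringe frequencies

# Linear design matrix: cosine columns, then sine columns, per frequency.
A = np.column_stack(
    [np.cos(2 * np.pi * f * x) for f in freqs]
    + [np.sin(2 * np.pi * f * x) for f in freqs]
)

true_amps = np.array([0.5, -0.2, 0.3, 0.1])
y = A @ true_amps + 0.01 * rng.standard_normal(x.size)

# One closed-form solve: no initial guess, no iterations.
amps, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(amps, 2))
```

The amplitudes come back in a single `lstsq` call, which is exactly why an iterative Levenberg-Marquardt solver is unnecessary here.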

Note that, aside from producing a working replacement for getEvidence, I have a working version of fit_1d_fringes_bayes_evidence which depends solely on astropy models.

stscijgbot-jp commented 7 months ago

Comment by Patrick Kavanagh on JIRA:

_'Note that aside from producing some working replacement for getEvidence I have a working version of fit_1d_fringes_bayes_evidence which depends solely on astropy models.'_

That's great William Jamieson (and fast!).


'So I think I'll need to look at a reference to see if I can rebuild this calculation to do what it is intending to do in our context. If you happen to have a reference on hand please point me at it.'

I'm afraid I don't have a good reference to point you to other than those provided in the BayesicFitting design document, which might be of help:

https://github.com/dokester/BayesicFitting/blob/master/docs/design.md


'I am also happy to setup a meeting with you and/or Jane Morrison (and anyone else you think should be involved) to discuss this issue.'

I'm happy to talk about this over a telecon. I will try to contact the developer of BayesicFitting (Do Kester) in the meantime to see if he would be interested. He is retired, but I think he still works part-time in the MIRI team at Groningen.


stscijgbot-jp commented 7 months ago

Comment by William Jamieson on JIRA:

I think I have found the reference detailing the theory behind the getEvidence calculation, [^MacKay_1992.pdf]; it is what D. Kester references in [^Kester_2003.pdf] for the details of how the "Occam factor" is calculated (hopefully I have followed the right trail of references). The Occam factor is what the evidence value is derived from, and it is what requires the Hessian calculation.

It looks like the reason the Hessian is used at all is that a Taylor expansion is being applied (see the top of page 420). This may be inappropriate, as the model in question is itself linear in the parameters under consideration (the amplitudes of the sine and cosine terms are the only parameters being fit). Thus it may be the case that we can use the linearity of the problem to our advantage in calculating this Occam factor. I will need to digest the details of the problem before I can say or do anything definitive.
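
For reference, the Laplace-approximation form of the log evidence along the lines of MacKay (1992) can be written as follows (my transcription, with notation chosen here: $k$ fitted parameters, $A$ the Hessian of the negative log posterior at the best-fit parameters $\mathbf{w}_{\mathrm{MP}}$, and uniform priors of widths $\Delta w_i$):

$$\ln Z \approx \ln \mathcal{L}(\mathbf{w}_{\mathrm{MP}}) + \frac{k}{2}\ln(2\pi) - \frac{1}{2}\ln\det A - \sum_{i=1}^{k}\ln \Delta w_i$$

The last three terms form the "Occam factor". This makes explicit why a vanishing $\det A$ is pathological: the $-\frac{1}{2}\ln\det A$ term diverges, and `math.log` raises a domain error once the computed determinant underflows to zero.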

As for a telecon, please let me know your availability, as it would be good to discuss the residual fringe correction in general with someone from the MIRI team (there will be some need for this in the future, as the astropy-based fits will likely not exactly replicate the results of the current code). Also, it would be great to talk with Do Kester. I am not super familiar with the intricacies of this calculation (my goal is to create a functioning astropy-based version of the residual fringe step), so I may be incorrectly reasoning through some of the calculations. Moreover, I am hoping to interest Kester (or other BayesicFitting developers) in migrating the BayesicFitting code into astropy.modeling (or an astropy-affiliated package) rather than keeping it a standalone astronomy-focused modeling library. (I am one of the maintainers of astropy.modeling, so I can assist in the process, though I do not have the time to migrate anything beyond the functionality required for the residual fringe at this time.)

stscijgbot-jp commented 7 months ago

Comment by Patrick Kavanagh on JIRA:

William Jamieson Jane Morrison Do Kester has replied and said he would be interested in joining the telecon. He is available most afternoons (European time). I shared with him some of the comments in this thread (he doesn't have access) to get his thoughts. His replies are below. We can discuss in more detail in the telecon.

On BayesicFitting in astropy:

I would certainly be interested in affiliating BayesicFitting to Astropy. I am not really sure what it entails, but I guess it means that BF learns more about Astropy such that they work seamlessly together. I already use parts of Astropy, i.e. the Units package. As I am 71 years old and retired I have some time at hand to work on this.


On getEvidence issue:

I am quite sure there is no problem in the evidence calculation of BayesicFitting. I have calculated the evidence zillions of times and inspected the results for a fair fraction of them, just to see whether the calculations made sense. I never found a problem.

As Laplace once said in the 19th century: “Probability is nothing more than common sense reduced to numbers.” So when the evidence tells you something is present, you should be able to see it. And reverse: if you see some feature is clearly present, the evidence (on a proper model) should tell you so.

In this particular case the complaint hinges on the Hessian. For a simple 1 fringe model with known frequency the determinant of the Hessian is

det = SUM( c^2 ) SUM( s^2 ) - ( SUM( c s ) )^2

where c = cos(fx) and s = sin(fx), x are the wavenumbers, and f is the fringe frequency. c^2 and s^2 are all positive numbers between 0 and 1, and the SUM adds them all up. The second term, c s, equals 0.5 sin(2fx); it has values between -0.5 and 0.5. These terms more or less cancel each other in the summing (unless the x's are very specially located), so the second term is much smaller than the first. Conclusion: the determinant does not go to zero.

For more fringes, the pattern is similar. The Hessian has sums over squared sines and cosines on the diagonal and mixed entities elsewhere. The diagonal is all positive and the rest is positive and negative.

The fact that the det is sometimes very small does not worry me much. Probabilities are always calculated in the log because the true numbers get so large (and small) that they cannot be handled by present day computers.

The method proposed for fringe removal looks very much like Fourier series. Colloquially I called the method Slow Fourier Transforms (SFT), which compares to FFT, as slow food does to fast food. Fast food fills your stomach, but slow food does it better.
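
As a quick numerical check of the determinant formula quoted above (an illustrative sketch with arbitrary sample points and frequency, added for reference, not part of the quoted exchange):

```python
import numpy as np

x = np.linspace(0.0, 10.0, 300)     # sample points ("wavenumbers")
f = 1.3                             # a fixed fringe frequency, chosen arbitrarily
c = np.cos(f * x)
s = np.sin(f * x)

# det = SUM(c^2) SUM(s^2) - (SUM(c s))^2, per the formula above
det = np.sum(c**2) * np.sum(s**2) - np.sum(c * s) ** 2
print(det)
```

For these generic sample points the two SUM-of-squares terms are each of order N/2 while the cross term largely cancels, so the determinant is far from zero, consistent with the argument above for a single fringe with known frequency.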


stscijgbot-jp commented 7 months ago

Comment by Patrick Kavanagh on JIRA:

Do also had a look through some of the residual fringe code. He had some comments on things that could be changed/improved to be more rigorous. I will put them here for now. I think it would be good to try to implement some of these. William Jamieson Jane Morrison What would be the best way to go about this? For example, I could wait until William Jamieson has finished the BayesicFitting replacement work and then branch and make the changes. Does that sound OK, or do any changes to the step need to be implemented at STScI? Again, we could discuss this in more detail in the telecon.

Comments from Do Kester:

While thinking about this I took the liberty to look at the code of the residual_finge/utils.py file on github. I have some remarks that you might want to consider.

  1. When you fit the fringes, you assume that the frequencies are known, from the LS periodogram. However, that is just fitting by another name, with another "fitter". You did not know them beforehand, e.g. from the thickness of the reflecting material, which would dictate that all fringes have certain frequencies. Our understanding of the instrument is not that far yet, especially as we are dealing here with RESIDUAL fringes. So the frequencies are fitted too, and scientific honesty requires that you calculate the evidence over the frequencies as well. I would go even a step further and do a run of the LevenbergMarquardtFitter on frequencies and amplitudes, just to make sure that the frequencies you got from the LS periodogram are fine-tuned to the optimal amplitudes. Use the parameter values from the LS periodogram as starting values in the LMFitter. Especially when fringe frequencies are close, it makes a difference in my experience.

Is what you are doing now wrong, in the sense that it yields erroneous results? Probably not. If you are doing the calculations in a consistent way, the evidence will still tell you whether a fringe is present or not. But you have to check.

  2. Did you check that the LS periodogram returns peaks for all fringes, even the small ones, maybe hidden in the wings of a strong one? In my original implementation of the residual fringe calculator for ISO-SWS in the '80s, I had a scheme where I subtracted the fringes (temporarily) from the data and looked for a new peak. The frequency of the new peak was added to the existing model (as an extra component), and this new model was fitted to the original (unsubtracted) data. This way, peaks in the wings of a strong one can also be found.

  3. I noticed that you always fit at least one fringe. However, in some sources the fringes might be so weak that they cannot be found, so some bogus "fringe" is selected and subtracted. To avoid this, you can use the ConstantModel from BayesicFitting. The ConstantModel just returns zero; it has no parameters, so obviously it cannot be fitted. It states that all signal in the data is noise. You can calculate the evidence for this model, which can subsequently be compared to the evidence for 1 fringe, etc. In fitter.getEvidence(), remove the limits=[] keyword but keep the noiseLimits=[].

  4. Did you make sure that the fitted amplitudes (for cos and sin) are always between [-1, 1]? The limits=[-1,1] keyword assumes that you have a uniform prior which extends from -1 to +1. The fitters can return values outside that range, because they are not fully Bayesian. It is your duty to make sure that the fitted results are possible under a uniform prior from -1 to +1. The limits affect the value returned by getEvidence().
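
The iterative peak-subtraction scheme described above (temporarily subtract the fitted fringes, search for a new peak, then refit the enlarged model against the original data) could be sketched roughly as follows. This is a hypothetical illustration, not the ISO-SWS or pipeline code: the "periodogram" here is a deliberately crude projection onto a fixed list of candidate frequencies, and all names are made up:

```python
import numpy as np

def design(x, freqs):
    """Cos/sin column pair per frequency."""
    return np.column_stack(
        [np.cos(2 * np.pi * f * x) for f in freqs]
        + [np.sin(2 * np.pi * f * x) for f in freqs]
    )

def strongest_freq(x, resid, candidates):
    """Crude periodogram stand-in: candidate with the largest projected power."""
    power = [np.abs(np.sum(resid * np.exp(-2j * np.pi * f * x))) for f in candidates]
    return candidates[int(np.argmax(power))]

def iterative_fringe_fit(x, y, candidates, n_fringes):
    freqs, resid = [], y.copy()
    for _ in range(n_fringes):
        # 1. look for a new peak in the residual (weak fringes emerge
        #    once the strong ones have been temporarily subtracted)
        freqs.append(strongest_freq(x, resid, [f for f in candidates if f not in freqs]))
        # 2. refit ALL frequencies against the ORIGINAL data
        A = design(x, freqs)
        amps, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ amps
    return freqs, amps

x = np.linspace(0.0, 1.0, 400)
y = 1.0 * np.sin(2 * np.pi * 5.0 * x) + 0.2 * np.sin(2 * np.pi * 9.0 * x)
freqs, amps = iterative_fringe_fit(x, y, candidates=[3.0, 5.0, 9.0, 14.0], n_fringes=2)
print(sorted(freqs))
```

The key design point is step 2: each new frequency is added as an extra component and the whole model is refit to the unsubtracted data, so earlier amplitude estimates are revised as weaker fringes are found.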


stscijgbot-jp commented 7 months ago

Comment by Jane Morrison on JIRA:

Patrick Kavanagh I was thinking let's see how it goes with William replacing the BayesicFitting package with astropy routines. Then if you want to make a branch and improve the code, that is completely fine. Should we try for a meeting next week, then? Say Tuesday afternoon for Europe? Is Do in Brussels? Let's say 3 pm European time.

stscijgbot-jp commented 7 months ago

Comment by William Jamieson on JIRA:

Anytime on Tuesday will work for me for a meeting.

stscijgbot-jp commented 7 months ago

Comment by William Jamieson on JIRA:

Patrick Kavanagh I do not follow Do Kester's arguments about the Hessian calculation; I think we must be talking about two different things. I have written up some notes (see [^hessian.pdf]) on my thoughts on this issue. I really think we need to resolve this mathematical discrepancy, because currently the code is putting out garbage (the result of an exception due to a mathematical error) for my test cases at the very least. It might be that I need to create different test cases; however, my current cases are the ones which appear to be very well behaved, which makes them good candidates for initial refactoring work.

stscijgbot-jp commented 7 months ago

Comment by Patrick Kavanagh on JIRA:

Jane Morrison William Jamieson That time on Tuesday is fine for me. Do is in the Netherlands so the same time zone as Brussels. I'll forward the proposed telecon time and William's notes to Do in the email thread.

stscijgbot-jp commented 7 months ago

Comment by Nadia Dencheva on JIRA:

Putting this on hold for now until we get further direction from the IDT team about changes they want to make to the algorithm.

The spline work in JP-2284 is considered complete and can go in.

stscijgbot-jp commented 5 months ago

Comment by David Law on JIRA:

I think this ticket is now effectively a duplicate of https://jira.stsci.edu/browse/JP-2205, in that both talk about a long-term goal to explore moving away from the BayesicFitting package.  I'm not aware of any work going into this from the instrument team side any longer, as empirically the code is in the pipeline and working well.