sot / xija

Thermal modeling framework for Chandra X-ray Observatory
https://sot.github.io/xija
BSD 3-Clause "New" or "Revised" License
9 stars 5 forks source link

Avoid crashes in plotting, do not plot residuals of mask or bad times #127

Closed jzuhone closed 2 years ago

jzuhone commented 2 years ago

Description

For models which execute very quickly, it is sometimes the case that during fitting a race condition of sorts is encountered when a plot that is being updated and a reference to the canvas object associated with the plot is lost (see Issue #126).

Unfortunately, the cause for this is still unknown. I was able to fix this problem by 1) Removing unnecessary references to the canvas object I was keeping around and also calling canvas.flush_events after every update.

This PR also adds some logic that affects the resid__data and histogram/dashboard plots in xija_gui_fit by masking out data that is under a "bad time" or a "mask time". The residuals are set to zero in such cases, but because of that they still appear in the plot and can give default plot limits that are unhelpful. We copy the residual data before plotting and set the masked data to NaN here.

Fixes #126

Testing

Functional tests

Fitted the 1DPAMZT model over several iterations, fit the WIP HRC model which was crashing previously, crashes no longer occur.

jzuhone commented 2 years ago

It would probably be better to handle this with some kind of timer, but the examples I looked at online were rather complex and I wasn't sure they would work here. Open to suggestions.

taldcroft commented 2 years ago

I rebased #125 on this branch and was able to successfully fit the HRC CEA model for both 100 days and 16 days with no crash.

I agree this fix feels a bit ad hoc and possibly prone to failures... but I don't have any better ideas and it does seem to work. So I'm approving but it would be good to get confirmation from Dan.

taldcroft commented 2 years ago

Scratch that. Using this branch rebased on #125 I got the error for the HRC model (100 days) by deleting the DPA plot and then trying to add the 2CEAHVPT resid vs time plot.

jzuhone commented 2 years ago

@taldcroft I think I found the real fix. You'll have to delete the original version of this branch to try it because I rewrote the history. Look at the diffs and you can see that this is a much better solution.

taldcroft commented 2 years ago

I won't say that I fully understand why that works (why that ref was causing crashes), but it certainly looks a lot better. Nice!

I have been using it for 5 minutes now with no crashes.

jzuhone commented 2 years ago

I think it's flush_events that's doing the heavy lifting here, but I also thought it was unnecessary and possibly error-prone to keep the extra canvas attributes around.

drxre commented 2 years ago

I was able to install and run it successfully. Thank you for all the hard work!