sot / xija

Thermal modeling framework for Chandra X-ray Observatory
https://sot.github.io/xija
BSD 3-Clause "New" or "Revised" License

[bugfix] Always start model evolution on even-numbered engineering telemetry indices #73

Closed jzuhone closed 4 years ago

jzuhone commented 4 years ago

Fairly often at load reviews, the SOT and FOT thermal model predictions disagree, which can cause SOT to report a limit violation when FOT does not see one.

One (but not the only) source of this discrepancy is the previously noted behavior that if one executes the same thermal model beginning at two different simulated times, the predictions can differ by tenths of a degree C, especially at temperature extrema, and such discrepancies can persist to late evolution times. An example is shown here: the same model run twice, from the two start times of 2019:100:21:15:25.2 and 2019:100:20:05:15.2 (approximately 1h10m apart):

two_models

While the evolution is very much the same, taking the difference between the two runs at the times they have in common shows the discrepancies:

two_models_bad_diff

However, changing the second model's start time by 5 minutes eliminates the long-term discrepancies, leaving only an error from the initial condition:

two_models_good_diff

Two effects contribute to this behavior. The first is the way we do the evolution of the ODE in xija: we perform a midpoint method / RK2 integration, using the j and j+2 time steps as the beginning and end points of an RK2 iteration, and using the j+1 step as the "midpoint." This results in an odd-even difference between steps in how their evolution is computed. These indices matter because we do not begin xija model evolution at arbitrary times but instead at times which correspond to engineering telemetry.

The second effect is that in the thermal model evolution there are many effects contributing to the derivative dT/dt at different times, and the resulting ODE can become stiff. This occasionally results in oscillatory behavior of the derivative dT/dt, as shown here in a zoomed-in region of the above model:

oscillations

However, if two model runs begin at slightly different times, one at an odd index and the other at an even index, the oscillatory behavior will not line up but will instead be offset by one step:

oscillations2

The solution this PR proposes is to always begin model evolution at an even index in the space of engineering telemetry timesteps.
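As a rough sketch of what "begin at an even index" means in practice, the helper below snaps a requested start time forward to the next even index on a uniform telemetry grid. The function name, the grid origin `t0`, and the uniform 328 s spacing (the figure quoted later in this thread) are assumptions for illustration; this is not the code changed in the PR.

```python
import math


def first_even_index(t_start, t0, dt=328.0):
    """Smallest even grid index i such that t0 + i * dt >= t_start.

    Hypothetical helper: assumes a uniform telemetry grid with spacing dt
    (seconds) whose index 0 falls at time t0.
    """
    i = math.ceil((t_start - t0) / dt)
    return i + (i % 2)  # bump odd indices up to the next even one


def start_delay(t_start, t0, dt=328.0):
    """Seconds between the requested start and the snapped even-index start."""
    return t0 + first_even_index(t_start, t0, dt) * dt - t_start
```

With these assumptions, a requested start one second after an even index is pushed to the next even index, so the delay is bounded above by two grid intervals.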

This PR also adds a comment to the RK2 integrator noting an issue I discovered: the algorithm implemented there is not strictly RK2. See Issue #72 for a discussion. I have elected not to fix this here because the empirically determined model performance is tuned to the form of the evolution used in xija, and fixing it would thus require recalibrating all xija models.

jzuhone commented 4 years ago

ping @matthewdahmer @taldcroft

jeanconn commented 4 years ago

Awesome! Though I'm not sure if it counts as a bugfix per the PR title.

jzuhone commented 4 years ago

@jeanconn I personally consider the inability to make the same predictions at late times (to the precision afforded by the algorithm) to be a bug for our purposes. I see what you mean though.

matthewdahmer commented 4 years ago

Thank you @jzuhone,

I have a few comments/questions:

1) To be sure I'm understanding this right: it looks like predictions started on odd 5-minute telemetry indices are not necessarily more accurate than predictions started on even indices. You are just suggesting we pick one in order to eliminate one potential source of prediction discrepancy between the FOT and SOT load reviews.

2) Is the oscillation we see a product of not using the official RK2 algorithm, or would this exist when using the official RK2 algorithm as well?

3) I'm a little concerned that always starting on an even index may have unforeseen complications with the Matlab FOT Tools, as this could delay the start of temperature predictions in a review schedule in MCC by almost 2 x 328 seconds (for example, if a schedule started 1 second after an even index). We should talk to James Kristoff to see if he has any insight.

matthewdahmer commented 4 years ago

@jzuhone,

Would it be feasible to determine which index the FOT tools is starting at (odd or even), based on a schedule start time (for the schedule under review), and then ensure the ACIS tools are also starting on the corresponding index type? This should have the same effect of ensuring our predictions are in phase, would avoid a Xija modification that does not improve performance or accuracy, and prevent any potential implementation issues on the FOT Tools side (such as accounting for the missing time at the beginning of a schedule). With respect to this last item, there are often state changes at the very beginning of a schedule, and I am unsure of whether or not this proposed Xija change would increase the chance some state changes may be missed or not accounted for properly.

On a related topic, I am not convinced that changing the integration code would render all current models invalid. This seems like something we should check before ruling it out.

We are making a change on the FOT Tools side that allows us to update a model without requiring a new release, as long as the structure of the model does not change in a way that impacts the mating Matlab code (e.g. pseudo node names do affect the written code, parameter values do not). If any models do need to be updated as a result of a change to the Xija integration method, this FOT Tools Xija model integration process change will make implementing these new models simpler.

jzuhone commented 4 years ago

Hi @matthewdahmer,

At the meeting today I will show what I discovered late yesterday, which is that the model difference when starting at different times actually has nothing to do with the oscillations in the derivative. Fixing the RK2 integration in core.c fixes the derivative oscillations, but one still gets the model mismatch if one starts at an odd or even index (in point of fact the differences are even worse).

I'll also show how badly the DPA prediction performs if the model is not recalibrated after the fix to the RK2 algorithm.

jzuhone commented 4 years ago

We should also talk about the FOT process in this regard. The SOT process is to begin the model propagation a few days before the beginning of the schedule, so that any errors arising from the initial condition have been smoothed out. For that reason starting 5 or 10 minutes later is no problem for us. But it sounds like that is not your process?

jzuhone commented 4 years ago

Figure showing 1DPAMZT propagation (blue is model with RK2 fix, orange is telemetry, purple is model without RK2 fix):

two_models

Figure showing derivative with RK2 fix, oscillations are gone, but differences in model runs still persist:

oscillations2

and the difference in model run:

two_models_good_diff

jeanconn commented 4 years ago

I think we've decided not to move forward with this and the PR can be closed.