spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
558 stars 164 forks source link

Ensure pipeline does not crash on unrecognized keywords #8335

Open stscijgbot-jp opened 6 months ago

stscijgbot-jp commented 6 months ago

Issue JP-3563 was created on JIRA by David Law:

For Build 11, we should look into future-proofing the pipeline by addressing some issues due to crashes resulting from unknown keywords.

Specifically, Build 10.2 development highlighted a couple of existing limitations:

Crashing can be useful, as it informs the user if they mistyped a keyword that they were trying to specify.  However, it also means that the pipeline is quite brittle, and old versions cannot handle newer parameter reference files if they introduced new keywords.

Rather, it may be better for the pipeline to instead log a warning that it doesn't know what to do with the entries, and to then ignore them.  Possibly with a few-second pause to draw attention to the warning?  Any thoughts on a better solution?

stscijgbot-jp commented 3 months ago

Comment by Melanie Clarke on JIRA:

This looks pretty easy to fix on the stpipe side:

spacetelescope/stpipe#153

This change just raises a warning for extra values in pipeline steps or parameters instead of an exception.  The extra values are ignored. 

 

I tested with modified versions of the linked detector1 pipeline parameters, adding a new step called "test_step", and with a modified version of the emicorr parameters here:

https://jwst-crds.stsci.edu/browse/jwst_miri_pars-emicorrstep_0002.asdf

to add a new parameter called "new_emi_parameter".  The output warnings at the top of the pipeline run look like this:

2024-05-28 14:25:01,803 - stpipe - INFO - PARS-DETECTOR1PIPELINE parameters found: jwst_ops/references/jwst/miri/jwst_miri_pars-detector1pipeline_0005.asdf

2024-05-28 14:25:01,816 - stpipe - WARNING - stpipe/src/stpipe/step.py:280: RuntimeWarning: Extra value 'test_step' in steps

  config_parser.validate(config, spec, root_dir=dirname(config_file or ""))

 

2024-05-28 14:25:01,816 - stpipe.Detector1Pipeline - INFO - Detector1Pipeline instance created.

2024-05-28 14:25:01,817 - stpipe.Detector1Pipeline.group_scale - INFO - GroupScaleStep instance created.

2024-05-28 14:25:01,817 - stpipe.Detector1Pipeline.dq_init - INFO - DQInitStep instance created.

2024-05-28 14:25:01,817 - stpipe - WARNING - stpipe/src/stpipe/step.py:280: RuntimeWarning: Extra value 'new_emi_parameter' in root

  config_parser.validate(config, spec, root_dir=dirname(config_file or ""))

 

2024-05-28 14:25:01,817 - stpipe.Detector1Pipeline.emicorr - INFO - EmiCorrStep instance created.

 

Nadia Dencheva - can you confirm that this fix is generally desired for stpipe?  If it needs to be JWST-specific, it might be a little more challenging to fix.

stscijgbot-jp commented 3 months ago

Comment by David Law on JIRA:

Melanie Clarke It is also worth inserting a pause for a few seconds in addition to the warning?  The log goes by really fast, so a 5-second pause could be helpful without affecting things too much.

stscijgbot-jp commented 3 months ago

Comment by Melanie Clarke on JIRA:

I'm not generally a fan of adding programmatic pauses, especially for pipelines that are more often run in an automated context than an interactive context.  Even when running manually, you only need to see the warning message once - pausing every time can be more of a hindrance than a help.  I think as long as the information is there, and the user can figure out from the output why the expected new step or parameter wasn't used, that's probably sufficient.

But that's more of a product owner decision, so maybe Howard Bushouse would like to weigh in?

stscijgbot-jp commented 3 months ago

Comment by Nadia Dencheva on JIRA:

It looks like Roman does not oppose making this change in stpipe.

That said I don't think we should include a pause in processing. There are many warnings issued by the pipeline and users should scroll up and look at the messages. There's no guarantee a user will notice the message even if the pipeline pauses. 

stscijgbot-jp commented 3 months ago

Comment by Brett Graham on JIRA:

David Law is there an example of a parameter reference file with an unknown keyword? Or even better, is there a ticket related to the resulting crash that explains the situation? I'd like to better understand if warning and continuing will generally make sense (for both jwst and roman).

stscijgbot-jp commented 3 months ago

Comment by David Law on JIRA:

Brett Graham It's a generic problem, but as an example you can force the crash by running the attached x1d parameter reference file on any MRS data cube with Build 10.2 or earlier.  This parameter reference file specifies a value for a new keyword 'ifu_covar_scale' which wasn't defined until Build 11.0, and as a result it crashes B10.2 with

ValueError: Extra value 'ifu_covar_scale' in root

Likewise, try running calwebb_spec3 on any Lvl3 association file with B10.2 and specifying --steps.pixel_replace.skip=False, which will crash with

strun: error: unrecognized arguments: --steps.pixel_replace.skip=False

since pixel_replace isn't part of the Spec3 pipeline until B11.0.

As a result of these crashes, we frequently can't deliver to CRDS the reference files/parameter reference files to use new keywords until months after the code itself is released as doing so would crash older versions of the pipeline.

stscijgbot-jp commented 3 months ago

Comment by David Law on JIRA:

Of course, crashing also has its uses; e.g., providing immediate feedback that a user mis-typed some input argument for a parameter that they wanted to change, so there's hopefully some balance that provides feedback on typos while not delaying deployment of new code/reference files for prolonged periods.

stscijgbot-jp commented 3 months ago

Comment by Melanie Clarke on JIRA:

Just a quick comment - running with --steps.pixel_replace.skip=False on the command line would still crash, even with the proposed changes.  The update to warn and continue is only for configuration files.

stscijgbot-jp commented 3 months ago

Comment by David Law on JIRA:

Gotcha, thanks for clarifying Melanie Clarke 

stscijgbot-jp commented 3 months ago

Comment by Brett Graham on JIRA:

Thanks for sharing the file. The file contains ifu_covar_scale: 1.8 which if I'm understanding https://github.com/spacetelescope/jwst/pull/8457/files would result in several error arrays being multiplied by this scaling factor. For a pipeline version that doesn't support that scaling is it acceptable to ignore the error, issue a warning message (which I don't believe would stop automatic processing) and generate a file using the un-scaled errors?

I am concerned that turning this sort of error generically into a warning will:

If I'm understanding the root issue (please correct me if I'm wrong), the goal is to be able to provide updated parameter reference files without breaking currently released cal versions. What about providing the cal software version to the parameters used for CRDS reference file selection and using this to determine what reference to use? Using the above ifu_covar_scale example, it looks like it's going to be part of the next version (lets say 1.15.0). Assuming cal_version (or something) is provided during CRDS reference selection a new reference file including ifu_covar_scale could be added with a cal_version>=1.15.0 requirement. Any older cal versions would continue to use the older reference files.

stscijgbot-jp commented 3 months ago

Comment by David Law on JIRA:

This starts to get at the larger related issue discussed by https://innerspace.stsci.edu/display/jwstdms/2024-05-22+DMS+Data+Release-style+processing+and+repro+for+JWST

Namely, there's a plan to move from the current scheme to one where CRDS contexts are tied to a given build by default.  I.e., the 1.14.0 would always grab a particular CRDS context by default, unless overridden by the user.  This would fix the issue from a DMS ops standpoint.  From an outside user perspective who's currently running 1.14.0 though, we'd want to allow them to keep running 1.14.0 with the latest reference files as much as possible.  They can't get the new covariance scaling functionality, but no reason they shouldn't be able to (say) take advantage of updated flatfields.

In this case that's how the ifu_covar_scale works; there's no correction made at the moment, so not applying the new parameter won't change things from how 1.14.0 already works.  (And is the same as running 1.15.0 without the parameter).  Generally, the thought is that new parameters relate to new functionality, and not using that new functionality should give the same performance as before.

Dealing with this via a cal software version selection within CRDS is an interesting suggestion.  The linked page above contains a discussion of how we're trying to figure out how to do this at the high level between CRDS default context and pipeline versions.  Doing so on a per-file basis would probably be a rather more fundamental change to the CRDS structure though.

stscijgbot-jp commented 3 months ago

Comment by David Law on JIRA:

How technically challenging would it be for the pipeline to only ignore unknown keywords in parameter reference files if those files were retrieved from CRDS?  At present there's a step at the very start of the pipeline that assembles all of the input parameters and prints them to the log (in what's now a much more easily readable format).  What if, at that stage, there were some function that looked at where the parameters were being read from and, if that was a path within the CRDS_PATH it simply scrubbed any unknown parameter, logged a warning message, and then didn't pass it on to the rest of the code?

That way we achieve the goal of having the automatically-pulled parameter ref files not cause crashes on anything unknown to older pipeline versions (could potentially be both keywords and even steps), while preserving the instant-feedback crashes when someone mistypes a parameter via either strun or in a notebook.  It puts the onus on us to get our delivered parameter reference files right, but that's nothing new.

braingram commented 3 months ago

Thanks for sharing the innerspace link. It's very helpful to see the wider context. I will review the page but my initial impression is that there is quite a bit that is still undecided about the new workflow. Given the uncertainties can we delay the change in stpipe until we have a more complete change? This would help to avoid issues as we develop a plan.

stscijgbot-jp commented 3 months ago

Comment by Melanie Clarke on JIRA:

David Law - something like that should be possible.  I can investigate.  Perhaps we can change the default behavior in stpipe back to throwing an error, and make it warn only when retrieving CRDS parameter files.

stscijgbot-jp commented 3 months ago

Comment by David Law on JIRA:

Great, thanks Melanie Clarke .  Brett Graham If it's possible to reach agreement on a good way to solve the problem I'd be keen to address it in 11.0 so that it has more time to percolate out to the community before we start making the other changes.  I.e., if the community is often a build or so behind, this gets the future-proofing started.

braingram commented 3 months ago

It will take me some time to get up to speed on the discussion but I'd like to try and summarize a few things to see if I have them right.

There are effectively 4 users of the cal code:

There are a few versions of the cal code to consider:

As DMS builds have a pinned context issuing a warning (instead of raising an exception) when a reference file contains an unknown parameter should have no impact (as long as it was caught during testing).

For testing the latest release for a build it seems beneficial to have this be an error (as it would indicate a mismatch between the candidate crds context and cal code).

For developers it also seems beneficial to have this be an error (as it will highlight the mismatch). One note here is that often CRDS updates which are incompatible with the pipeline are often caught when unit or regression tests start suddenly failing. These failures would not occur if a warning was issued. Having a mismatch produce an error is helpful in this situation.

This leaves external users. I want to say that we do not support older versions so we can restrict the conditions to an external user who is using the latest released version. At the moment with the "default" setup defined in the readme the user is told to define CRDS_PATH and CRDS_SERVER_URL. This will result in crds selecting the latest operational context (even if that context was not tested with the cal version they are using). This seems undesirable as having users use tested combinations of the context and cal code should lead to less issues. I'm not aware of any testing that runs the latest released cal version with the latest crds context on a regular basis or before CRDS updates.

At the moment I am of the mind that we should update the jwst readme to mention that the CRDS_CONTEXT environment variable should be set to a tested configuration in the table listed in the readme. This would effectively pin the context for external users allowing publishing of new CRDS contexts that would be seen by developers with only the expectation that these contexts would work with the development version of the code. One downside to this approach is that publishing a new CRDS context would not be seen by external users until the next release. At the moment I think this is preferable as it isn't until that release that the combination is fully tested.

stscijgbot-jp commented 3 months ago

Comment by Melanie Clarke on JIRA:

Looking into the possibility of warning only when retrieving CRDS parameter files...

This is possible to implement in stpipe, but I think it needs deeper changes than my original proposal.  I have a draft version that validates CRDS reference files after retrieving them, warning and stripping out any extra values in them before merging with downstream configurations.  That way, any subsequent validations (from user input config files, command line arguments, etc.) will throw errors for extra values.  Because there is a new validation step, though, default values get added to the configuration earlier than they do currently, and we'd need to make sure those don't have downstream effects.

I can push this draft up to my PR if folks would like to see/test it, but I'm unsure how this would fit in with Brett's concerns about running the pipeline in the various possible contexts.

Please let me know how you'd like me to proceed.

stscijgbot-jp commented 3 months ago

Comment by David Law on JIRA:

Brett Graham Thanks for your thoughts on this, it's really useful to have the discussion at this level of detail.  Here's what I'm thinking at the moment- I'll rephrase your summary a little, and in terms of some slightly different names.  (Not proposing that we have to use these internally, but for external purposes it could be helpful):

Users of the cal code:

I.e., most of the time, by default, external users would get a set of reference files that is curated to work with the pipeline version that they're running and allow them to reproduce data products they could find on MAST (once that build is in ops).  This is still somewhat TBD as part of the larger discussion, but that's my thought going in.

At the same time, external users could also set some flag in their pipeline run to use the latest CRDS context (or some specific newer one) instead.  This is slightly buyer-beware, there's no guarantee that the newer files will be compatible with the older pipeline, and the older the pipeline version gets the more like there is to be a problem.  However, most of the time over the course of a few months this can work ok.  Essentially, it's what we've got now where everyone always defaults to the latest context.  Since this is a way for users to keep up to date with reference file updates, we want to minimize incompatibilities to the extent that we can.  No reason to block users from the latest & greatest calibrations just because they can't take advantage of a minor code improvement to the uncertainties.

As such, it's really the case of external users who want to run with the latest files that I'm looking to deal with, while not causing issues for internal folks.

Melanie Clarke I'd be interested to see the PR to understand how it changes things, but if you have concerns about possible downstream effects on how parameters are getting set from this then I'd say that needs more careful study/testing to make sure we don't cause unintended problems.  Better for this to be 11.1 work than risk issues with 11.0.  Would be helpful to get input from both of you at the next meeting that Katie Kaleida schedules to discuss this.

 

stscijgbot-jp commented 3 months ago

Comment by Melanie Clarke on JIRA:

David Law - I pushed my changes up to the same PR on stpipe. (https://github.com/spacetelescope/stpipe/pull/153/files)

stscijgbot-jp commented 3 months ago

Comment by Melanie Clarke on JIRA:

David Law - is there anything else you'd like me to try for this ticket, or should we put it on hold for the next build, pending more discussion?

stscijgbot-jp commented 3 months ago

Comment by David Law on JIRA:

Melanie Clarke Let's hold it for the next build.  I'll be away much of the time between now and the close of 11.0, and it sounds like there are enough potential concerns that we shouldn't try to rush it through.

stscijgbot-jp commented 3 months ago

Comment by Melanie Clarke on JIRA:

Okay, sounds good.

stscijgbot-jp commented 1 week ago

Comment by David Law on JIRA:

Discussion about the method of implementing DR-style linking between jwst software and the corresponding CRDS reference file contexts has evolved significantly in the last few months, in large part as a result of detail discussions prompted by this ticket.  In the latest imagining (see https://innerspace.stsci.edu/display/CRDS/JWST+Data+Release) all pipeline versions (even old versions) will have default contexts with which they are guaranteed to work.  In order to get the latest reference files (and parameter reference files) users must also update their software versions to be sure of consistency.  Some parameter reference files, for instance, may be not cause crashes with prior versions but may nonetheless produce undesirable results.

As such, and given the concerns raised in comments above, I'm withdrawing this ticket.