spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
562 stars 167 forks source link

Calling a pipeline without specifying a method should use call() #8130

Open stscijgbot-jp opened 10 months ago

stscijgbot-jp commented 10 months ago

Issue JP-3490 was created on JIRA by Bryan Hilbert:

Currently, when calling a pipeline (or step?) without specifying a method e.g.:

from jwst.pipeline.calwebb_image2 import Image2Pipeline p = Image2Pipeline('my_file_asn.json')

the pipeline will run using the run() method. Given that the run() method is not the preferred way to run the pipeline, it might be better/safer to have the pipeline in this situation default to using the call() method instead.

I'm not sure if the situation is the same for individual steps, but the same logic applies there I think.

stscijgbot-jp commented 9 months ago

Comment by Melanie Clarke on JIRA:

We recently discussed this on the NIRSpec team as well.  Most of our internal users were surprised to hear that the .run method is not advised, since many demonstration notebooks use that syntax.  It is also the more natural syntax – most users preferred setting parameters in a Step or Pipeline object via attributes to setting them in a dictionary to provide to the .call method.  It was very surprising to users both that the run method works without warning, and that the call method quietly ignores any parameters previously set in Step or Pipeline attributes.

I think this is a design flaw in the pipeline infrastructure that can't be addressed with documentation.  It is already documented in several places that .call should be used in place of .run, but having both available continues to confuse users and cause subtle problems with data reductions.

I can think of several possible approaches to fix this:

I think the last approach is preferable, since it would be least disruptive and least surprising to users.

bhilbert4 commented 9 months ago

There is a way to do the last approach. David Law's MIRI MRS notebook gets the CRDS information, and then instantiates the pipeline, sets the parameters as attributes, and then calls the run method. He has this wrapped up in a function called rundet1().

    crds_config = Detector1Pipeline.get_config_from_reference(filename)
    det1 = Detector1Pipeline.from_config_section(crds_config)

    det1.output_dir = outdir # Specify where the output should go

    det1.jump.maximum_cores='half'
    det1.ramp_fit.maximum_cores='half'
    # This next parameter helps with very bright objects and/or very short ramps
    det1.jump.three_group_rejection_threshold=100

    # Enable detection of large cosmic ray showers (currently only works for FASTR1 data)
    det1.jump.find_showers=True

    det1.save_results = True # Save the final resulting _rate.fits files
    det1(filename) # Run the pipeline on an input list of files
stscijgbot-jp commented 9 months ago

Comment by David Law on JIRA:

While the get_config_from_reference function works for our own notebooks, I'd be interested to see if there's a way to make Melanie Clarke 's third approach work more generally.  Is it possible and/or desirable for both .call() and .run() to check for parameter defaults automatically?

stscijgbot-jp commented 9 months ago

Comment by Howard Bushouse on JIRA:

Theoretically pretty much anything is possible, e.g. updating the .run() method to look for pars ref files, but I think that would then be defeating the original purpose of having 2 different methods in the first place and would essentially make them redundant with one another. The .run method is preferable when you want to execute something multiple times using the same param settings, while .call starts over from scratch each time it's called. IMHO, it would be nice to simply update .call so that you can give it param/args as easily as you can with .run (i.e. not having to create the cumbersome dictionary structure).

Sounds like a good topic for discussion in a DMS WG meeting.

stscijgbot-jp commented 2 months ago

Comment by David Law on JIRA:

Adding a note from the July 10 JP meeting that the cleanest solution may be to deprecate usage that doesn't specify either .run() or .call(), adding a deprecation note now and actually removing that functionality some time next year.  Potentially add a warning note for .run() as well.

stscijgbot-jp commented 1 month ago

Comment by David Law on JIRA:

Followup discussion at the Sep 3 JP meeting; we should print a deprecation warning now (https://jira.stsci.edu/browse/JP-3728) that in a future TBD build usage without specifying either .call() or .run() will be deprecated.