stpipe.Step methods run() and call() don't treat kwargs the same way

spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope

https://jwst-pipeline.readthedocs.io/en/latest/

Other

566 stars 167 forks source link

stpipe.Step methods run() and call() don't treat kwargs the same way #1098

Closed jdavies-st closed 7 months ago

jdavies-st commented 7 years ago

Currently stpipe.Step methods run() and call() can be used to run a step.

run() The run() method (equivalent to __call__()) is used when you've already instantiated a step and now want to run it on the input data. This is used in all our pipelines. It only accepts *args. No *kwargs. Is there a reason for this?

It means that when I do the following:

mystep = jwst.dogreatstuff.dogreatstuff_step.DoGreatStuffStep()
result = mystep('myfile.fits', foo=True)

I get an error. Keyword arguments are not allowed once a step has been instantiated. Bug or feature?

call() The call() method is used when you want to instantiate and run a step in one line. So:

mystep = jwst.dogreatstuff.dogreatstuff_step.DoGreatStuffStep
result = mystep.call('myfile.fits', foo=True)

does work, as in the 2nd line above call() passes along the keyword arguments to the step when it is created and then runs the step with those arguments.

Would like to hear opinions on this discrepancy, as I don't understand the step/pipeline code well enough yet to see a reason for it. @stscijgbot-jp

hbushouse commented 7 years ago

Who's got Mike Droettboom's email address? He has the answers to this.

nden commented 7 years ago

I think this makes sense. Perhaps the naming is a bit confusing.

result = mystep.call('myfile.fits', foo=True)

creates a new step, with an attribute foo, for example, new_step.foo and then executes new_step.run(*args) In this case foo is picked up because it's an attribute of new_step.

To turn your example around, the above is equivalent to

mystep = Mystep(foo=True)
mystep(*args)

jdavies-st commented 7 years ago

Yeah, your example above works well. But the pipeline infrastructure instantiates all the steps of a pipeline that are defined in the step_defs dict defined at the beginning of the pipeline class. Like

    step_defs = {
        'assign_wcs': assign_wcs_step.AssignWcsStep,
        'flat_field': flat_field_step.FlatFieldStep,
        'photom': photom_step.PhotomStep,
        }

And then in the pipeline itself one usually writes

result = assign_wcs(input)

to actually run the step. If one tries to pass foo=True at this point, one gets an error, as run() doesn't accept keyword arguments. One could do

result = assign_wcs_step.AssignWcsStep.call(input, foo=True)

in the pipeline, but then a new instance of that class is created and run, and this I think bypasses anything that is passed from the pipeline config or keywords, i.e. skip=True for a step.

Hmm.

stscieisenhamer commented 7 years ago

Doing some source code/mind reading, the reason is a direct result of the design: gather and verify all needed external resources, then run the pipeline. The run method implements the second part of "running the pipeline/step".

The problem is how to do the first part of "gathering the resources". Almost all of the work is done in the __init__, but the operative word is almost. For whatever reason, the merging of config file and keywords is not done in the __init__. In fact, nothing with-respect-to the configuration file is done there; it is only used to determine path information. The gathering of configuration parameters is all handled in the call method.

So, the question, IMHO, is not why call and run are different; they are different by explicit design, and for very definite reasons. My question is why not put the configuration gathering in the __init__ as opposed to the factory method call. More code diving may reveal that reason. I would guess, and would tend to agree, that it is more flexible to invoke a factory method as the main entry point.

hbushouse commented 7 years ago

But if the configuration gathering is moved to the __init__, won't it then get executed at the time each step is declared/instantiated up at the top of a pipeline module, and hence still make it impossible to include keywords at the point where the step is actually called in the body of the pipeline? Or does the __init__ get executed at the point where the step is called?

jdavies-st commented 6 years ago

I would like to revisit this difference between run and call in stpipe as it has come up again amongst our INS testers.

My feeling is that there should not be two separate methods to run a step or pipeline, or if there must be, they should work almost the same way.

Is there a good reason that run should not take a config file or any other kwargs?

I also do not like that run is mapped to __call__. Then why is the other method named call?

It's confusing both for us developers and especially for users. Is there a way we can simplify this?

aliciacanipe commented 4 years ago

Just a general comment to revive this -- I've had a couple questions from INS testers lately about why the run method doesn't accept config files. Community focus is shifting to the pipeline and related documentation these days, so this problem may keep coming up.

nden commented 3 years ago

@stscijgbot-jp

stscijgbot-jp commented 2 years ago

This issue is tracked on JIRA as JP-1864.

stscijgbot-jp commented 2 years ago

Comment by Edward Slavich on JIRA:

I created an issue on the stpipe repo with a summary of how this works currently and some ideas on what we might change.

stscijgbot-jp commented 7 months ago

Comment by David Law on JIRA:

This ticket no longer seems relevant, given the evolution in parameter reference files and the ability of both call() and run() to use them. There are still potentially discussions to be had about improvements, but those can happen elsewhere.