Closed jdavies-st closed 7 months ago
Who's got Mike Droettboom's email address? He has the answers to this.
I think this makes sense. Perhaps the naming is a bit confusing.
result = mystep.call('myfile.fits', foo=True)
creates a new step, with an attribute foo
, for example, new_step.foo
and then executes new_step.run(*args)
In this case foo
is picked up because it's an attribute of new_step
.
To turn your example around, the above is equivalent to
mystep = Mystep(foo=True)
mystep(*args)
Yeah, your example above works well. But the pipeline infrastructure instantiates all the steps of a pipeline that are defined in the step_defs
dict defined at the beginning of the pipeline class. Like
step_defs = {
'assign_wcs': assign_wcs_step.AssignWcsStep,
'flat_field': flat_field_step.FlatFieldStep,
'photom': photom_step.PhotomStep,
}
And then in the pipeline itself one usually writes
result = assign_wcs(input)
to actually run the step. If one tries to pass foo=True
at this point, one gets an error, as run()
doesn't accept keyword arguments. One could do
result = assign_wcs_step.AssignWcsStep.call(input, foo=True)
in the pipeline, but then a new instance of that class is created and run, and this I think bypasses anything that is passed from the pipeline config or keywords, i.e. skip=True
for a step.
Hmm.
Doing some source code/mind reading, the reason is a direct result of the design: gather and verify all needed external resources, then run the pipeline. The run
method implements the second part of "running the pipeline/step".
The problem is how to do the first part of "gathering the resources". Almost all of the work is done in the __init__
, but the operative word is almost. For whatever reason, the merging of config file and keywords is not done in the __init__
. In fact, nothing with-respect-to the configuration file is done there; it is only used to determine path information. The gathering of configuration parameters is all handled in the call
method.
So, the question, IMHO, is not why call
and run
are different; they are different by explicit design, and for very definite reasons. My question is why not put the configuration gathering in the __init__
as opposed to the factory method call
. More code diving may reveal that reason. I would guess, and would tend to agree, that it is more flexible to invoke a factory method as the main entry point.
But if the configuration gathering is moved to the __init__
, won't it then get executed at the time each step is declared/instantiated up at the top of a pipeline module, and hence still make it impossible to include keywords at the point where the step is actually called in the body of the pipeline? Or does the __init__
get executed at the point where the step is called?
I would like to revisit this difference between run
and call
in stpipe
as it has come up again amongst our INS testers.
My feeling is that there should not be two separate methods to run a step or pipeline, or if there must be, they should work almost the same way.
Is there a good reason that run
should not take a config file or any other kwargs
?
I also do not like that run
is mapped to __call__
. Then why is the other method named call
?
It's confusing both for us developers and especially for users. Is there a way we can simplify this?
Just a general comment to revive this -- I've had a couple questions from INS testers lately about why the run method doesn't accept config files. Community focus is shifting to the pipeline and related documentation these days, so this problem may keep coming up.
@stscijgbot-jp
This issue is tracked on JIRA as JP-1864.
Comment by Edward Slavich on JIRA:
I created an issue on the stpipe repo with a summary of how this works currently and some ideas on what we might change.
Currently
stpipe.Step
methodsrun()
andcall()
can be used to run a step.run()
Therun()
method (equivalent to__call__()
) is used when you've already instantiated a step and now want to run it on the input data. This is used in all our pipelines. It only accepts*args
. No*kwargs
. Is there a reason for this?It means that when I do the following:
I get an error. Keyword arguments are not allowed once a step has been instantiated. Bug or feature?
call()
Thecall()
method is used when you want to instantiate and run a step in one line. So:does work, as in the 2nd line above
call()
passes along the keyword arguments to the step when it is created and then runs the step with those arguments.Would like to hear opinions on this discrepancy, as I don't understand the step/pipeline code well enough yet to see a reason for it. @stscijgbot-jp