Executing a single task multiple times w/ different params (aka 'parameterization')

tschijnmo commented 9 years ago

Hello everyone!

I recently found this nice invoke package and really like it. One small question, though: is there a recommended way to repeat a task with different parameters? For example, say we have an input file template, input_template, and a set of parameters [1, 2, 3, 4, 5]. We might want to make a directory for each parameter, render the template in each directory with that parameter's value, do something, and then loop over the subdirectories again to collect results. Is there a recommended way to handle this kind of task in invoke? If not, some pointers on what to add to invoke would be deeply appreciated; then maybe I can finish it and make a pull request. Thank you!

I can see that this could be achieved by making the processing for a single parameter a task, and then making an overall task that requires a list of the individual tasks with different parameter values. But I am not sure whether that is an elegant and correct way to do it.

PS: If the loop over the subdirectories could be reused for both the file generation and the result reading parts, that would be even better.
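For concreteness, the workflow described above can be sketched in plain Python with no invoke-specific support. This is only an illustration of the use case, not an invoke feature: the template text, the `param_N` directory naming, and the helper names `for_each_param`, `render`, and `read_back` are all assumptions made up for the example.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical stand-ins for the input_template file and the parameter set.
INPUT_TEMPLATE = Template("value = $param\n")
PARAMS = [1, 2, 3, 4, 5]


def for_each_param(root, params, action):
    """Run action(subdir, param) once per parameter and collect the results.

    The same loop serves both the generation pass and the reading pass.
    """
    results = {}
    for param in params:
        subdir = Path(root) / f"param_{param}"
        subdir.mkdir(parents=True, exist_ok=True)
        results[param] = action(subdir, param)
    return results


def render(subdir, param):
    """Write the rendered template into this parameter's directory."""
    target = subdir / "input"
    target.write_text(INPUT_TEMPLATE.substitute(param=param))
    return target


def read_back(subdir, param):
    """Read back whatever the first pass left in the directory."""
    return (subdir / "input").read_text().strip()


with tempfile.TemporaryDirectory() as root:
    rendered = for_each_param(root, PARAMS, render)
    collected = for_each_param(root, PARAMS, read_back)
```

Either pass could be called from the body of an ordinary task; the open question in this ticket is whether invoke itself should drive the loop.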

bitprophet commented 9 years ago

This is definitely a thing we want/need as it supports higher-level use cases like those in the SSH Fabric lib (2.0 is building on top of Invoke), e.g. "run this task over this list of remote hosts".

In that sense this is kind of related to #63 (also kinda #170).

bitprophet commented 9 years ago

Poking at this now, FTR noting that I had some (pretty old, haven't reviewed em yet) notes about this issue in some "TBD docs" which I just moved to a THOUGHTS file: https://github.com/pyinvoke/invoke/blob/ccd2bae21a94ceef93cba34441cf575d4ddb7ff9/THOUGHTS.rst#parameterizing-tasks

bitprophet commented 9 years ago

Also see https://github.com/fabric/fabric/issues/635 which I'll close and link here, preferring to move "fabric 2.0" tickets involving Invoke to this repo.

The tl;dr of that was basically just "make it easy to call a task N times with some iterable of contexts", and while at the time that ticket got written there was no Invoke/Contexts, I even used the word 'contexts'. Seems clear to me at this point that we really are just talking about unifying Context with the args/kwargs of a callable, and running through it multiple times.

Can't remember w/o checking it over but current Executor may be able to do this already, in which case we're really only talking about a method of 'expanding' some shorthand (at the CLI level, presumably, since at the library level it is literally just calling Executor methods - possibly with some more added for useful shorthand/syntactic sugar).

bitprophet commented 9 years ago

Also implied by those earlier Fabric tickets (incl https://github.com/fabric/fabric/issues/636 and a source level TODO in Invoke) is the need for this to alter how Executor.execute currently returns values: it's a simple dict keyed on task, which straight up doesn't work for non-deduped task runs (which parameterized tasks certainly count as).
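The problem with a task-keyed result dict can be shown with a toy example. This is a plain-Python sketch, not Invoke's actual Executor code; `task_a` and the list-of-tuples alternative are assumptions chosen only to illustrate the collision.

```python
def task_a(value):
    """Hypothetical task body: double the parameter."""
    return value * 2


# The same task requested three times with different parameters.
calls = [(task_a, 1), (task_a, 2), (task_a, 3)]

# Keyed on the task: each run overwrites the previous result.
by_task = {}
for func, arg in calls:
    by_task[func] = func(arg)

# One possible alternative: keep one entry per call, in execution order.
per_call = [(func, arg, func(arg)) for func, arg in calls]
```

With the dict, only the last run's result survives; the per-call list keeps all three.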

bitprophet commented 8 years ago

(This is all note-to-self crap...)

A minor difficulty here is trying to mesh parameterization with pre/post task expansion & config generation.

Currently (docs), the executor:

1. expands tasks (adding pre/post tasks into the single list-of-tasks-to-execute)
2. then dedupes (removes multiple mentions of any given task)
3. then generates a configuration clone for each task (because only at the per-task level do we know which Collection to draw config values from)
4. & finally executes the task.
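The four steps above can be modeled with a toy pipeline. This is a simplified sketch under stated assumptions, not Invoke's real Executor: `ToyTask`, `expand`, `dedupe`, `config_for`, and `execute` are hypothetical names, and the "configuration" is just a dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTask:
    """Hypothetical task with optional pre/post tasks."""
    name: str
    pre: tuple = ()
    post: tuple = ()


def expand(tasks):
    """Step 1: flatten pre/post tasks into a single ordered list."""
    expanded = []
    for task in tasks:
        expanded.extend(expand(task.pre))
        expanded.append(task)
        expanded.extend(expand(task.post))
    return expanded


def dedupe(tasks):
    """Step 2: keep only the first mention of each task."""
    seen, unique = set(), []
    for task in tasks:
        if task not in seen:
            seen.add(task)
            unique.append(task)
    return unique


def config_for(task, base_config):
    """Step 3: build a per-task clone of the configuration."""
    return dict(base_config, task=task.name)


def execute(tasks, base_config):
    """Step 4: 'run' each task; here we only record what would run."""
    log = []
    for task in dedupe(expand(tasks)):
        log.append((task.name, config_for(task, base_config)))
    return log


clean = ToyTask("clean")
build = ToyTask("build", pre=(clean,))
test = ToyTask("test", pre=(clean,))
log = execute([build, test], {"echo": True})
```

Here `clean` is requested twice via the pre-task lists but only runs once, which is exactly the behavior parameterized runs need to bypass.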

For this ticket's use case, we need to:

1. Figure out how exactly task expansion should play with parameterization.
   - Do you parameterize the pre/post tasks too, treating pre+task+post as a single "unit"?
   - Do you only run pre/post once and only parameterize the main task?
   - etc - there are other tickets out there with these thoughts in them...
   - For the first draft I'm going to treat pre/task/post as a "unit" just because it feels like the most common case.
2. Disable deduplication (which is easy: there is already an option/subroutine for it, so we will just not call it).
3. Bridge the expansion with the config step, because parameterized expansion requires us to inform each "copy" of the originally requested task what its particular value is for the parameter in question.
   - E.g. for the Fabric case of "run task foo on hosts a,b,c", we need 3 calls to foo, one with a configuration or context reflecting that its host is a, one with b, etc.
   - This requires moving config generation up into the expand-tasks step, or at least transmitting the parameterized info in some form.
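A minimal sketch of what those requirements imply, assuming the "treat pre/task/post as a unit" choice: expansion repeats the whole unit once per parameter value, skips deduplication, and attaches the per-copy config during expansion. The function name `expand_parameterized` and the dict-based config are hypothetical, not Invoke's API.

```python
def expand_parameterized(task, pre, post, param, values, base_config):
    """Repeat pre + task + post once per value, each with its own config.

    No deduplication is applied, so every copy of the unit survives.
    """
    plan = []
    for value in values:
        # Config generation happens here, during expansion, so each copy
        # of the unit knows its own parameter value.
        config = dict(base_config, **{param: value})
        for step in (*pre, task, *post):
            plan.append((step, config))
    return plan


plan = expand_parameterized(
    task="foo",
    pre=["setup"],
    post=["teardown"],
    param="host",
    values=["a", "b", "c"],
    base_config={"user": "deploy"},
)
```

Running pre/post only once instead would mean moving them outside the `for value in values` loop, which is the other option listed above.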

There are actually two concerns with that last one - "pure" parameterization (altering the kwargs given to the literal task function) and "indirect" parameterization (altering the config/context object given to contextualized tasks - in Fab's case, this is a "connection" object, tho there's doubtless other use cases too).

"Pure" parameterization is easier because we can just transform tasks into Call objects earlier. Right now we allow either Tasks or Calls (which are basically Tasks w/ bound args/kwargs - think functools.partial) and Tasks get called without any args. I don't recall why we even allow that since it feels silly; it might be a concession to testing or something.

Altering the configs/contexts requires a bit more work but hopefully not a lot more - maybe expanding Call to care about those too, or just "compressing" the logic some so things aren't so split up (tho that's bad for inheritance and testing).
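Since Calls are compared to functools.partial above, the "pure" flavor can be illustrated with partial itself. This is only an analogy in plain Python; `greet` is a hypothetical task function and nothing here uses Invoke's Call class.

```python
from functools import partial


def greet(name, greeting="hello"):
    """Hypothetical task function taking an ordinary keyword argument."""
    return f"{greeting}, {name}"


# "Pure" parameterization: one bound call per parameter value.
calls = [partial(greet, name=name) for name in ("a", "b", "c")]
results = [call() for call in calls]
```

The "indirect" flavor differs in that the varying value lives in the context/config object handed to the task, not in its kwargs.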

bitprophet commented 8 years ago

Yea, there's nothing that requires the config/context generation to happen just prior to execution, so we can move that part of things into the expansion step altogether.

All of the "is it a task or call" stuff feels redundant/old, we call a normalize function early which should turn everything into Call objects regardless.
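The idea of normalizing everything into call objects up front might look roughly like this. It is a toy sketch: `ToyCall` and `normalize` are hypothetical stand-ins and do not reflect the real signatures in Invoke.

```python
class ToyCall:
    """Hypothetical call object: a task plus bound args/kwargs."""

    def __init__(self, task, args=(), kwargs=None):
        self.task = task
        self.args = tuple(args)
        self.kwargs = dict(kwargs or {})

    def run(self):
        return self.task(*self.args, **self.kwargs)


def normalize(items):
    """Wrap bare tasks so later steps only ever see ToyCall objects."""
    return [
        item if isinstance(item, ToyCall) else ToyCall(item)
        for item in items
    ]


def ping():
    return "pong"


def echo(text):
    return text


calls = normalize([ping, ToyCall(echo, args=("hi",))])
outputs = [call.run() for call in calls]
```

Once everything is a call object, later steps no longer need "is it a task or call" branches.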

bitprophet commented 8 years ago

Have things working well enough to roughly parameterize in Fabric across hosts, based on the (added in Fabric's Program instance) -H flag's value.

There still needs to be a generic way to do it in Invoke itself, though. I may not have time to finish pushing the functionality down into Invoke right away (partly because in Fabric's case the parameterization is the less-pure variant re: modifying contexts, and presumably the useful generic case is the pure one).

bitprophet commented 8 years ago

I apparently lied, currently my fab2 branch doesn't quite do multiple hosts correctly :) Hopefully I just broke it at some point. Need to beef up the test helpers for that stuff next!

Did some cleanup of the work for this Invoke branch here, today, including making Call a cleaner API and making call its own, literal convenience shim around that. There was some real dumb stuff left in there from last time.

bitprophet commented 8 years ago

After far too much futzing around, got fab -H host1,host2,host3 working as intended - most of the time was spent getting a useful multi-connection and multi-command-per-connection mock API & functionality working. Then actually fixing things up functionality-wise just required giving invoke.tasks.Call a .clone method.

Re: this ticket, what is left to do is identify an actual useful frontend API for pure, arg-based parameterization - the framework is all there in Call but it needs to be leveraged within Executor or a subclass. E.g. Fabric 2's subclass tweaks Executor.expand_tasks to perform the parameterization it needs - and the frontend for the parameterization is the core parser -H flag.
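A rough sketch of the clone-then-tweak pattern described above, for the host case. Everything here is an assumption for illustration: `SketchCall`, its `clone` method, `expand_over_hosts`, and the dict-based context are hypothetical and are not the actual invoke.tasks.Call or Fabric 2 Executor code.

```python
class SketchCall:
    """Hypothetical call object carrying kwargs and a context dict."""

    def __init__(self, task, kwargs=None, context=None):
        self.task = task
        self.kwargs = dict(kwargs or {})
        self.context = dict(context or {})

    def clone(self):
        """Return an independent copy that can be altered safely."""
        return SketchCall(self.task, self.kwargs, self.context)

    def run(self):
        return self.task(self.context, **self.kwargs)


def expand_over_hosts(calls, hosts):
    """One clone of each call per host, with the host set on its context."""
    expanded = []
    for call in calls:
        for host in hosts:
            copy = call.clone()
            copy.context["host"] = host
            expanded.append(copy)
    return expanded


def uptime(ctx):
    """Hypothetical contextualized task."""
    return f"uptime on {ctx['host']}"


base = SketchCall(uptime)
expanded = expand_over_hosts([base], ["host1", "host2", "host3"])
results = [call.run() for call in expanded]
```

A generic, arg-based frontend would do the same thing but vary `kwargs` instead of the context.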

Invoke needs something similar and I don't actually have a great idea offhand since I don't have the use case myself much besides fab -H. If they're still interested in this, I wonder what @tschijnmo's specific use case is/was and what would work for them?

Gonna leave this open until that is solved but will merge the work I've already done & will be moving on to other things for now.

pyinvoke/invoke #228