Advanced task dependencies / caching

bitprophet commented 7 years ago

Use case description

Core desire is to strengthen relationships between tasks above/beyond what the existing pre/post functionality achieves; and to structure a nontrivial set of dependent state as a graph of tasks which only execute as many times as is required to set up that state.

Specific example is an existing nontrivial task tree around performing cloud automation, configuration management, and similar things. Many/most tasks in this tree are currently wrapped with a @state decorator whose body does a large amount of "heavy lifting":

processes pre-existing configuration directives, including aborting if they aren't defined
- A reminiscent predecessor is the old and (as far as I can tell) barely used fabric.api.require
instantiate network and API clients using that early configuration (and/or pulling from a secrets storage API)
generate new configuration settings based on runtime information (including data gathered from the above clients)
attempts to be idempotent at each step, including skipping expensive operations where possible (so no re-connecting to already-connected APIs; no re-reading already-read files; etc.)

At the least, this decorator "wants" to be split up into a bunch of smaller parts, each of which is connected by what state it depends on, with the outermost/topmost caller (i.e. the decorated CLI task) specifying more precisely which bits it needs. (This already exists in part with some args passed into the decorator, e.g. @state(limited=True) skips everything but the first few config bits and an early API client.)

Those smaller parts would then be capable of memoizing their results, so that e.g. if they are declared to "generate a value for config path c.aws.clients.vpc", and that config path already exists/is non-empty when something starts calling that function, it simply records this fact and short-circuits.
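
A minimal sketch of that short-circuit, assuming a plain nested dict stands in for the config object. The provides decorator and the lookup/store helpers are hypothetical names made up for illustration, not anything Invoke ships:

```python
# Hypothetical sketch: 'provides', 'lookup' and 'store' are illustrative
# names, and a plain dict stands in for the real config object.
from functools import wraps


def lookup(config, path):
    """Return the value at a dotted path, or None if any segment is missing."""
    node = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def store(config, path, value):
    """Set a value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def provides(path):
    """Skip the wrapped function when the config path is already non-empty."""
    def decorator(func):
        @wraps(func)
        def wrapper(config):
            existing = lookup(config, path)
            if existing:
                return existing  # already satisfied: short-circuit
            value = func(config)
            store(config, path, value)
            return value
        return wrapper
    return decorator


calls = []


@provides("aws.clients.vpc")
def make_vpc_client(config):
    calls.append("make_vpc_client")  # stands in for an expensive API connection
    return {"region": config["aws"]["region"]}


config = {"aws": {"region": "us-east-1"}}
make_vpc_client(config)
make_vpc_client(config)  # second call finds the path populated and skips the body
```

The second call never reaches the function body, which is the "records this fact and short-circuits" behavior described above.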

Going deeper?

Presumably one could expand this concept all the way to abstract things like "task A requires an admin cloud instance and a database" - including some sort of maybe registry-driven "find me some task that satisfies needs XYZ, cuz I need X Y and Z" setup. Which feels fraught and complex to me, and is probably a distinct API feel from the "do exactly what I say but feel free to skip anything you already did" described above.

Another approach could be something like @requires(file='/path/to/artifact', satisfied_by=build) where build is a task that presumably generates /path/to/artifact; this then becomes "just" a more rigorous @task(pre=[build]), and places the burden of the "is it already done?" test on the higher-level task instead of the lower-level one. Again, different API feel; and not the best, because if N tasks need /path/to/artifact they all have to specify this; and satisfied_by= implies that maybe M subtasks could generate the artifact, which feels specious.

Basically, each time I think of this side of it, I come back to: no, I really just want to specify that build_my_env depends on some half dozen other actual tasks, each of which wants to be memoizing or otherwise skipping already performed work. That gets us a very make-like setup, especially when you consider make is effectively just our existing pre-task functionality, except with the wrinkle that file paths "are" task names.

Solutions brainstorm

User-facing API

Note: was originally pondering additional decorators, but now think additional task kwargs make more sense; see the comments below. The name brainstorm is largely the same either way.

The actual needs are:

Ways to specify tasks that come before, and after, a task being defined.
Needs a plural noun ("this task has many _whatever_s"), a verb ("this task _something_s another task" or "that other task _something_s this one") and a keyword argument (which is often a plural noun or a verb, but can also be some other part of speech, such as an adverb (afterwards=)).
Those parts of speech should hopefully be thematically close; the farther apart they are, the more mental overhead is required.
Ideally we want shorter names and ones that are easy to type: 2-3 syllables is preferable over 4+, doubled-up letters aren't great, underscores aren't awful but are non-ideal, etc.

Specifying dependencies/prerequisites

before: Far too ambiguous re: whether the task being defined is the subject or object ("this other task comes before me" vs "I come before this other task".)
but_first: Very English but I think way over the line in terms of being cute/twee. Would pair well with and_then/then.
depend_on: meh
depends: just okay; shorter than depends_on but doesn't flow as well English-wise
depends_on: kinda works
dependencies: straightforward if slightly long. Also works great as double duty for the plural noun (though we'll probably use it for that regardless.)
first, as in, "first run these, then run me." Very English, possibly too cute, and possibly ambiguous in the sense that the term is pretty overloaded (though I can't immediately think of what else in an Invoke context one might confuse it with.) Really wants to pair up with then or and_then and maybe later.
needs: can't really explain why, but not a huge fan. Might work as plural noun and as verb though - a task "needs" another, a task "has needs", etc.
pre: i.e. how things work now. Somewhat ambiguous ("These others run pre-me" vs "I come 'pre' these others") and kinda ugly, but it is short.
preceded_by: Straightforward and unambiguous but kind of hell to type. Noun would be 'predecessors' presumably.
prerequisites: This is the term make uses. It's sensible but a bit long/annoying to type, and it only really pairs well with postrequisites which, while accurate, feels even more awkward for some reason. Works well as plural noun; verb's awkward ("task A prerequires task B"?)
previously: "Previously, you should have run these tasks". Kinda awkward, this isn't TV.
prior: "You should run these prior to running me." Not awful, but also no good plural noun ("priors"? Awkward and legalese-sounding to me) and the 'verb' of "comes prior to" is a bit wordy.
requires: pretty solid, also works as verb, with "requirements" as the collective noun.
succeeds: Pairs well with precedes, but unfortunately an overloaded term (could be mistaken for "what to run if this task succeeds, as in, does not fail") so not great.

Specifying followups/postrequisites

after: too ambiguous, is it "Run these after me" or "Run me after these"?
after_success / after_failure: As commented below I'm not a huge fan of the conditional-trigger approach, but if we did implement that concept, these could work (the addition of success/failure removes the ambiguity of just after=.) Bit long though.
afterwards: Not bad, slightly cutesy, but unambiguous ("run me, then run these afterwards", but you'd never say "run me afterwards these other tasks".) None of the after* names have good plural nouns ("aftertasks"? Ugh) though the verb is easy, comes after.
and_then: Cutesy, but highly unambiguous, and not too long. Pairs well with first/but_first/etc.
consequences: Kinda dark (in English we usually use this to mean negative consequences) but clinically accurate? kwarg alternative could also be consequently=.
consumers: "These other tasks are consumers of what I do." Not the worst, but I feel it's not as general as should be, not all task relationships necessarily imply production/consumption of data, even if many do. Would work as plural noun and (via "consumes") verb. Might be the best plural noun (which is the real hard part of this side of things.)
drives (as in, "I drive execution of these other tasks"): mediocre at best; verb; but no good plural noun (uh..."drivees"? No. And don't even think about saying "passengers".)
enables: halfway decent but doesn't really cut it, because we're not saying "you can run these after me", we're trying to say "you must run these after me". Permission != command.
followup(s): currently what my doc brainstorm is using but I'm not in love with it. I'm using it as the plural noun as well ("followups" and/or "followup tasks"), but that's awkward. The whole family of follow* options has the same issue.
follow_with: better English but also longer
followed_by: even more natural-sounding than follow_with, though still long
later: Not the worst, tho not great. "Run these tasks later, after you run me." Maybe ambiguous ("I come later than these other tasks") but not as awful as some others. Pairs well with first.
leads_to: Half decent, is verb, no good obvious noun though.
next: Similar to later. "Do this next." (Also not accurate since there is no guarantee the requested tasks will actually run next!)
notifies: Like Chef, kinda, and acts as a kwarg and a verb, but sadly lacks a good forward-reference noun ("notification targets"? ugh).
post: i.e. how it works now. Same ambiguity, ugly, etc problems as with pre above.
post-tasks: same except more of a plural noun focus.
postrequisites: opposite of prerequisites, but as noted there, seems a bit long/awkward. Decent plural noun, and (via "postrequires") a very awkward verb.
precedes: Matches up with succeeds, but otherwise a bit awkward IMO. Is already the verb. Noun: uh...kind of wants to be "successors" really?
subsequently: More-Englishy adaptation of subsequents below ("run me, then subsequently, run these other tasks") but also a little twee. Same noun/verb as below.
subsequents: Slightly awkward but not the worst, works as a plural noun plus kwarg. Suppose the verb usage would be "is subsequent to"?
succeeded_by: Clear, but annoying to type on top of just being long. Verb would be "succeeds", as in, "these other tasks succeed this one".
successors: Similar to succeeded_by and also works as the plural noun. Verb same as above ("succeed"/"succeeds".) Also still awkward to type.
then: similar to and_then, but even shorter (but also even more cutesy.) Really wants to be paired with first, which I don't really like.
triggers: seems nice at a glance ("I trigger these tasks afterwards") but secretly ambiguous as it can also be interpreted as "these trigger me" and in fact demands to be seen that way if used as a plural noun. So, no good.

Skipping-execution checks

checks: almost certainly going with this as it's straightforward either way you read it: "these are my checks (check functions)" or "this task checks to see if these things are already satisfied".
creates: not great because so many state checks could be about things other than "did something get created".
generates: same as creates
supplies: awkward

Jeff's thoughts

The hardest part by far seems to be the plural noun for tasks which come after the current one, so we should start there.

The only ones that have come up so far that aren't problematic are: "after-tasks", "consequences", "post-tasks", "followups", "postrequisites", "subsequents", "successors". None of these are immediate "yeah!"s so let's see how they stack up re: the criteria listed up top.

"consequences":
- Thematically close to: nothing really? works equally well with most tho.
- "A's consequences", "B is a consequence of A", @task(consequences=[notify])
- Short? Not really
- Easy to type? Aside from length, it's only moderately awkward to type...
- Verdict: I don't really like it but it's not super bad.
"successors": regal
- Thematically close to: "predecessors"; also works to describe precedes/succeeds (tho latter is ambiguous and thus not great)
- "A's successors", "B succeeds A", @task(successors=[notify])
- Short? 3 syllables, ok.
- Easy to type? Not really...
- Verdict: Feels like the awkward typing (especially of its only good related terms) kinda kills it
"subsequents": clinical
- Thematically close to: most half-decent plural nouns (dependencies, predecessors, prerequisites) are okay, but none work great.
- "A's subsequents", "B is subsequent to A", @task(subsequents=[notify])
- Short? 3 syllables, ok.
- Easy to type? Somewhat, tho not great.
- Verdict: Suspect still too awkward for everyday use but not 100% convinced of that.
"followups": folksy
- Thematically close to: not super close to any but "dependencies" seems to work fine?
- "A's followups", "B follows[ up] A", @task(followups=[notify])
- Short? 3 syllables
- Easy to type? Yes!
- Verdict: just okay.
"postrequisites": also clinical
- Thematically close to: "prerequisites" is a perfect match obviously
- "A's postrequisites", "B post-requires A", @task(postrequisites=[notify])
- Short? Nah, 4 syllables
- Easy to type? Nope.
- Verdict: meh.
"after-tasks": vaguely...Carroll-esque? idk
- Thematically close to: nothing is super close; pre-tasks, dependencies. Unfortunately neither of these "match" well as a logical inverse of "afterwards". The main English opposite would be "beforehand"? which seems too awkward.
- "A's after-tasks", "B comes after A", @task(afterwards=[notify])
- Short? 3 syllables
- Easy to type? Yes.
- Verdict: Not awful but still a bit silly.

Implementation

Naive version is to have a 'checker' aspect that confirms, each time, whether the desired result already exists. Check for existence of file, of config value, of cloud resource, etc.
- Could include a bunch of 'standard' versions of these, with the generalized case just being "any callable"
- Or some richer value, maybe, perhaps
As the DAG for a given task execution grows and multiple sub(-sub-sub-sub)-tasks all depend on the same low-level bits, that doesn't scale well because you're still e.g. hitting disk, remote systems, API calls, etc, dozens of times unnecessarily, even if the check action tends to be much faster than the "make the thing" action.
So we could formalize that "if the task has been called already this interpreter session, assume what it does is satisfied". Task already has basic (if not well trod) call count tracking so this could be easy to do.
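
A rough sketch of how the two ideas could combine: "any callable" checks for the empirical test, plus a per-session flag so the checks themselves aren't repeated. SketchTask is a hypothetical stand-in, not Invoke's Task class, and the checks= name is just the leading candidate from the brainstorm above:

```python
import os
import tempfile


class SketchTask:
    """Hypothetical stand-in for a task; not Invoke's Task class."""

    def __init__(self, body, checks=()):
        self.body = body
        self.checks = tuple(checks)  # callables returning True when satisfied
        self.satisfied = False       # "already handled this session" flag
        self.times_called = 0        # how often the body actually ran
        self.check_runs = 0          # how often the checks were consulted

    def __call__(self):
        # Cheap path: already run (or found satisfied) earlier this session.
        if self.satisfied:
            return False
        # Empirical path: every check must report "already done" to skip.
        if self.checks:
            self.check_runs += 1
            if all(check() for check in self.checks):
                self.satisfied = True
                return False
        self.body()
        self.times_called += 1
        self.satisfied = True
        return True


workdir = tempfile.mkdtemp()
artifact = os.path.join(workdir, "artifact.txt")


def build_body():
    with open(artifact, "w") as handle:
        handle.write("built\n")


build = SketchTask(build_body, checks=[lambda: os.path.exists(artifact)])
results = [build(), build(), build()]  # True means the body ran
```

Only the first call touches the disk; later calls are answered from the in-memory flag, which is the scaling concern raised above.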

Related, possibly subsumed tickets:

#41 - the original dependencies ticket, long closed
#45 - this is the closest match, and arguably a duplicate, though I interpret that ticket to be more "make the existing pre/post deduplicating use a DAG", whereas this ticket is about extending pre/post itself to have a richer API, even if that probably requires the DAG anyways.
#100 - if we grow the ability to say "this task should generate ", it makes a ton of sense to slightly extend that to "this task should generate , regenerating if exists but is old"
#170 - insofar as calling tasks from other tasks definitely wants to honor this new functionality
#228 - dependency execution needs to mesh well with parameterization, their intersection is historically easy to screw up / paint oneself into a corner
#261 - since this means doubling down on use of pre-tasks as a common thing, it means we really gotta figure out the signature mismatch problem between the main task and its pre-tasks (and their pre-tasks and ...)
#298 (and/or its PR #299) - at least one user is using post-tasks, and noticed the existing behavior is unexpected re: overall flow of pre, main, post tasks. Something that should probably be solvable during this ticket.

bitprophet commented 7 years ago

More thoughts...

Call caching/tracking

Any non-empirical task call tracking should be double-checked for eventual inter-process parallelization safety as well as intra-process parameterization safety:

Inter-process isn't as big a deal yet since current parameterization (eg Fabric 2 hosts stuff) focuses on threading, since we're thread-safe; but we really don't want to paint ourselves into a corner.
Intra-process matters more immediately - e.g. if one has a host-oriented Fabric task which declares some dependency, and its dependency deduplication requires call tracking instead of state checking, we wouldn't want the state stored on the Call objects, but on the source Task object (or the Collection namespace obj). Otherwise each per-host Call would attempt to rerun its dependencies.
- Conversely, though, we don't want to limit use cases where a parameterized Task does want each generated Call to run its dependency (e.g. if the parameterization matters for the dependent task(s) as well as the main one - especially true in the "dependencies get a copy of the main task's args and/or context" case from #261).
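
To make the Task-vs-Call distinction concrete, here is a small sketch. SketchTask, SketchCall, execute and the per_call flag are all hypothetical names for illustration and do not reflect Invoke's real classes or executor:

```python
class SketchTask:
    """Hypothetical task: holds the shared call counter."""

    def __init__(self, name, body, dependencies=()):
        self.name = name
        self.body = body
        self.dependencies = list(dependencies)
        self.times_called = 0  # lives on the task, shared by all its calls


class SketchCall:
    """Hypothetical parameterized invocation of a task."""

    def __init__(self, task, **kwargs):
        self.task = task
        self.kwargs = kwargs


def execute(call, log, per_call=False):
    """Run a call's dependencies, then the call itself.

    per_call=False: dependencies are deduplicated via the Task's counter.
    per_call=True: each Call reruns its dependencies with its own kwargs.
    """
    for dep in call.task.dependencies:
        if per_call or dep.times_called == 0:
            dep.body(log, **(call.kwargs if per_call else {}))
            dep.times_called += 1
    call.task.body(log, **call.kwargs)
    call.task.times_called += 1


def build_body(log, host=None):
    log.append(("build", host))


def deploy_body(log, host=None):
    log.append(("deploy", host))


build = SketchTask("build", build_body)
deploy = SketchTask("deploy", deploy_body, dependencies=[build])

# Dependency state on the Task: build runs once across both per-host calls.
shared_log = []
for host in ("web1", "web2"):
    execute(SketchCall(deploy, host=host), shared_log)

# Opt-in per-call behavior: build reruns for every host, with that host's args.
per_call_log = []
for host in ("web1", "web2"):
    execute(SketchCall(deploy, host=host), per_call_log, per_call=True)
```

Had the counter lived on SketchCall instead, the first loop would have run build once per host, which is the failure mode described above.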

Decorator API

I'm now rethinking the "separate decorator" API option, for a couple reasons:

First, adding decorators besides @task creates a decorator-ordering problem - see e.g. how Fabric 1 struggled/struggles with that to this day, with folks putting @task above and/or below things like @roles, @hosts, @runs_once etc.
- Nontrivial mental overhead to remember which order is "allowed", especially for newbies, but even for intermediate users it's a footgun.
- Requires extra code if one wants to try avoiding some of the ordering issue (as Fab 1 did) - all decorators must be "aware" of each other, add attributes to the decorated object and/or look for those added attributes, make sure not to nuke them when transforming from func to Task, etc.
- Even with that extra code, it's not perfect, and now there's more opportunity for bugs. Such bugs are also harder to troubleshoot due to their nature (decorator-hosted info silently disappears, tasks are no longer created, etc.)
Extra kwargs in @task aren't, in my current experience, a huge deal either - many use cases only make use of 1-2 kwargs per task (often 0) so they don't really end up being big ugly globs of kwargs.
- And even if they did, it really is just an aesthetics thing - you're still talking maximum of 1 line per 'dimension', and for the more trivial args, they can often remain an easily readable 1-liner even w/ multiple in play (e.g. @task(default=True, dependencies=(foo, bar)))
Should I find compelling counterarguments to these, it's typically easier to grow an API than to shrink it.

bitprophet commented 7 years ago

Re: how to track the state: probably a good time to look at an actual graph lib. The one that Bruce identified and updated to be Python 3 compatible over in #45 is a good place to start, it's called dagger. Poking it briefly (including a skim of its docs):

It's explicitly file-oriented by default, which is good for folks who want that, but implies work needed for other use cases (which, for my current real world test case, is all of them)
Not very Pythonic code, oh well.
- Super duper not PEP-8 (lots of lowercase-named classes, one-line if foo: bar statements, etc)
- Defines its own iterator class that doesn't implement the iterator protocol (and which isn't returned by __iter__ on the main dag object)
- Lots of use of 0/1 instead of True/False
Effectively a single-file module, in terms of what is required for vendoring.
- Most files present in the repo are administrivia (benchmarks, docs, tests) or appear to be junk ('old' copies of files, a rendered copy of the sphinx docs, etc)
Primary API/flow:
- Create a dagger() object, with optional persistence options (eg store hashes in a text file, or in a sqlite db [optionally in-memory])
- Call dagger.add('filepath', ['dependencies', 'here']) to add nodes and their deps
- Tell it to scan the files for staleness and then evaluate the necessary required calls to get everything un-stale, via dagger.run()
- Obtain an iterator (dagger.iter())
- Iterate it with repeated calls to iterator.next() to walk the DAG (next() yields node objects with path info etc) and iterator.remove(name) (to mark nodes as "done"/"un-stale"/"satisfied")
Secondary API:
- Introspection:
- dagger.dump(): Textual linebreak-separated breakdown of all nodes & their current staleness/hash/etc (though not their dependency relationships)
- dagger.tree(): Another textual display, this one with just the relationships.
- dagger.dot(): Graphviz .dot syntax, w/ optional write-to-file for display via any compatible app (e.g. graphviz itself, dot -Tpng some.dot > some.png)
  - Also leverages node.format() which lets you customize the node identifiers/labels with eg name, basename, path, mod time, etc
- dagger.ordernames(): textual comma-separated list (i.e. sub-dep,other-sub-dep,dep,other-dep,top)
- dagger.order yields an idict object which is the underpinning of ordernames() and iter(), it exposes a regular old list accessed by .list, or a name->index mapping accessed by .dict
- Node manipulation/metadata:
- Can explicitly mark nodes (and thus, all of their dependents) as being out of date via dagger.stale(name). (This seems to need to be done before run(); i.e. the workflow is probably 'static' instead of derived, but given the emphasis on speed, this makes sense.)
- Can mark a node as "phony" (via dagger.phony(name)) which seems to just make it not bother performing file existence/date/hash check, or rather, allows one to override that default check.
  - Not 100% clear why that's useful (unless it's sort of an inverse of .stale()) but presumably something we'd have to tussle with to get non-file-related junk working, period
  - On inspection, yea, the only real file checking is in node.update() wherein there's a if self.phony: return short-circuit. So non-file-oriented nodes should be able to just be "phony" and that's that?
- dagger.exporthash writes out the hash database to disk (depending on the format specified at instantiation)
- Has the (IMHO) antipattern of "get(name) implicitly creates a new node and returns it"
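
For comparison, the same "add nodes with deps, obtain ready nodes, mark them done" flow can be sketched with the standard library's graphlib module (Python 3.9+). This is deliberately not dagger's API; it is a swapped-in stand-in to show the shape of the walk, and the node names are borrowed from the ordernames() example above:

```python
from graphlib import TopologicalSorter

# Each add() call names a node followed by the nodes it depends on.
graph = TopologicalSorter()
graph.add("top", "dep", "other-dep")
graph.add("dep", "sub-dep", "other-sub-dep")

graph.prepare()  # freezes the graph; raises CycleError if it is not a DAG

order = []
while graph.is_active():
    # get_ready() returns nodes whose dependencies have all been marked done.
    for name in sorted(graph.get_ready()):
        order.append(name)  # a real executor would run the task here
        graph.done(name)    # roughly analogous to marking a node satisfied
```

graphlib has no notion of files, staleness or hashes, so any "is this already satisfied?" logic would have to live outside the graph walk.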

bitprophet commented 7 years ago

So, after a source code skim, dagger is one of those classic "is it even worth saving myself <=375 SLOC if I am gonna feel compelled to tweak and/or subclass a bunch of stuff" gray area cases. Might as well use it to prototype for now but I reserve the right to just write an "inspired by" recreation if I start running into too much trouble.

bitprophet commented 7 years ago

Been wondering during all this if we can reasonably do away with post-tasks. They're a very minor use case compared to their inverse, have difficult naming, and don't fit neatly into the idea of a DAG: they're not part of the DAG at all but are part of the "body" of the top-level/being-invoked task, practically speaking. (And you can't phrase them as inverse dependencies, either, since the whole point is that they are not the focus of the execution.)

Further, most use cases for post-tasks seem like they hinge on success vs failure, which combined with the above, feels like they "should" be treated as part of #170 (tasks calling other tasks) and wrapped within try/except/finally blocks.
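
A minimal sketch of what that looks like when the followups are just plain function calls inside the main task's body. The task names are made up, and a list stands in for real side effects:

```python
def run_tests(log, fail=False):
    log.append("test")
    if fail:
        raise RuntimeError("tests failed")


def notify(log, status):
    log.append(f"notify:{status}")


def clean(log):
    log.append("clean")


def ci(log, fail=False):
    """Main task: followups are ordinary calls guarded by try/except/finally."""
    try:
        run_tests(log, fail=fail)
    except RuntimeError:
        notify(log, "failure")  # the "after_failure" case
        raise
    else:
        notify(log, "success")  # the "after_success" case
    finally:
        clean(log)              # runs regardless of outcome


ok_log = []
ci(ok_log)

bad_log = []
try:
    ci(bad_log, fail=True)
except RuntimeError:
    pass
```

Nothing here needs support from the execution system, which is the argument for treating such followups as part of #170 rather than as a post= feature.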

I scanned the tracker to see how many folks are filing tickets about post (vs pre-only) tasks and did find #298, so at least some users are using them - tho that is the only ticket I found.

It raises some things not noted prior:

The old deduplication was generally confusing re: how it treated post-tasks. tl;dr when post-tasks would show up multiple times, only the first one was kept, instead of only the last.

Corollary: even in a DAG, deduplication may want to be its own thing, because "lazy" or "elastic" pre/post tasks (what 'dependencies' generally mean - "run me at least once per session, anytime before/after the declaring task") are distinct from "rigid" ones (i.e. "run me once per declaring task" and typically also "run me immediately before/after the declaring task").

That said, I think it can still be argued that "rigid" pre/post tasks can/should be phrased within the body of the main task and don't benefit from the outer dependency/dedupe system (in fact they don't benefit from and significantly complicate such a system!)

bitprophet commented 7 years ago

If we assume we're gonna handle post-tasks, kind of want something that more closely matches the term dependency/depends/requires. Brainstorm:

triggers, i.e. execution of the main task triggers (eventual) execution of another afterwards
drives. Meh.
after is tempting, but also far too ambiguous due to lack of clear object/subject - does @task(after=[foobar]) mean "run foobar after you run this task" or "run this task after foobar runs"? It's not clear without prior knowledge that it's being used to implement this particular functionality.
- Could potentially use afterwards (@task(afterwards=foobar)) which is less ambiguous ("run me, then afterwards, run foobar") but still doesn't feel right to me, too awkward and/or cutesy?
later, as in, "run this task, then later, run foobar". Also awkward/cutesy.
Not sure what else works?

indera commented 7 years ago

Travis uses after_success or after_failure ... https://docs.travis-ci.com/user/customizing-the-build

bitprophet commented 7 years ago

@indera Indeed, and the thought occurred to me, though I suspect in most cases folks will desire to handle the after_failure case 'internally' with try/except/finally type logic as opposed to reimplementing that functionality within the execution system. We'll see - it's certainly possible to add the success/failure split later.

bitprophet commented 7 years ago

A TODO for myself to get this off the ground and stop dithering:

[x] Finish the DDD I'm writing in the conceptual docs. It's turning into a restatement of what was there before, but oh well. Still includes the new ideas...
[x] Figure out whether/how to deal with disabling dependencies, e.g. to ape package managers' --no-deps style behavior. - When writing the docs this definitely came up as a thing you might want to do sometimes, so we should do it.
[ ] Flesh out tests for all scenarios documented, plus any additional existing use cases we may not cover or which are broken, e.g. #298
[ ] Implement the DAG side of things, without otherwise touching @task or changing too much behavior - i.e. take the step of replacing the existing dedupe with dagger.
- In this case, I think we have to start out marking all nodes stale right after adding, and then remove them after they are executed.
[ ] Next is probably a good time to see about fixing #298, presumably by adding post-tasks to the runlist being fed to the DAG, as late as possible (instead of as early as possible)
- (A new day, a new brain) Can we simply phrase post-tasks as tasks added to the DAG as depending-on the actual runtime runlist?
- I don't really see another way of handling it besides running 2 or more DAG sessions (one for the main runlist, then another 1..N for any post-tasks) and that feels gross in comparison.
[ ] Write tests for task predicates/products/caching/memoization/whatever (still unhappy with the nomenclature here though I think calling the thing-being-tested-for the "product" of the task makes some sense)
[ ] Implement those
[ ] Done?
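
A sketch of the "post-tasks depend on the whole runlist" idea from the TODO above, again using the standard library's graphlib (Python 3.9+) rather than dagger. The plan function and the dict-based task tables are assumptions made for illustration:

```python
from graphlib import TopologicalSorter


def plan(runlist, dependencies, followups):
    """Return one execution order covering deps, the runlist and followups.

    dependencies/followups map a task name to a list of task names.
    """
    graph = TopologicalSorter()
    for task in runlist:
        graph.add(task)
    for task, deps in dependencies.items():
        graph.add(task, *deps)
    # Each followup depends on every task in the runlist, so it lands as late
    # as possible and appears once even if several tasks declare it.
    for task in runlist:
        for followup in followups.get(task, ()):
            graph.add(followup, *runlist)
    return list(graph.static_order())


order = plan(
    runlist=["build", "test"],
    dependencies={"build": ["clean"], "test": ["build"]},
    followups={"build": ["notify"], "test": ["notify"]},
)
```

One caveat of this phrasing: a followup that is also a dependency of something in the runlist forms a cycle, and graphlib raises CycleError in that case.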

bitprophet commented 7 years ago

Grump, I was all set on @task(triggers=[call, me, after]) but realized it's also ambiguous - I meant it as "Calling me triggers calls to these other tasks", but it could be read as a plural of "trigger", implying "other tasks which, when called, trigger a call to me".

Maybe afterwards is best for now after all? Though it (& the rest I brainstormed above) lacks a useful descriptive word / collective noun to go along with the keyword... I should do some Twitter polls or something :)

EDIT: going with "followup tasks" and @task(followups=[...]) for the time being.

bitprophet commented 7 years ago

More name ideas stemming from a (side) twitter discussion:

Consider not cramming the same word into kwarg and term both, but using different ones
Maybe I should stick with @task(pre=[xxx], post=[yyy]) but have the terminology change from pre-tasks and post-tasks to prerequisites and postrequisites. Matches make to a degree which is both good and bad. Also just means more symmetry.
- Counter: on their own, pre=/post= do suffer from the same ambiguity as many other kwargs, am I saying "xxx comes before this task" or "this task comes before ('is pre-') xxx"?
The current "dependencies / followups" terminology I'm using in the doc still seems like it might strike best balance between (lack of) ambiguity, clarity, ability to describe them in English, etc. Will have to see.

offbyone commented 7 years ago

dependencies/consumers is more explicit.

@task(and_then=CONSUMERS) works, as a documentation slug and name thingy.

(As an aside, the tendency of kwargs' names and storage to share a name in languages that support them really makes this kind of API a pain. Objective-C does this a lot better.)

bitprophet commented 7 years ago

consumers isn't necessarily generic enough, though; the example use cases that come up a lot are things like cleaning artifacts, notifying external services, etc. They aren't necessarily consuming something produced by the 'main' task (as opposed to e.g. a link-compile or static-asset pipeline where "consumer" is definitely appropriate.)

(One could make the argument that e.g. a notification followup task is "consuming" the [empty] output of e.g. a test or build task, but that feels like a stretch to me.)

dependencies seems like an obvious slam dunk either way though. I'm even contemplating adding an alias or two for it, e.g. @task(requires=[other, tasks]), but so far I've tried to shy away from having too many "convenient" aliases.

ask commented 7 years ago

Radical, but if it was like this you don't even have to define functions elsewhere:


@task()
def compile():
    ...

@compile.before
def check_versions():
    ...

@compile.after
def cleanup_files():
    ...

bitprophet commented 7 years ago

What if I want check_versions to run before some number of other tasks, instead of just @compile, though? :grin: Then we end up with this:

@task
def compile(): ...

@task
def build(): ...

@task
def dryrun(): ...

# ...

@compile.before
@build.before
@dryrun.before
# ...
def check_versions(): ...

ask commented 7 years ago

Yeah, I guess it's out of the question if these are supposed to build a tree of tasks. You can call other tasks in these, but it will be impossible to introspect what the dependencies are. The good news is I may make use of this pattern some time :)

bitprophet commented 7 years ago

Also just not sure how I feel about task objects being usable as decorators; it definitely makes for a neat-looking API, but I worry it goes too far into the magic zone with not enough benefit. Just a gut feeling though. (EDIT: but yea, I can definitely see other use cases where the benefits do outweigh the drawbacks, so, good luck :D)

ask commented 7 years ago

@bitprophet The task objects are not decorators, that'd be the composite @task.after etc.

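from typing import Callable


class Callbacks(list):
    """A list of callbacks that doubles as a registering decorator.

    Subclasses list (rather than MutableSequence) so it works as written.
    """

    def __call__(self, fun: Callable) -> Callable:
        self.append(fun)  # register the decorated function
        return fun        # hand it back unchanged


class Task:

    def __init__(self) -> None:
        self.before = Callbacks()  # used as @some_task.before
        self.after = Callbacks()   # used as @some_task.after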

I wouldn't call it magic exactly, implementation is simple, and it's used in the stdlib with @property:

class X:
    _foo = None
    @property
    def foo(self):
        return self._foo

    @foo.setter
    def foo(self, value):
        self._foo = value

    @foo.deleter
    def foo(self):
        print('OOPS')

(Alas, I cannot argue that @property is a good use of it.)

bitprophet commented 7 years ago

Ah right, good point. (Sorry, bouncing all over the place right now so even more scatterbrained than usual.)

My other point still stands unfortunately, I think it makes more sense in the general case for the declarations to live in/on the tasks making them instead of vice versa. This overall problem space, of course, often sees solutions going both ways (see eg Chef resources' notifies/subscribes) but I think I'd prefer to implement the more commonly useful variant first, and leave the option on the table to add the inverse later if enough people seem to want it.

bitprophet commented 7 years ago

Note to self: while writing out multiple examples using @task(followups=[a,b,c]), suspect it should really be @task(followup=[a,b,c]) instead.

I also still don't hate @task(afterwards=[a,b,c]), while still probably referring to a, b and c as "followup tasks"?

Also noting FTR that Twitter has yielded more "literate" style ideas for the kwargs, e.g.:

@task(then=[more, stuff])
@task(and_then=[do, more, things])
@task(but_first=[do, this], afterwards=[even, more])

My gut says these are slightly too cutesy, but you never know.

offbyone commented 7 years ago

Don't undersell it; there's definitely room for a sense of play in an API if the API is still usable. Especially if the literate approach yields clarity.

bitprophet commented 7 years ago

First pass at DDD is done, GitHub rendering of it is here: https://github.com/pyinvoke/invoke/blob/34c71cad54508579698bc5200dfaf3e65dd32eb5/sites/docs/concepts/execution.rst

Still says "followups" for now, I'll figure out what the final terminology should be after I actually prove an implementation works...

bitprophet commented 7 years ago

Each time I touch this stuff I find the API design parts irritating. Currently torn on whether to aim for multiple kwargs for the singular/plural case, or a single kwarg that behaves in a polymorphic fashion (accepts one object or an iterable of them).

E.g. @task(dependencies=[a,b,c]) + @task(depends_on=a), or...just @task(depends_on=[a,b,c]) + @task(depends_on=a). (I.e. in English, one can "depend on" a singular or plural noun, so...why not just go with that? Many more options here work in the "either-or" case than are purely singular or purely plural.)

Also still torn between depends_on and requires. (Either could work as a polymorphic kwarg.)

thebjorn commented 7 years ago

depends_on and requires read much better than dependencies (which is too long and has too many syllables).

Having an easy type signature will let IDEs help programmers spot errors, and it would prevent gymnastics if one of the prerequisites is itself an iterable, so I would urge always using a list, i.e. @task(requires=[a]).

afterwards doesn't seem too cutesy (or maybe on_success/on_error..? -- I don't recall if post_tasks run regardless of task success/failure).

bitprophet commented 7 years ago

Agree that shorter is better.

While I recognize your assertions about signature/gymnastics, I'm eternally on the fence about "always a list" because it's an annoying chore in the very common case of only ever wanting to throw a single value into it, and what's this sort of code for if not removing annoying chores? :grin: But again, I recognize the issues with polymorphism (or w/e it's properly called) which is why I wonder if there are any useful terms that strongly imply only the singular. Can't think of any really.

Then again - because we already do some mild gymnastics for the positional-argument use case, that makes 'single dependency' trivially easy (@task(my_dependency)) so perhaps it's moot. Not sure; explicit kwargs read nicely even in the trivial case.

Re: success vs error, my guiding principle right now is "if a given logic doesn't require the dependency system to achieve, it should not be implemented in that system at all", and I can't think of many scenarios where an on_error makes more sense than some in-task try/except/else/finally construct. Perhaps a "notify on fail", but even that is easily accomplished with a try/except that sets some config state, plus a regular followup task that interprets said state to figure out how/whether to notify. (And a "naive" notification task that isn't relying on any sort of state-passing, doesn't seem like it's very useful.)
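
As a rough illustration of the "notify on fail" approach described above, here is a minimal sketch. Everything in it is hypothetical: plain functions stand in for tasks, a dict stands in for config state, and the names `deploy`, `notify`, `run` and `deploy_error` are invented for the example.

```python
# Hypothetical sketch: plain functions and a dict stand in for tasks and config.

def deploy(state, fail=False):
    """Main task: handles its own errors and records the outcome in state."""
    try:
        if fail:
            raise RuntimeError("deploy blew up")
    except RuntimeError as exc:
        state["deploy_error"] = str(exc)
    else:
        state["deploy_error"] = None


def notify(state):
    """Regular followup: reads the recorded state to decide whether to notify."""
    error = state.get("deploy_error")
    if error is None:
        return None
    return f"deploy failed: {error}"


def run(fail):
    """Run the task and then its followup, sharing one state dict."""
    state = {}
    deploy(state, fail=fail)
    return notify(state)
```

The error handling lives entirely in the task body; the followup is an ordinary one that happens to look at shared state, so no dedicated on_error hook is involved.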

"Normal" dependencies & followups make sense because it's not possible (certainly not easy) to achieve dependency deduplication or followup deferment without a call graph system; but anything where your logic dictates you want something to always happen, and immediately before/after the main task, seems like it should by rights live within the task body.
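
To make the deduplication/deferment point concrete, here is a small sketch of what a call graph buys you. It is not invoke's executor: the `Task` class and `plan` function are hypothetical, the `depends_on`/`afterwards` names are borrowed from this thread, and "deferment" is assumed to mean "followups run once, after all requested tasks".

```python
# Hypothetical sketch, not invoke's executor.

class Task:
    def __init__(self, name, depends_on=(), afterwards=()):
        self.name = name
        self.depends_on = list(depends_on)
        self.afterwards = list(afterwards)


def plan(requested):
    """Return task names in the order they would run."""
    order = []     # each task appears at most once
    deferred = []  # followups waiting for the end of the session
    seen = set()

    def visit(task):
        if task.name in seen:
            return  # already scheduled: this is the deduplication
        seen.add(task.name)
        for dep in task.depends_on:
            visit(dep)  # dependencies run before the task itself
        order.append(task.name)
        deferred.extend(task.afterwards)  # followups are postponed

    for task in requested:
        visit(task)
    while deferred:
        visit(deferred.pop(0))  # may queue further followups
    return order


notify = Task("notify")
clean = Task("clean")
build = Task("build", depends_on=[clean], afterwards=[notify])
test = Task("test", depends_on=[clean, build], afterwards=[notify])
```

With these definitions, `plan([build, test])` schedules `clean` and `notify` once each, even though both are referenced twice, which is hard to get from code that lives purely inside individual task bodies.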

As always, though, this is an incremental change and I'm open to future changes/expansions.

bitprophet commented 7 years ago

Hrm. How about enables for followup tasks (@task(requires=[clean], enables=[notify]))? Not great, since "you can do X after you run me" is very different from "you must do X after you run me". But it's another one for the brainstorm pile, and it seems to match up well with requires at least.

Also, follow_with or followed_by might work, they're in the 3-syllable camp, are highly unambiguous in terms of subject/object, and feel less awkward than followups=.

offbyone commented 7 years ago

enables is an ordering constraint, but doesn't state that they will be executed after. It'd be useful for a statement that those tasks must follow this one, but not that they will.

bitprophet commented 7 years ago

Isn't that what I said? :D

bitprophet commented 7 years ago

Moar: while they feel too mouthy/awkward, specifying dependencies with preceded_by and followups with succeeded_by at least has the property of being symmetrical:

@task(preceded_by=[clean], succeeded_by=[notify])

Yea...definitely awkward to type.

Alternately, flip it around and specify dependencies with succeeds and followups with precedes? Bit easier to type (if still having the double-c and double-e. Boy, we're in the weeds now, aren't we?)

@task(succeeds=[clean], precedes=[notify])

bitprophet commented 7 years ago

Spent the time to gather up all of the stupid words we've all brainstormed, and my personal thoughts/observations on 'em, and put them in the description in alpha order for shits n giggles. Please ping me if I missed your favorite 😛 EDIT: also tried to sprinkle in the plural-noun and verb angles where appropriate.

bitprophet commented 7 years ago

The more I stare at and/or enhance these lists the more I feel like a) the main limiter is the "tasks after this task" plural noun, everything else has half decent options, and b) the least-awful of those is still "followups". Further, I like the kwarg afterwards (esp vs followups) enough that I think it's worth the slight overhead of not being exactly the same as the plural noun.

So I guess I'll keep rolling with that for now, and also keep referring to the overall system as "dependencies" or "the dependency system" since I still suspect dependencies will always be the focus by far. Referring to everything as "dependencies/followups" feels needlessly strict.

EDIT: also going with just-takes-iterables for both kwargs for the time being. Can always add in the bit of handwaving required for single objects later if it really bugs me 😜

bitprophet commented 5 years ago

Taking a stab at resurrecting this in a post-1.0 world. Sadly it means some of the "clean" changes now have to contend with backwards compatibility, though I think that only really means keeping some arg/flag aliases around.

My old branch is nowhere near a clean merge due to a lot of the cleanup, file renaming & file consolidation that happened in the last year and a quarter (uggh) so I'm gonna have to do a lot of copy-pasting into a new branch or something. What needs doing:

[x] Port over the latest copy of the old execution doc into the modern invoking-tasks doc, so the DDD is preserved. Basically just load the latest versions of both up side by side and see what needs copying.
[x] Ditto the tests, which ought to be a little easier since it's mostly file renaming and then additions only.
- any modifications to existing tests should get preserved as having both, for backwards compat reasons
[x] Make sure backwards compatibility re: pre/post args is preserved (should still map cleanly enough to the new concepts)
[x] Ditto the --no-dedupe CLI flag, requires figuring out whether it should disable the newer system entirely, or if it should only impact the same aspects as it did before.
[x] Add deprecation and/or TODO 2.0: notes or similar re: removing the older args/flags
[x] Reread the new diff to make sure it's consistent & I'm still happy with the direction...
- if I'm unlucky it will feel like it "really wants" to stay backwards incompat, which would be problematic & require a rethink of how to cleanly implement it
- e.g. perhaps a new opt-in executor and/or program subclass users can enable to get the new jazz

Final TODO:

[x] Terminology (in docs and tests) seems left halfway through an attempt to rename tasks to targets (or at least to add a "target" concept on top of tasks, sort of like instances vs classes). Suspect we want to scrub "targets" for now, esp given this is being framed as a post-1.0 feature add?
[x] The conceptual docs need pre/post added back into them, even if just as a small separate "pre 1.3" section or something. - In conceptual docs it'd just be distracting noise; kept it limited to the @task API docstring.
[ ] Flesh out any tests where I put in "TODO: tests for x", only a few were added originally
[ ] Implement the main tests, etc, etc until it feels good & is usable for my internal PoC codebase.
[ ] Tests/design/impl for checks need implementing, including any obvious built-in ones (which probably just want to be regular old functions conforming to the desired interface)
[ ] Ensure all conceptual doc snippets actually function as written
[ ] Consider also spending the time tweaking a branch of an internal work codebase (config management esque) since that's an even bigger chunk of functionality to test this out.
[x] Add list of related closed ticket numbers to the changelog entry, for completeness.
[ ] If it was tackled, or makes sense to tackle, as part of wrapping this up: close #100
[ ] Ditto #299 - see if that quirk in post-tasks got incidentally solved or if it needs tweaks to solve
[ ] Check over any new TODOs added in diff from master; ensure they are handled or made into tickets.

bitprophet commented 5 years ago

In re-reading the docs I'm finding the hard requirement of an iterable value for depends_on/afterwards a little annoying/weird, especially since the singular check exists. Don't see that it'll make life that much harder to allow the former two to be iterable-or-callable; esp since they don't (currently) accept strings (though we probably want to pre-emptively guard against that anyways).
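
For illustration, the iterable-or-callable handling with a pre-emptive string guard could look roughly like the sketch below. The helper name `as_task_list` is hypothetical, and tasks are assumed to be plain callables; this is not taken from the branch.

```python
from collections.abc import Iterable


def as_task_list(value):
    """Normalize a single callable, or an iterable of callables, to a list."""
    if value is None:
        return []
    # Strings are iterable, so reject them before the generic Iterable branch.
    if isinstance(value, (str, bytes)):
        raise TypeError("expected a task or an iterable of tasks, not a string")
    # Checked before Iterable: an object that is both counts as ONE task.
    if callable(value):
        return [value]
    if isinstance(value, Iterable):
        tasks = list(value)
        for item in tasks:
            if not callable(item):
                raise TypeError(f"not a task: {item!r}")
        return tasks
    raise TypeError(f"expected a task or an iterable of tasks, got {value!r}")
```

The order of the checks is the main design choice: testing `callable` first means a task object that also happens to be iterable is treated as a single task, which is the ambiguity raised earlier in the thread as an argument for always requiring a list.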

Kinda wish check/checks had a nicer single name we could do the same for, since then it feels different from its siblings. Might have to double check my old brainstorm. EDIT: those options are all bleh. New ideas: skip_if, guard/guards (same problem as check/checks tho), requires (too easy to confuse with the actual dependencies).

Alternately, remove 'checks' entirely in favor of some method of transmitting a "full stop" from a dependency; except that feels too convoluted, especially in the sense that we may well have a tree of tasks where we only want some subtrees to skip execution, not the entire session. Also may put too much control into the dependency instead of the dependent.

Could just say checks by itself, no check, since you could interpret it as "this task checks to see if it needs to run, via this callable/these callables" and then it works for singular or plural. But since it's also easily read as a plural of check, not sure that really works. Having the two different kwargs may just be a necessary evil. EDIT: what about just check? "Check this thing" or "Check these things"?

EDIT AGAIN ROFLMAO: actually, having an iterable of checks has its own problem: must all of the checks yield False to stop exec, or only one or more? There are going to be use cases for both options. Plus, users can work around it relatively easily by just having one check which calls N other checks; and we can add an iterable checks later after thinking on it harder, without breaking backwards compat.
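
The "one check which calls N other checks" workaround might be sketched as follows. The assumptions here are mine: checks are zero-argument callables (real ones may well take a context), a False result means "stop execution", and the helper names are invented.

```python
# Hypothetical helpers: each builds a single check out of several checks.

def any_false_stops(*checks):
    """Combined check is False (stop) as soon as one check is False."""
    def combined():
        return all(check() for check in checks)
    return combined


def all_false_stops(*checks):
    """Combined check is False (stop) only when every check is False."""
    def combined():
        return any(check() for check in checks)
    return combined
```

Either policy fits in a single `check=` callable, so the user picks the semantics explicitly and the task API does not have to choose one for an iterable `checks`.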

pombredanne commented 4 years ago

ping :) @bitprophet is this dead dead or... worthy of a resurrection?

mohnen commented 4 years ago

I know that I am late to the show, but have a look at pyinvokedepends
