Sketch pipelines

warpfork commented 8 years ago

This is the first draft of mechanisms behind pipelining. Lots of the infrastructure in repeatr to date is about defining highly isolated pieces of work, then helping refine that definition of work until the results are effectively immutable. Now it's time to start building in the other direction as well: knitting pieces of work together, and making it possible to pump updates triggered by changing any one piece of data all the way through a series of transitions to produce something that's both new... and reproducible and auditable once we've charted the way.

To that end, this sketch introduces a new layer of configuration -- the Commission. Commissions are much like Formulas: they list inputs, actions to perform on them, and outputs to capture afterwards. The difference is that Commissions are allowed to refer to things by rough "names", which are human-readable and mutable -- where Formulas refer to inputs by hashes like git commits, Commissions refer to their inputs by names comparable to git branches. Both layers are important: Formulas are completely repeatable descriptions of work because they continue to pin all inputs precisely; Commissions are less precise, but by producing Formulas from a Commission, we can get the best of both worlds.
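To make the Formula/Commission split concrete, here is a minimal Go sketch. The `Formula`, `Commission`, and `Resolve` names are hypothetical illustrations and not repeatr's actual types. The same goes for the mount-path layout and the made-up `sha384:` placeholder hashes. The name-to-hash lookup is modeled as a plain map.

```go
package main

import (
	"fmt"
	"sort"
)

// Formula pins every input to an exact content hash, like a git commit.
// (Hypothetical, pared-down type; not repeatr's actual struct.)
type Formula struct {
	Inputs map[string]string // mount path -> ware hash
	Action []string          // command to run
}

// Commission names its inputs loosely, like git branches.
// (Hypothetical, pared-down type; not repeatr's actual struct.)
type Commission struct {
	Inputs map[string]string // mount path -> human-readable name
	Action []string
}

// Resolve produces a Formula from a Commission by pinning each name to the
// hash it currently points at. The name->hash lookup is a plain map here.
func Resolve(c Commission, current map[string]string) (Formula, error) {
	f := Formula{Inputs: make(map[string]string, len(c.Inputs)), Action: c.Action}
	for path, name := range c.Inputs {
		hash, ok := current[name]
		if !ok {
			return Formula{}, fmt.Errorf("no hash known for name %q", name)
		}
		f.Inputs[path] = hash
	}
	return f, nil
}

func main() {
	c := Commission{
		Inputs: map[string]string{"/": "base-image", "/src": "myapp-src"},
		Action: []string{"make"},
	}
	current := map[string]string{"base-image": "sha384:aaa", "myapp-src": "sha384:bbb"}
	f, err := Resolve(c, current)
	if err != nil {
		panic(err)
	}
	paths := make([]string, 0, len(f.Inputs))
	for p := range f.Inputs {
		paths = append(paths, p)
	}
	sort.Strings(paths) // map iteration order is random; sort for stable output
	for _, p := range paths {
		fmt.Println(p, "=>", f.Inputs[p])
	}
	// prints:
	// / => sha384:aaa
	// /src => sha384:bbb
}
```

Because a Commission's names can move, calling `Resolve` again after the map changes yields a different Formula. Each Formula it emits still pins every input, so it stays repeatable.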

The mapping from names to hashes is performed by another new structure called a Catalog (catalog.Book in the code). Catalogs list a series of names, and tell you which hash each name should resolve to. When you want to publish a new release of a product? Publish a new edition of the Catalog with that new hash. Commissions which consume that Catalog name will be automatically triggered to emit a new formula by...

... the Foreman! The Foreman is an actor upon a KnowledgeBase which contains a whole suite of related Commissions and Catalogs, and the Formulas they've produced and Wares they all reference. The Foreman listens for new Catalogs and Commissions, and evaluates them to produce Formulas... which then are scheduled to run on an Executor (this is the old familiar turf, where we simply expect "formula in -> (hopefully deterministic) outputs out"). When the Executor returns results, the output wares may be fed back into releasing a new edition of a Catalog. This may continue to flow through a whole graph of dependent Commissions -- making it possible to update one ware and watch updated builds depending on it, and depending on things that depend on it, and so on... flow through the whole system automatically. :tada:
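The Foreman loop described above can be modeled in miniature. This sketch is a toy and not repeatr's implementation. `Catalog`, `Commission`, `Foreman`, `Publish`, and `runFormula` are hypothetical names. The knowledge base is an in-memory map, and the Executor is replaced by a deterministic mock that hashes its inputs. The sketch assumes the commission graph is acyclic; a commission that consumed its own output would recurse forever.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Catalog is a hypothetical stand-in: one name pinned to the hash of the
// ware in its latest edition.
type Catalog struct {
	Edition int
	Ware    string
}

// Commission consumes catalogs by name and releases its output under another name.
type Commission struct {
	Inputs []string
	Output string
}

// Foreman holds a toy knowledge base: current catalogs and all commissions.
type Foreman struct {
	catalogs    map[string]Catalog
	commissions []Commission
	log         []string // "name@edition" for every release, in order
}

// runFormula mocks an Executor deterministically: the output hash depends
// only on the pinned input hashes.
func runFormula(pinned []string) string {
	h := sha256.New()
	for _, in := range pinned {
		h.Write([]byte(in))
		h.Write([]byte{0}) // separator so ["ab","c"] != ["a","bc"]
	}
	return hex.EncodeToString(h.Sum(nil))[:12]
}

func consumes(c Commission, name string) bool {
	for _, in := range c.Inputs {
		if in == name {
			return true
		}
	}
	return false
}

// Publish releases a new catalog edition, then evaluates every commission
// that consumes that name. Results are published in turn, so updates flow
// through the whole (assumed acyclic) graph of dependents.
func (f *Foreman) Publish(name, ware string) {
	edition := f.catalogs[name].Edition + 1
	f.catalogs[name] = Catalog{Edition: edition, Ware: ware}
	f.log = append(f.log, fmt.Sprintf("%s@%d", name, edition))
	for _, c := range f.commissions {
		if !consumes(c, name) {
			continue
		}
		pinned, ready := make([]string, 0, len(c.Inputs)), true
		for _, in := range c.Inputs {
			cat, ok := f.catalogs[in]
			if !ok {
				ready = false // some input has never been released; wait
				break
			}
			pinned = append(pinned, cat.Ware)
		}
		if ready {
			f.Publish(c.Output, runFormula(pinned))
		}
	}
}

func main() {
	f := &Foreman{
		catalogs: map[string]Catalog{},
		commissions: []Commission{
			{Inputs: []string{"libfoo-src"}, Output: "libfoo"},
			{Inputs: []string{"libfoo", "app-src"}, Output: "app"},
		},
	}
	f.Publish("app-src", "sha:app1")
	f.Publish("libfoo-src", "sha:foo1")
	f.Publish("libfoo-src", "sha:foo2") // a new libfoo source rebuilds libfoo, then app
	fmt.Println(f.log)
	// [app-src@1 libfoo-src@1 libfoo@1 app@1 libfoo-src@2 libfoo@2 app@2]
}
```

Publishing `libfoo-src` a second time rebuilds `libfoo` and then `app`, without anyone touching `app` directly. A real Foreman would also schedule formulas asynchronously. It could also skip re-releasing a catalog whose ware hash didn't change. This sketch does neither.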

And many other miscellaneous bits:

The nil executor now has a bunch of modes. It can be configured to mock out deterministic or unreliable computations... this is useful for testing the whole Commission/Catalog/KnowledgeBase/Foreman flow without ever requiring high-privilege modes or the startup time of actually running sandboxed worker processes.
A completely overwrought task leasing system in the Foreman, looking forward to a time when we can parallelize and farm out execution of formulas.
The first sighting of calculated content-addressable IDs for formulas themselves. We'll use these in the future for deduplicating work if the same formulas are generated by different commissions. They'll also be a large part of the auditability story in the future.
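Content-addressable formula IDs come down to hashing a canonical serialization, so that formulas with equal contents always get equal IDs. The sketch below assumes JSON plus SHA-256 as that canonical form. Repeatr's actual serialization and hash choice aren't described here, and `Formula` and `FormulaID` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// Formula is a hypothetical, pared-down formula (not repeatr's real struct).
type Formula struct {
	Inputs  map[string]string `json:"inputs"`  // mount path -> ware hash
	Action  []string          `json:"action"`  // command to run
	Outputs []string          `json:"outputs"` // paths to capture
}

// FormulaID hashes a canonical serialization of f. encoding/json emits map
// keys in sorted order and struct fields in declaration order; output paths
// are sorted here too, treating their order as insignificant.
func FormulaID(f Formula) (string, error) {
	f.Outputs = append([]string(nil), f.Outputs...) // copy; don't reorder the caller's slice
	sort.Strings(f.Outputs)
	b, err := json.Marshal(f)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := Formula{Inputs: map[string]string{}, Action: []string{"make"}, Outputs: []string{"/out", "/logs"}}
	a.Inputs["/"] = "sha384:aaa"
	a.Inputs["/src"] = "sha384:bbb"
	b := Formula{
		Inputs:  map[string]string{"/src": "sha384:bbb", "/": "sha384:aaa"},
		Action:  []string{"make"},
		Outputs: []string{"/logs", "/out"},
	}
	idA, _ := FormulaID(a)
	idB, _ := FormulaID(b)
	fmt.Println(idA == idB) // true: same content, same ID

	// Deduplication: skip work whose ID has already been seen.
	done := map[string]bool{idA: true}
	fmt.Println(done[idB]) // true: b would not be run again
}
```

The `done` map hints at the deduplication use. If two commissions generate the same formula, the second can reuse the first's results.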

Other features are hinted at, but deferred to later rounds of drafts:

Catalogs are expected to be a focal point for signing. The whole datastructure is meant to provide not just single-instance release integrity, but enough information to describe update transitions. Also, ideally they'll be embeddable in public logs maintained and publicly monitored by others investing in binary transparency.
Knowledge bases are designed to be syncable. The Foreman is described as an actor for a reason: the knowledge base supports multiple concurrent actors, and one of the expected ones is a sync process that exchanges records with other repeatrs.
Catalogs represent "strong references" to wares of interest. This means we can soon think about building a well-defined "garbage collector"... so that you can run formulas freely, heap everything whimsically into your (content-addressable) storage as you go, and decide what to keep whenever it's time to trim the fat. Similarly this will enable smarter syncing and mirroring, etc.
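Treating Catalogs as strong references turns garbage collection into a mark-and-sweep in which catalog editions are the roots. This sketch is deliberately flat: it assumes wares don't reference one another, so marking is a single pass over the catalogs. `Edition` and `Collect` are hypothetical names. Retaining the inputs of the formulas behind each cataloged ware, for auditability, would need a transitive walk that isn't shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// Edition is a hypothetical catalog edition: a name pinned to a ware hash.
type Edition struct {
	Name string
	Ware string
}

// Collect returns the wares in storage that no catalog edition references.
// Catalog editions act as the GC roots; anything else is garbage.
func Collect(storage []string, catalogs []Edition) []string {
	// Mark: every ware named by a catalog edition is live.
	live := make(map[string]bool, len(catalogs))
	for _, e := range catalogs {
		live[e.Ware] = true
	}
	// Sweep: everything in storage that wasn't marked can be trimmed.
	var garbage []string
	for _, w := range storage {
		if !live[w] {
			garbage = append(garbage, w)
		}
	}
	sort.Strings(garbage)
	return garbage
}

func main() {
	storage := []string{"sha:a", "sha:b", "sha:c", "sha:d"}
	catalogs := []Edition{{Name: "libfoo", Ware: "sha:b"}, {Name: "app", Ware: "sha:d"}}
	fmt.Println(Collect(storage, catalogs)) // [sha:a sha:c]
}
```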

There's no connection to main() yet -- no config, nothing -- this is still purely sketching, self-consistency testing, and a couple of judicious but extremely visible duct-tape placeholders. But it is demoing multi-stage pipelines, automatically triggering evaluation between dependents in response to updates. So that's pretty cool. Enough to keep iterating on.

timthelion commented 8 years ago

You know, everyone trashes on English for having crazy spelling rules, but I'd have to say that it's way more fun to abuse English spelling than the spelling of a phonetically spelled language like Czech :D. Still don't think it makes up for all those weekly spelling bees, though.

warpfork commented 8 years ago

Making fun of my opinionated, artisanal spelling of "evokation"? Hush, you!

timthelion commented 8 years ago

"artisanal spelling" :D

brb, I'm off to get my masters degree in mispronuciation.

warpfork commented 8 years ago

Merging to thunderous applause (cough) because I wanna get on with some refactors on master that'll make a real hash of this branch if it doesn't fold back in first. :)

polydawn / repeatr

Sketch pipelines #67