pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License

[patch] Post-facto provenance #288

Closed liamhuber closed 5 months ago

liamhuber commented 5 months ago

This introduces post-facto provenance to executed composites (both macros and workflows) in the form of child label lists. @JNmpi, this closes a feature gap we had with other workflow managers.

There are separate lists for the execution order and the completion order, since these can differ when an executor is used for parallelism and the children take significantly different amounts of time to run. This feature and the basic syntax are shown below:

from time import sleep

from pyiron_workflow import Workflow

@Workflow.wrap.as_function_node()
def Slow(t):
    sleep(t)
    return t

@Workflow.wrap.as_macro_node()
def Provenance(self, t):
    self.fast = Workflow.create.standard.UserInput(t)
    self.slow = Slow(t)
    self.double = self.fast + self.slow
    return self.double

wf = Workflow("provenance")
wf.time = Workflow.create.standard.UserInput(2)
wf.prov = Provenance(t=wf.time)
wf.post = wf.prov + 2

with Workflow.create.Executor(max_workers=2) as exe:
    wf.prov.fast.executor = exe
    wf.prov.slow.executor = exe
    wf()

print("wf by execution", wf.provenance_by_execution)
print("wf by completion", wf.provenance_by_completion)
print("macro by execution", wf.prov.provenance_by_execution)
print("macro by completion", wf.prov.provenance_by_completion)
>>> wf by execution ['time', 'prov', 'post']
>>> wf by completion ['time', 'prov', 'post']
>>> macro by execution ['t', 'slow', 'fast', 'double']
>>> macro by completion ['t', 'fast', 'slow', 'double']

Under the hood, nodes without parents behave just as they always did: when they finish, they emit their ran signal, which triggers downstream nodes. Nodes with parents, however, now let the parent manage execution by calling the parent's register_child_starting, register_child_finished, and register_child_emitting_ran interfaces. These maintain the provenance lists, the list of currently running nodes, and the list of signal pairs waiting to be fired. Running a composite is then a while loop that continues as long as either the list of running nodes or the list of pending signal pairs is non-empty.
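
To make the parent-managed loop concrete, here is a minimal, self-contained sketch. It is not the pyiron_workflow implementation: `ToyComposite`, `add_child`, and the internal bookkeeping are hypothetical names invented for illustration, and only the three `register_child_*` method names and the two provenance list names are borrowed from the description above. The loop condition (running children or queued signal pairs) is my reading of that description.

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from time import sleep


class ToyComposite:
    """Hypothetical parent that owns the execution loop for its children."""

    def __init__(self, executor):
        self.executor = executor
        self.functions = {}  # child label -> zero-argument callable
        self.upstream = {}   # child label -> labels whose ran signal it waits for
        self.received = {}   # child label -> ran signals received so far
        self.provenance_by_execution = []
        self.provenance_by_completion = []
        self.running = {}    # future -> label of the child it is running
        self.signal_queue = deque()  # (emitter, receiver) pairs waiting to fire

    def add_child(self, label, function, after=()):
        self.functions[label] = function
        self.upstream[label] = set(after)
        self.received[label] = set()

    def register_child_starting(self, label):
        self.provenance_by_execution.append(label)
        future = self.executor.submit(self.functions[label])
        self.running[future] = label

    def register_child_finished(self, future):
        label = self.running.pop(future)
        future.result()  # a child's exception surfaces here, in the parent
        self.provenance_by_completion.append(label)
        return label

    def register_child_emitting_ran(self, label):
        # Queue one signal pair per downstream child instead of calling it.
        for receiver, needed in self.upstream.items():
            if label in needed:
                self.signal_queue.append((label, receiver))

    def run(self):
        # Children with nothing upstream start immediately.
        for label, needed in self.upstream.items():
            if not needed:
                self.register_child_starting(label)
        while self.running or self.signal_queue:
            while self.signal_queue:
                emitter, receiver = self.signal_queue.popleft()
                self.received[receiver].add(emitter)
                # Start a child once every upstream signal has arrived.
                if self.received[receiver] == self.upstream[receiver]:
                    self.register_child_starting(receiver)
            if self.running:
                done, _ = wait(list(self.running), return_when=FIRST_COMPLETED)
                for future in done:
                    label = self.register_child_finished(future)
                    self.register_child_emitting_ran(label)


def slow():
    sleep(0.3)


def fast():
    pass


def double():
    pass


with ThreadPoolExecutor(max_workers=2) as exe:
    toy = ToyComposite(exe)
    toy.add_child("slow", slow)
    toy.add_child("fast", fast)
    toy.add_child("double", double, after=("slow", "fast"))
    toy.run()

print("by execution", toy.provenance_by_execution)
print("by completion", toy.provenance_by_completion)
```

With these sleep times, `slow` is started first but `fast` finishes first, so the two lists differ in the same way as the macro lists in the example above. All bookkeeping happens in the thread that calls `run`, which keeps the toy free of locks; the real package may well organise this differently.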

This has a nice side-effect of shortening the stack when an error is encountered during the run, since children always just go up to their parent and then stop, instead of having to trace back through every node. This resolves the recursion limit bug with while-loops and closes #247. (The provenance for while-nodes is sensible, but not super informative -- just their children again and again and again.)
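
The effect on stack depth can be illustrated with a toy comparison. This is only a sketch in plain Python, not code from pyiron_workflow, and both function names are hypothetical. In the first function each node runs its successor from inside its own call, so the stack grows with the length of the chain; in the second, each node only queues its signal and returns, so a parent-style loop runs every node at constant depth.

```python
import sys
from collections import deque


def run_by_chaining(n_nodes):
    """Each node calls the next one directly, so frames pile up."""

    def run_node(i):
        if i < n_nodes:
            run_node(i + 1)  # the downstream node runs inside this frame

    run_node(0)


def run_by_parent_loop(n_nodes):
    """Each node queues its signal and returns; the loop fires it."""
    queue = deque([0])
    executed = 0
    while queue:
        i = queue.popleft()
        executed += 1
        if i + 1 < n_nodes:
            queue.append(i + 1)  # register the signal, then return to the loop
    return executed


# A chain twice as long as the interpreter's recursion limit.
n = sys.getrecursionlimit() * 2

try:
    run_by_chaining(n)
    chained_ok = True
except RecursionError:
    chained_ok = False

looped = run_by_parent_loop(n)

print("chaining survived:", chained_ok)
print("nodes run by the loop:", looped)
```

A long chain of repeated children, such as the body of a while-loop, is the case where the first pattern runs into the recursion limit and the second does not.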

The only thing I really don't like is that Node.pull, in the presence of a Workflow parent, required some nasty hacking to stop Workflow from automating the flow and ruining the careful orchestration of execution signals that pull uses to run only the relevant portion of the upstream graph. I don't see a cleverer solution right now, so I'm very open to suggestions for improvement, but I'm willing to tolerate a few lines of ugliness in Node.pull to get the entire feature off the ground.

Non-goals:

github-actions[bot] commented 5 months ago

Binder :point_left: Launch a binder notebook on branch pyiron/pyiron_workflow/provenance

codacy-production[bot] commented 5 months ago

Coverage summary from Codacy


| Coverage variation | Diff coverage |
| ------------- | ------------- |
| :white_check_mark: +0.12% (target: -1.00%) | :white_check_mark: 93.10% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| ------------- | ------------- | ------------- | ------------- |
| Common ancestor commit (920dd1c894f35234c226e4af7465f709c7a76079) | 3474 | 3043 | 87.59% |
| Head commit (8aff2081f1d749b50ba6238d06a85160fddc1410) | 3525 (+51) | 3092 (+49) | 87.72% (**+0.12%**) |

**Coverage variation** is the difference between the coverage for the head and common ancestor commits of the pull request branch: `coverage of head commit - coverage of common ancestor commit`

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| ------------- | ------------- | ------------- | ------------- |
| Pull request (#288) | 58 | 54 | **93.10%** |

**Diff coverage** is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: `covered lines added or modified / coverable lines added or modified * 100%`


coveralls commented 5 months ago

Pull Request Test Coverage Report for Build 8696625574

Details


| Files with Coverage Reduction | New Missed Lines | % |
| ------------- | ------------- | ------------- |
| pyiron_workflow/composite.py | 1 | 99.57% |
| node.py | 6 | 94.18% |
| composite.py | 13 | 93.14% |

| Totals | Coverage Status |
| ------------- | ------------- |
| Change from base Build 8652695857: | 0.2% |
| Covered Lines: | 6569 |
| Relevant Lines: | 7161 |
