pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License

Add and test wrappers for sticking nodes in a pyiron job #189

Closed liamhuber closed 9 months ago

liamhuber commented 9 months ago

In the long run we want to have queue submission fully integrated with nodes; in the short term, we can get that by making a pyiron_base job out of a node. This isn't a permanent solution, as it only allows sending entire graphs off to the queue, not, e.g., a single node from within a graph (although a "graph" may of course consist of just one node).

Here I introduce two solutions. The first defines a new subclass of TemplateJob and relies on the node's own storage capabilities to serialize itself to, and deserialize itself from, HDF inside the job's working directory. The only true input on the job is a string specifying which storage backend the node should use ("h5io"/"tinybase", per #160); the user then supplies a node instance to the job's .node attribute. On reloading, the executed node can be found in the same place on the job.

The second exploits the new pyiron_base.Project.wrap_python_function: it takes a node in job.input["node"] and returns the executed version in job.output["result"], relying on the wrapper's cloudpickle-based (de)serialization and ignoring the node's own storage.
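
For comparison, here is roughly what that wrapper route looks like; the run_node helper and the AddOne node are stand-ins I've made up for illustration, but the input/output keys follow the description above:

        >>> from pyiron_base import Project
        >>> from pyiron_workflow import Workflow
        >>> 
        >>> @Workflow.wrap_as.single_value_node("y")
        ... def AddOne(x):
        ...     return x + 1
        >>> 
        >>> def run_node(node):  # hypothetical stand-in for the wrapped function
        ...     node.run()
        ...     return node
        >>> 
        >>> pr = Project("wrap_test")
        >>> pj = pr.wrap_python_function(run_node)
        >>> pj.input["node"] = AddOne(1)
        >>> pj.run()  # cloudpickles the node; its storage backend is never touched
        >>> print(pj.output["result"].outputs.to_value_dict())
        {'y': 2}
        >>> pr.remove_jobs(recursive=True, silently=True)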

Here's an example of the first approach copied from the docstring:

        >>> from pyiron_base import Project
        >>> from pyiron_workflow import Workflow
        >>> import pyiron_workflow.job  # To get the job registered in JOB_CLASS_DICT
        >>> 
        >>> @Workflow.wrap_as.single_value_node("t")
        ... def Sleep(t):
        ...     from time import sleep
        ...     sleep(t)
        ...     return t
        >>> 
        >>> wf = Workflow("pyiron_node", overwrite_save=True)
        >>> wf.sleep = Sleep(0)
        >>> wf.out = wf.create.standard.UserInput(wf.sleep)
        >>> 
        >>> pr = Project("test")
        >>> 
        >>> nj = pr.create.job.NodeJob("my_node")
        >>> nj.node = wf
        >>> nj.run()
        >>> print(nj.node.outputs.to_value_dict())
        {'out__user_input': 0}

        >>> lj = pr.load(nj.job_name)
        >>> print(lj.node.outputs.to_value_dict())
        {'out__user_input': 0}

        >>> pr.remove_jobs(recursive=True, silently=True)
        >>> pr.remove(enable=True)

The docstrings and tests are relatively thorough, but I'm not keen to document this in the tutorial notebooks or to expose the functionality directly on the Workflow class yet -- I'd rather some people at MPIE played around with it a bit first and let me know what would be helpful, which interface they prefer, etc.

The one big constraint in this PR is that nodes will still only run on a single core, even when shipped off to a queue. This is because the node's .executor attribute is (so far) either None or a live executor instance -- and the latter can't easily be serialized. @jan-janssen is working on this for the underlying PythonFunctionContainerJob, but we may also be able to solve it independently for the NodeJob here using the same attack -- namely, supplying information on how to instantiate a fresh executor on the far side instead of passing along a live instance. We already have the infrastructure for this in the Node._parse_executor method; it's simply a matter of populating that method with the logic to parse a class name and args/kwargs for the new executor.
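
To make that concrete, here is a rough sketch of the idea; everything beyond the existence of Node._parse_executor is illustrative:

        >>> # Ship a (class, args, kwargs) description instead of a live executor,
        >>> # and instantiate it fresh on the far side.
        >>> from concurrent.futures import ProcessPoolExecutor
        >>> 
        >>> executor_spec = (ProcessPoolExecutor, (), {"max_workers": 4})
        >>> 
        >>> def parse_executor(spec):
        ...     # The sort of logic Node._parse_executor would need to grow
        ...     cls, args, kwargs = spec
        ...     return cls(*args, **kwargs)
        >>> 
        >>> executor = parse_executor(executor_spec)
        >>> executor.shutdown()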

@ligerzero-ai, IIRC you were going to look at running multi-core Vasp jobs in a workflow? This should let you do it on the queue, so basically you just need to write parser nodes and a shell node (preferably a generic one, plus a more specific one that executes VASP in particular), then put them all together in a macro. @JNmpi has examples of these things for other codes over on his branch (#33), so we're not too far off. I don't expect you to do anything with this immediately, I just want to stay mutually well informed so we don't wind up doubling up on work. If you do want to work on it, maybe you could first/also open a PR to extend Node._parse_executor?
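
Just to sketch the shape I mean (the Shell and Parse nodes below are trivial stand-ins I invented; a real version would actually execute the code, parse its output files, and live in a macro rather than a bare workflow):

        >>> from pyiron_workflow import Workflow
        >>> 
        >>> @Workflow.wrap_as.single_value_node("stdout")
        ... def Shell(command):
        ...     # Stand-in: a real shell node would actually run the command
        ...     return f"pretended to run {command}"
        >>> 
        >>> @Workflow.wrap_as.single_value_node("energy")
        ... def Parse(stdout):
        ...     # Stand-in: a real parser would read the code's output files
        ...     return 0.0
        >>> 
        >>> vasp_like = Workflow("vasp_like")
        >>> vasp_like.calc = Shell("vasp_std")
        >>> vasp_like.parse = Parse(vasp_like.calc)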

Note that this PR depends on pyiron_contrib and h5io code that isn't merged to main yet, much less released, so none of it will run unless you clone those branches locally. This is also why the CI tests all fail.

github-actions[bot] commented 9 months ago

Binder 👈 Launch a binder notebook on branch pyiron/pyiron_workflow/as_pyironjob

ligerzero-ai commented 9 months ago

Cheers, I'll take a careful look over the next week. If it's urgent (stopping other work), feel free to merge w/o review, we can always revert :)

liamhuber commented 9 months ago

> Cheers, I'll take a careful look over the next week. If it's urgent (stopping other work), feel free to merge w/o review, we can always revert :)

Super! No, this is pretty standalone so I should be fine leaving it open for O(week).

ligerzero-ai commented 9 months ago

Some minor feedback - but otherwise lgtm. Let's get this merged :)