Cheers, I'll take a careful look over the next week. If it's urgent (stopping other work), feel free to merge w/o review, we can always revert :)
Super! No, this is pretty standalone so I should be fine leaving it open for O(week).
Some minor feedback - but otherwise lgtm. Let's get this merged :)
In the long run we want to have queue submission fully integrated with nodes; in the short term, we can get that by making a `pyiron_base` job out of a node. This isn't a permanent solution, as it doesn't allow, e.g., sending just one node from a graph off to the queue, only entire graphs (even if a graph is just a single node).

Here I introduce two solutions. One defines a new subclass of `TemplateJob` and relies on the node's own storage capabilities to (de)serialize itself to/from HDF inside the job's working directory. The only true input on the job is a string saying which storage backend the node should use (`"h5io"`/`"tinybase"`, per #160); then the user supplies a node instance to the job's `.node` attribute. On reloading, the executed node can be found in the same place on the job.

The other exploits the new `pyiron_base.Project.wrap_python_function`, which takes a node in `job.input["node"]` and returns the executed version in `job.output["result"]` using the wrapper's `cloudpickle`, ignoring the node's own storage.

Here's an example of the first approach, roughly as it appears in the docstring:
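(A minimal sketch; the creator path, the storage-backend input name, and the example node used here are assumptions and may not match the actual API.)

```python
from pyiron_base import Project
from pyiron_workflow import Workflow  # assumed import path for the node classes

pr = Project("node_job_demo")

# Assumed creator path for the new TemplateJob subclass (NodeJob) introduced here
job = pr.create.job.NodeJob("my_node_job")

# The only true job input: which storage backend the node should use
job.input.backend = "h5io"  # or "tinybase", per #160; exact field name assumed

# Supply a live node instance to the job's .node attribute
job.node = Workflow.create.standard.UserInput(42)  # any node instance will do

job.run()

# On reloading, the executed node is found in the same place on the job
reloaded = pr.load("my_node_job")
print(reloaded.node.outputs)
```

And a similarly hedged sketch of the second approach; the wrapped callable `execute_node` and the call signature of `wrap_python_function` shown here are illustrative assumptions -- only the `"node"`/`"result"` keys come from the description above:

```python
from pyiron_base import Project
from pyiron_workflow import Workflow  # assumed import path


def execute_node(node):
    # Illustrative stand-in for whatever callable the wrapper is given:
    # run the node and hand it back, letting cloudpickle carry it both ways.
    node.run()
    return node


pr = Project("wrapped_node_demo")
job = pr.wrap_python_function(execute_node)  # call signature assumed
job.input["node"] = Workflow.create.standard.UserInput(42)
job.run()
executed = job.output["result"]  # the executed node, ignoring the node's own storage
```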
The docstrings and tests are relatively thorough, but I'm not keen to document it in the tutorial notebooks or expose the functionality directly on the `Workflow` class yet -- I'd rather some people at MPIE played around with it a bit first and let me know what would be helpful, which interface they prefer, etc.

The one big constraint in this PR is that the nodes will still only run on a single core, even when shipped off to a queue. This is because the node's `.executor` attribute is (so far) either `None` or an instance of an executor -- and the latter can't easily be serialized. @jan-janssen is working on this for the underlying `PythonFunctionContainerJob`, but we may also be able to solve it independently for the `NodeJob` here using the same attack -- namely, supplying information on how to instantiate a fresh executor on the far side instead of passing along a live instance. We already have the infrastructure in place for this in the `Node._parse_executor` method; it's simply a matter of populating that method with the logic to parse a class name and args/kwargs for the new executor.
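To make that last point concrete, here's a hedged sketch of the sort of logic `_parse_executor` could grow; the `(ExecutorClass, args, kwargs)` tuple convention is just one plausible format, not something this PR defines:

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def _parse_executor(executor):
    """
    Accept either a live executor (current behaviour) or a serializable
    recipe for building a fresh one on the far side.
    """
    if executor is None or isinstance(executor, Executor):
        # Current behaviour: None, or a live instance that can't be serialized
        return executor
    # Assumed convention: a (class, args, kwargs) triple that survives storage
    executor_class, args, kwargs = executor
    return executor_class(*args, **kwargs)


# Usage under the assumed convention:
executor = _parse_executor((ProcessPoolExecutor, (), {"max_workers": 4}))
```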
@ligerzero-ai, IIRC you were going to look at running multi-core VASP jobs in a workflow? This should let you do it on the queue, so basically you just need to write parser nodes, plus a shell node (preferably a generic one, and a more specific one that executes VASP in particular), then put them all together in a macro. @JNmpi has examples of these things for other codes over on his branch (#33), so we're not too far off. I don't expect you to do anything with this immediately; I just want to stay mutually well informed so we don't wind up doubling up on work. If you do want to work on it, maybe you could first/also open a PR to extend `_parse_executor`?

Note that this PR depends on pyiron_contrib and h5io code that isn't merged to main yet, much less released, so none of it will run unless you clone those branches locally. This is also why the CI tests all fail.