pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License

A first pass at storage #160

Closed liamhuber closed 9 months ago

liamhuber commented 10 months ago

@samwaseda, @JNmpi -- saving and loading is working! Needs this branch, and the tinydata branch on contrib.

@pmrv, before I dig deep and start fleshing out tests and examples, I would love to get some high-level feedback from you about the architecture and how I'm (probably mis)using tinybase storage tools.

Design

Uses tinybase.storage.H5ioStorage to save and load nodes (individual function nodes, macros, and workflows). It is written against this branch in contrib, so don't be surprised when tests fail!

Node (and some of its children) and DataChannel define methods for storing their data in, and restoring it from, an H5ioStorage instance.

The expectation is that loading happens from an already-live instance of a Node; that means that it has already instantiated its child DataChannel objects and they are live too -- so we are really just populating attributes of live objects.
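A toy sketch of that "populate a live object" pattern (illustrative names only, not the actual pyiron_workflow API): the object graph is rebuilt first, and storage only repopulates attributes of objects that already exist.

```python
class Channel:
    """Toy stand-in for a DataChannel: restored in place, never re-instantiated."""

    def __init__(self, label, value=None):
        self.label = label
        self.value = value

    def restore(self, storage: dict):
        # The channel is already live; we only repopulate its attributes
        self.value = storage["value"]


class Node:
    """Toy stand-in for a Node that owns live channels."""

    def __init__(self, label):
        self.label = label
        self.inputs = {"x": Channel("x")}  # children exist before loading

    def restore(self, storage: dict):
        for label, channel in self.inputs.items():
            channel.restore(storage["inputs"][label])


node = Node("demo")
node.restore({"inputs": {"x": {"value": 42}}})
print(node.inputs["x"].value)  # 42
```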

Information about channel connections is stored at the Composite level, and these are recreated after all children are available.
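Sketched with plain dicts (the label-pair schema here is my assumption, not the actual stored format): connections are saved as label pairs and rewired only once all children are live again.

```python
# Saved form of a connection:
# (upstream node label, channel label) -> (downstream node label, channel label)
connections = [(("structure", "structure"), ("ev", "structure"))]


class Toy:
    """Stand-in for a child node that can hold restored connections."""

    def __init__(self):
        self.links = []


# Restore phase: all children exist before any connection is rewired
children = {"structure": Toy(), "ev": Toy()}

for (up_node, up_chan), (down_node, down_chan) in connections:
    # Labels resolve to live objects only because the children are already there
    children[down_node].links.append((children[up_node], up_chan, down_chan))
```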

Macros instantiate their children at their own instantiation time, but Workflows need to look at the stored data (package identifier and node class), re-register packages, and re-instantiate their child nodes. Then these nodes can be restored from HDF as usual.
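The re-instantiation step for workflow children can be pictured like this (using importlib, with a stdlib class standing in for a stored package identifier and node class):

```python
import importlib

# Stored per child node: the module holding the class and the class name.
# (collections.Counter is a stand-in for a registered node package/class.)
stored = {"structure": {"module": "collections", "class": "Counter"}}

children = {}
for label, info in stored.items():
    # "Re-register" by importing the module, then re-instantiate the class;
    # afterwards each child can be restored from HDF as usual.
    cls = getattr(importlib.import_module(info["module"]), info["class"])
    children[label] = cls()
```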

We delay interaction with pyiron_contrib until the last possible moment, as the import still takes forever.
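The deferred-import trick looks like this (json standing in for the slow pyiron_contrib import):

```python
def save(data):
    # Deferred import: the cost of the heavy dependency is paid on the first
    # call to save(), not at package import time, so plain
    # `import pyiron_workflow` stays snappy.
    import json  # stand-in for the slow pyiron_contrib import
    return json.dumps(data)


print(save({"a": 1}))  # {"a": 1}
```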

Features

It works like this:

from pyiron_workflow import Workflow

Workflow.register("pyiron_workflow.node_library.atomistics", "atomistics")

wf = Workflow("ev_curve", overwrite_save=True)

wf.structure = wf.create.atomistics.task.Bulk("Al")
wf.calculator = wf.create.atomistics.calculator.Emt()

wf.ev = wf.create.atomistics.macro.EnergyVolumeCurve(
    structure=wf.structure, 
    calculator=wf.calculator,
)

out = wf()

wf.save()
out.ev__result_dict["bulkmodul_eq"]
>>> 39.494306059591686

Then we can come back later, in a fresh python interpreter (with the same cwd so we can find the save file), and do this:

from pyiron_workflow import Workflow

wf = Workflow("ev_curve")

wf.outputs.ev__result_dict.value["bulkmodul_eq"]
>>> 39.494306059591686

Note that at load-time we didn't need to register any packages -- everything we needed was found in the stored data and registered automatically.

Shortcomings

TODO:

Closes #153

github-actions[bot] commented 10 months ago

Binder :point_left: Launch a binder notebook on branch _pyiron/pyiron_workflow/simplestorage

liamhuber commented 10 months ago

> I didn't have time today to read in depth, but will do so before our meeting tomorrow.

Super. Indeed, if the academic agenda permits, spending time talking about this stuff live would be both enjoyable and helpful!

> One thing that occurs to me right now: to_storage/from_storage is very similar to the Storable interface. I guess implementing restore would be more difficult in your setup because you wanted the objects to be live before populating from storage, but it would make other things much easier (no need to keep track of class names in storage, it already saves a simple version tag). If this is a blocker I would rather discuss what needs to change in Storable to enable it to be used here.

So there are two fundamental objects we're storing: DataChannel and Node. The channels are owned by nodes, and take their parent node as an arg at instantiation time. This sort of reciprocal link is super convenient when everything is alive, and making it a mandatory arg is perfectly sensible under those conditions, but it's hell for storage. So, for DataChannel I think we're stuck defining some sort of bespoke data-extraction routine. Thankfully this is so far quite simple.
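A minimal sketch of such a bespoke extraction routine (hypothetical method bodies; the real DataChannel API may differ): the parent back-link is simply never written to storage, since the owning node recreates its channels before restoring them.

```python
class DataChannel:
    """Toy channel with the mandatory reciprocal parent link."""

    def __init__(self, parent, label, value=None):
        self.parent = parent  # convenient when everything is alive, hell for storage
        self.label = label
        self.value = value

    def to_storage(self):
        # Bespoke extraction: persist only plain data, never the parent
        return {"label": self.label, "value": self.value}

    def from_storage(self, data):
        self.value = data["value"]


owner = object()
blob = DataChannel(owner, "x", 3).to_storage()
restored = DataChannel(owner, "x")  # already live, parent supplied by its owner
restored.from_storage(blob)
print(restored.value)  # 3
```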

For Node I tend to agree with you. I did try sticking Storable into the MRO, but ran into two problems, both of which are probably resolvable:

  1. I just couldn't get it to do what I wanted. I really suspect this is from lack of trying/insufficiently deep understanding of Storable and surrounding classes, and probably not a fundamental problem. We're doing all sorts of nonsense with decorators and dynamically generated classes, but I'm still cautiously optimistic the only real problem was my idiocy.
  2. Importing from contrib is too damned slow, and if Node inherits from it directly then I have to import at boot time and can't delay it until I go out of my way to call .save(). This is resolvable by getting tinybase out on its own somewhere, so is only a short-term barrier. But I don't want to merge and release a solution that requires the "pyiron" boot time, as right now getting running with workflows is pretty snappy.

codacy-production[bot] commented 10 months ago

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| ------------- | ------------- |
| :x: -2.76% (target: -1.00%) | :white_check_mark: 35.48% |
Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| ------------- | ------------- | ------------- | ------------- |
| Common ancestor commit (cc93042ccd4247504d1e2d982e20c49e53d0bd59) | 2591 | 2213 | 85.41% |
| Head commit (eda307ab9b99b2b1ac52954402fd9ef7e0345793) | 2744 (+153) | 2268 (+55) | 82.65% (**-2.76%**) |

**Coverage variation** is the difference between the coverage for the head and common ancestor commits of the pull request branch.
Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| ------------- | ------------- | ------------- | ------------- |
| Pull request (#160) | 155 | 55 | **35.48%** |

**Diff coverage** is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified.

See your quality gate settings    Change summary preferences

You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation

github-actions[bot] commented 10 months ago

Pull Request Test Coverage Report for Build 7589086727

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report. To ensure accuracy in future PRs, please see these guidelines. A quick fix for this PR: rebase it; your next report should be accurate.


Totals Coverage Status
Change from base Build 7576987663: -1.3%
Covered Lines: 4799
Relevant Lines: 5392

💛 - Coveralls

liamhuber commented 10 months ago

Ok, so we can now get the storage file itself from the parent-most node using #165, and then the path inside the storage to a particular child using #167. That means that children look in the right place in the root-parent's folder and we can do things like:

>>> wf.ev.calc.storage["outputs/energy_dict/strict_hints"]
1

This is nice.
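The root-file-plus-subpath idea from #165/#167 can be sketched like so (toy classes, not the real implementation): the graph root owns the storage file, and each child derives its lookup path from its lineage.

```python
class Node:
    """Toy node: the graph root owns the file, children derive subpaths."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent

    @property
    def storage_path(self):
        # Walk up the lineage so e.g. wf.ev.calc maps to "ev_curve/ev/calc"
        if self.parent is None:
            return self.label
        return f"{self.parent.storage_path}/{self.label}"


wf = Node("ev_curve")
calc = Node("calc", parent=Node("ev", parent=wf))
print(calc.storage_path)  # ev_curve/ev/calc
```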

Tangentially, this led me to a question: @pmrv, are we still struggling with getting back 0/1 instead of bools, or is there just a bug somewhere? Should I still care about this?

liamhuber commented 10 months ago

Also @pmrv: with the failing tests on pyiron_contrib:tinydata and the fairly deep integration (I need to pull out pyiron_base...HasGroups too), I am considering delaying all the TODOs that move stuff from repo to repo until a later PR. That would mean sticking with the to_storage and from_storage methods for now and introducing tests at the save/load interface level. The big "pro" for me here is getting functioning storage live on main as an alpha feature. Is that palatable to you?

Long run I still am expecting to get Storable in the MRO, I just want to introduce an intermediate state to main where we have storage without that.

pmrv commented 10 months ago

I think the bool issue is related to using json for dicts inside h5io, but I would have to check.

Test failures for tinybase are probably contained to the notebooks, so it should be a quick fix on Monday, but if you want this in main before then, go ahead.


liamhuber commented 9 months ago

"Re-run all jobs" is not re-triggering the CI. I'm just going to try closing and reopening.

liamhuber commented 9 months ago

h5io_browser hard-pins an upper limit on the patch version of h5io, so the CI still can't run. Jan already has a version-bump PR open on h5io_browser, so I expect this will resolve itself quickly.

liamhuber commented 9 months ago

Could not solve for environment specs
The following package could not be installed
└─ pyiron_contrib 0.1.15**  does not exist (perhaps a typo or a missing channel).

It does though. Something on the internet must just not have updated yet. I'll try again in a few minutes.

liamhuber commented 9 months ago

0.1.15 is the most recent version showing on anaconda.org, but 0.1.14 is the most recent my local machine sees with conda search -c conda-forge -f pyiron_contrib, so I guess different parts of conda's infrastructure just update at different times and this is not GitHub's fault at all.

liamhuber commented 9 months ago

The "rerun failed jobs" button for the CI report and "rerun job" for individual jobs are both doing nothing, so I am going to close and reopen...

liamhuber commented 9 months ago

There's some sort of pathing issue going on in the CI environment compared to my local machine; everything fails at the first node instantiation with errors of the form:

======================================================================
ERROR: test_run_data_tree (test_node.TestNode)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/share/miniconda3/envs/my-env/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'start'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/tests/unit/test_node.py", line 64, in setUp
    self.n1 = ANode("start", x=0)
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/snippets/has_post.py", line 16, in __call__
    post(instance, *args, **kwargs)
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/node.py", line 362, in __post__
    do_load = self.storage.has_contents
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/storage.py", line 101, in has_contents
    has_contents = self._tinybase_storage_is_there or self._h5io_storage_is_there
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/storage.py", line 150, in _tinybase_storage_is_there
    if os.path.isfile(self._tinybase_storage_file_path):
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/storage.py", line 133, in _tinybase_storage_file_path
    self.node.graph_root.working_directory.path
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/node.py", line 788, in working_directory
    self._working_directory = DirectoryObject(self.label)
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/snippets/files.py", line 41, in __init__
    self.create()
  File "/home/runner/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/snippets/files.py", line 44, in create
    self.path.mkdir(parents=True, exist_ok=True)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.10/pathlib.py", line 1180, in mkdir
    self.mkdir(mode, parents=False, exist_ok=exist_ok)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'start'

liamhuber commented 9 months ago

Ok, the tests were failing on my local machine as well. It was an issue with the NodeJob stuff interacting with the storage stuff and bad merges that I missed because the CI was out and I wasn't careful enough.

It looks like I'll also need to patch all the storage to run only on >=3.11 after all.
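One way to gate the storage tests on the interpreter version (an assumption on my part, not necessarily how this PR ended up doing it) is unittest's skipUnless:

```python
import sys
import unittest


# Skip the whole test case on interpreters older than 3.11; the class still
# exists, but its tests are reported as skipped rather than failed.
@unittest.skipUnless(sys.version_info >= (3, 11), "storage requires Python >= 3.11")
class TestStorage(unittest.TestCase):
    def test_round_trip(self):
        self.assertTrue(True)  # placeholder for a real save/load round trip
```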

codacy-production[bot] commented 9 months ago

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| ------------- | ------------- |
| :white_check_mark: +1.31% (target: -1.00%) | :white_check_mark: 90.86% |
Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| ------------- | ------------- | ------------- | ------------- |
| Common ancestor commit (344d1c44ce8517df0383ab710077bb39c123858c) | 2596 | 2218 | 85.44% |
| Head commit (2a887dc76460ab3e90f6b5917b4ac84c840119b1) | 3065 (+469) | 2659 (+441) | 86.75% (**+1.31%**) |

**Coverage variation** is the difference between the coverage for the head and common ancestor commits of the pull request branch.
Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| ------------- | ------------- | ------------- | ------------- |
| Pull request (#160) | 514 | 467 | **90.86%** |

**Diff coverage** is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified.
