pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License
12 stars 1 forks source link

Semantic storage path #187

Closed liamhuber closed 9 months ago

liamhuber commented 9 months ago

Abstractly, we can think about the "semantic path" of a node as the chain of node labels from the graph root down to that node, e.g. "some_workflow/some_macro/some_sub_macro/a_macro_child_node".

Composite nodes hold their children in a (dot)dictionary .nodes, but use __getattr__ so this semantic path can be used with live objects, e.g. some_workflow.some_macro.some_sub_macro.a_macro_child_node.

Naively storing a Composite's state, the path in the state winds up looking instead like "some_workflow/nodes/some_macro/nodes/some_sub_macro/nodes/a_macro_child_node".

Here, we modify __getstate__ and __setstate__ to de(re)constuct the nodes dictionary so the state path matches the semantic path.

This is safe since children of Composite are already forbidden from having a label that conflicts with any of its parent's attributes, so the space in the state dictionary can't possibly conflict with anything.

Since we use __get/setstate__ for file storage with HDF5, this means the storage path will also match the semantic path (or will match it after downstream PRs in the case of the "tinybase" storage mode, which isn't yet using the state).

review-notebook-app[bot] commented 9 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

github-actions[bot] commented 9 months ago

Binder :point_left: Launch a binder notebook on branch _pyiron/pyiron_workflow/semantic_storagepath

samwaseda commented 9 months ago

Sorry I didn't manage to do it today (same applies to other PR that I couldn't review). I'll try to do it tomorrow

srmnitc commented 9 months ago

I will finish this tomorrow morning, should be ready before your work day starts!

liamhuber commented 9 months ago

I will finish this tomorrow morning, should be ready before your work day starts!

@srmnitc no worries. Actually, reviewing the code itself is the least of my reason's for @'ing you.

What I'm moving towards here is a principle that for graph based workflows the semantic path (i.e. chain of node names), filesystem path (i.e. working directories), and storage path (in __getstate__, inside HDF5, etc.). I was curious if this already aligns with your thinking or if you had feedback.

On the one hand it feels like a trivial idea, but actually at least in the case of __getstate__ it really requires intentionality or you get extra container terms wedged in between nodes.