pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License

:bulb: What is a "project" and how do we use them? #163

Open liamhuber opened 10 months ago

liamhuber commented 10 months ago

@pmrv, @JNmpi, and I briefly discussed the similarities and differences between a "project" (e.g. tinybase ProjectAdapter) and Workflow.

Workflow is a dynamic and flexible object used when you're developing your workflow (once it's crystallized you can turn it into a Macro). Workflow is also a parent-most object in the graph context, i.e. it is not intrinsically aware of any other graphs. In the current implementation, the semantic path of a workflow always starts with the workflow label, and there is a perfect 1:1 correspondence between filesystem directories and the semantic path.

@pmrv pointed out that there are times when you may be jointly developing two or more different "workflows", which are related by some data connection, but where you don't necessarily want to always be re-running the upstream part of the process while you're modifying and playing with some downstream component. One day you might jam them all into a single big workflow that runs top to bottom, but in the moment it can be helpful to keep different development chunks separated.

A "project" may then bring:

In our conversation, the question was whether this fundamentally required a separate Project class, or if extension of the existing Workflow behaviour would be sufficient. We came to the tentative conclusion that Workflow could simply be more empowered. E.g. this pseudocode:

pr = Project(
    semantic_path="test/subdir",
    storage_root="/usr/some/other/place",
    storage_type="hdf",
)
wf = pr.Workflow("foo")

Could be equivalent to this:

wf = Workflow(
    "test/subdir/foo", 
    storage_root="/usr/some/other/place", 
    storage_path="foo",
    storage_type="hdf"
)

In both cases the resulting workflow has the same semantic path (wf.semantic_root / "test/subdir/foo"), a storage location that differs from it ("/usr/some/other/place/foo"), and the same storage back-end (hdf). In the former case, because the full semantic path was given to the project, wf.semantic_root would just be empty. More generally, one can imagine in the latter case that wf.semantic_path == wf.semantic_root / wf.label and wf.storage_location == wf.storage_root / wf.label, where the default for both semantic_root and storage_root is just cwd(), but either could otherwise be provided at instantiation or set in a config file.

This is all just some pseudocode, but it shows there is no obvious reason an extra Project class is needed -- the separation of semantic and filesystem paths can be handled right inside Workflow.
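To make the equivalence concrete, here is a minimal runnable sketch. WorkflowSketch and ProjectSketch are hypothetical stand-ins, not the pyiron_workflow or tinybase API. The sketch assumes that both roots default to cwd() and that storage_path falls back to the label when it is not given.

```python
from pathlib import Path


class WorkflowSketch:
    """Hypothetical stand-in for Workflow (not the pyiron_workflow API)."""

    def __init__(self, label, semantic_root=None, storage_root=None,
                 storage_path=None, storage_type="hdf"):
        self.label = label
        # Assumption from the discussion: both roots default to cwd()
        self.semantic_root = (
            Path(semantic_root) if semantic_root is not None else Path.cwd()
        )
        self.storage_root = (
            Path(storage_root) if storage_root is not None else Path.cwd()
        )
        # Assumption: without an explicit storage_path, storage mirrors the label
        self.storage_path = storage_path if storage_path is not None else label
        self.storage_type = storage_type

    @property
    def semantic_path(self):
        return self.semantic_root / self.label

    @property
    def storage_location(self):
        return self.storage_root / self.storage_path


class ProjectSketch:
    """Hypothetical thin wrapper that only pre-fills WorkflowSketch arguments."""

    def __init__(self, semantic_path, storage_root, storage_type="hdf"):
        self.semantic_path = semantic_path
        self.storage_root = storage_root
        self.storage_type = storage_type

    def Workflow(self, label):
        # The project prefix is folded into the workflow label
        return WorkflowSketch(
            f"{self.semantic_path}/{label}",
            storage_root=self.storage_root,
            storage_path=label,
            storage_type=self.storage_type,
        )


# Both routes end up with the same semantic path and storage location
wf_a = ProjectSketch("test/subdir", "/usr/some/other/place").Workflow("foo")
wf_b = WorkflowSketch(
    "test/subdir/foo",
    storage_root="/usr/some/other/place",
    storage_path="foo",
)
assert wf_a.semantic_path == wf_b.semantic_path
assert wf_a.storage_location == wf_b.storage_location
```

In this sketch the project holds no state that the workflow cannot hold itself, which is the gist of the argument above.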

Similarly, if we have a database interface (singleton?), this can be slapped onto Workflow and given useful shortcuts just the way Creator is (Workflow.create::Creator(), Workflow.register::Creator().register, ...).
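As a rough illustration of that idea, a singleton database could hang off the class as a shared attribute. Everything here is hypothetical (DatabaseSketch, WorkflowWithDatabase, register, lookup); it is not the tinybase database interface, and the real Creator shortcut may be wired differently.

```python
class DatabaseSketch:
    """Hypothetical singleton mapping semantic paths to storage locations."""

    _instance = None

    def __new__(cls):
        # Create the single shared instance on first use, then reuse it
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._records = {}
        return cls._instance

    def register(self, semantic_path, storage_location):
        self._records[semantic_path] = storage_location

    def lookup(self, semantic_path):
        return self._records[semantic_path]


class WorkflowWithDatabase:
    """Hypothetical Workflow stand-in exposing the database as a shortcut."""

    # Class-level attribute, so every workflow sees the same database
    database = DatabaseSketch()

    def __init__(self, label, storage_location):
        self.label = label
        self.storage_location = storage_location
        self.database.register(label, storage_location)


upstream = WorkflowWithDatabase("upstream", "/tmp/upstream")
downstream = WorkflowWithDatabase("downstream", "/tmp/downstream")

# The downstream workflow can locate upstream data without owning it
print(downstream.database.lookup("upstream"))
```

With a shared registry like this, separately developed workflows could find each other's stored data without being merged into one graph.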

So the tentative plan is to bring tinybase here from contrib (#161), and then slowly start merging in the project and/or job capabilities we need from there into Workflow and/or Node.