zincware / ZnTrack

Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.
https://zntrack.readthedocs.io
Apache License 2.0

Node types #113

Open SamTov opened 3 years ago

SamTov commented 3 years ago

It could be interesting, particularly in the case of SLURM submissions, to have different types of nodes. One that comes to mind immediately would be a so-called recursive node that can spawn processes with varying parameters for you and deploy them on a cluster. This could be a very efficient way to run one process several times with changing parameters under a single class. For example, an active learner needs to run one process with several parameters, but all of those runs could be stored under a single active learner class.
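
To make the idea concrete, here is a minimal sketch in plain Python. It does not use the ZnTrack or DVC API: `FanOutNode`, `grid`, and `train` are hypothetical names chosen for illustration, and the runs execute sequentially in-process where a real node would submit each one as a cluster job.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Any, Callable


@dataclass
class FanOutNode:
    """Hypothetical node type: runs one process once per parameter combination."""

    process: Callable[..., Any]
    grid: dict[str, list[Any]]
    results: dict[tuple, Any] = field(default_factory=dict)

    def parameter_sets(self) -> list[dict[str, Any]]:
        # Expand the grid into the Cartesian product of all parameter values.
        keys = list(self.grid)
        combinations = product(*(self.grid[key] for key in keys))
        return [dict(zip(keys, values)) for values in combinations]

    def run(self) -> None:
        # Assumption: sequential execution stands in for job submission
        # (e.g. one SLURM job per parameter set).
        for params in self.parameter_sets():
            self.results[tuple(params.values())] = self.process(**params)


def train(learning_rate: float, batch_size: int) -> str:
    """Stand-in for an expensive process; it only labels the run."""
    return f"lr={learning_rate}, batch={batch_size}"


node = FanOutNode(
    process=train,
    grid={"learning_rate": [0.1, 0.01], "batch_size": [8, 16]},
)
node.run()
```

All results end up on the single `node` object, keyed by the parameter values that produced them, which is the "one class, many runs" behaviour described above.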

PythonFZ commented 3 years ago

A single Node that spawns multiple Nodes is highly requested and strongly related to #112. This would require a Node on a graph to build a subgraph, which afaik is not easily possible with DVC. It would possibly require multiple repositories that together again build a DAG. This would require some good planning on the best way to build it, but it is highly interesting!

SamTov commented 3 years ago

Addition

This is actually an interesting step in the direction of including loops in a computational graph. Whilst the full graph should remain acyclic, it could be possible to have specific nodes in loops to solve some problem.

What we would then have is a dynamic rather than a static graph.
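
As a point of comparison, a loop with a fixed iteration count can be unrolled so that the graph stays static and acyclic; a dynamic graph would instead decide the number of iterations at run time. The sketch below is plain Python with hypothetical stage names and is not tied to DVC or ZnTrack. It uses the standard library `graphlib` module (Python 3.9+), whose `TopologicalSorter` raises `CycleError` if the graph contains a cycle.

```python
from graphlib import TopologicalSorter
from typing import Optional


def unroll_loop(stages: list[str], iterations: int) -> dict[str, set[str]]:
    """Map each unrolled stage name to the set of stages it depends on."""
    graph: dict[str, set[str]] = {}
    previous: Optional[str] = None
    for index in range(iterations):
        for stage in stages:
            name = f"{stage}_{index}"
            # Each unrolled stage depends only on the stage that ran before it.
            graph[name] = {previous} if previous is not None else set()
            previous = name
    return graph


# Hypothetical active learning cycle: train -> select -> label -> train -> ...
graph = unroll_loop(["train", "select", "label"], iterations=2)

# static_order() yields dependencies first and fails on any cycle.
order = list(TopologicalSorter(graph).static_order())
```

The limitation is visible in the signature: `iterations` has to be known before the graph is built, which is exactly what a dynamic graph would relax.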

Sorry, this was supposed to be on this issue

SamTov commented 3 years ago

The issue certainly lies in the DAG assumption. In the DVC framework it seems that the graph must be directed and acyclic both globally and locally. Supporting loops would mean relaxing that requirement for local cycles. Either the loop gets handled separately from DVC (probably not the best approach) or a sub-repository is built for it. While this seems excessive, I actually think it could have some nice features. For example, studying the active learning cycle solely within its own repo would be very insightful.
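
A rough sketch of the encapsulation idea, in plain Python rather than DVC or ZnTrack: the outer graph sees one node, and the cycle lives entirely inside its `run` method. `LoopNode`, `step`, and `tolerance` are hypothetical names, and the Newton iteration for the square root of 2 is only a stand-in for an active learning cycle.

```python
from typing import Callable


class LoopNode:
    """Hypothetical node that hides an inner cycle behind a single run() call."""

    def __init__(
        self,
        step: Callable[[float], float],
        tolerance: float,
        max_iterations: int = 100,
    ) -> None:
        self.step = step
        self.tolerance = tolerance
        self.max_iterations = max_iterations
        # Keeping every intermediate value lets the loop be studied afterwards.
        self.history: list[float] = []

    def run(self, initial: float) -> float:
        value = initial
        for _ in range(self.max_iterations):
            new_value = self.step(value)
            self.history.append(new_value)
            if abs(new_value - value) < self.tolerance:
                return new_value
            value = new_value
        # Not converged within max_iterations: return the last value.
        return value


# Stand-in for the inner cycle: Newton's method for the square root of 2.
node = LoopNode(step=lambda x: 0.5 * (x + 2.0 / x), tolerance=1e-9)
root = node.run(1.0)
```

Because the loop never appears as edges in the outer graph, the global DAG requirement is untouched; `history` plays the role that a dedicated sub-repository would play for inspecting the cycle on its own.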

SamTov commented 3 years ago

Of course, with nested repos the graph drawing and plotting would come down to ZnTrack, and a set of custom tools would need to be built.

PythonFZ commented 2 years ago

Have a DataClass Node that does not have a run method but can be used via Node(load=True).<...> to gain access to user parameters inside the graph, e.g. when running in a temporary directory.
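
A minimal sketch of what such a parameter-only node could look like, written with the standard library only. `SimulationParams` and its `save`/`load` helpers are hypothetical and do not reflect how ZnTrack stores or loads parameters; the point is only that the class carries values, has no run method, and can be read back from inside a temporary directory.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class SimulationParams:
    """Hypothetical parameter-only node: stores values, defines no run method."""

    temperature: float = 300.0
    steps: int = 1000

    def save(self, path: Path) -> None:
        # Assumption: a JSON file stands in for the tracked parameter file.
        path.write_text(json.dumps(asdict(self)))

    @classmethod
    def load(cls, path: Path) -> "SimulationParams":
        return cls(**json.loads(path.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    params_file = Path(tmp) / "params.json"
    SimulationParams(temperature=350.0).save(params_file)
    # Any other node could read the user parameters back like this.
    loaded = SimulationParams.load(params_file)
```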