zincware / ZnTrack

Create, visualize, run & benchmark DVC pipelines in Python & Jupyter notebooks.
https://zntrack.readthedocs.io
Apache License 2.0

Spawn and Pool Nodes #128

Open PythonFZ opened 3 years ago

PythonFZ commented 3 years ago

For subgraphs as shown in the following image, I suggest an API like this:

@Node()
class SpawnNode:
    iterator = zn.iterable()
    result = zn.outs()

    def __call__(self, iterator):
        self.iterator = iterator

    def run(self):
        # assume that self.iterator inside run() is a single item of the iterable, not the whole list
        self.result = self.iterator ** 2

SpawnNode(spawn=True)(iterator=[1., 2., 3., 4.])
# or simpler, because zn.iterable automatically indicates a spawn node
SpawnNode()(iterator=[1., 2., 3., 4.])

This will create the following Nodes:

SpawnNode(name="SpawnNode_iterator_1")
SpawnNode(name="SpawnNode_iterator_2")
SpawnNode(name="SpawnNode_iterator_3")
SpawnNode(name="SpawnNode_iterator_4")

The iterator can be anything from tuples to generators. If a Node has two zn.iterable attributes, it would iterate over all possible combinations of their values.
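The expansion described here can be sketched in plain Python. This is only an illustration and not ZnTrack code: `expand_iterables` is a hypothetical helper, and the naming scheme for more than one iterable is an assumption that extends the `SpawnNode_iterator_<n>` pattern from above.

```python
import itertools


def expand_iterables(node_name, **iterables):
    """Yield (name, kwargs) for every combination of the given iterables.

    Hypothetical helper for illustration; not part of ZnTrack.
    """
    keys = list(iterables)
    # itertools.product accepts any iterable, including tuples and generators
    combos = itertools.product(*(iterables[key] for key in keys))
    for index, values in enumerate(combos, start=1):
        # assumed naming scheme: <NodeName>_<attribute names>_<running index>
        name = f"{node_name}_{'_'.join(keys)}_{index}"
        yield name, dict(zip(keys, values))


# one iterable: one Node per item
single = list(expand_iterables("SpawnNode", iterator=[1.0, 2.0, 3.0, 4.0]))

# two iterables: one Node per combination (2 x 2 = 4 Nodes)
double = list(expand_iterables("SpawnNode", a=(1, 2), b=(x for x in "xy")))
```

Note that the number of Nodes grows multiplicatively with every additional iterable, which is worth keeping in mind for larger inputs.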

One major downside of this method is that it probably cannot run them in parallel! So we should still consider having a sub-git or similar to run them in parallel, e.g. on a cluster.

A pool node, on the other hand, will be a classic Node that has a SpawnNode as a dependency:

@Node()
class PoolNode:
    spawn_nodes: list[SpawnNode] = dvc.deps(SpawnNode(load=True))
    result = zn.outs()

    def run(self):
        self.result = 0
        for node in self.spawn_nodes:
            if node.result > self.result:
                self.result = node.result
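The pooling step itself does not depend on ZnTrack and can be tried out in plain Python. The following is a minimal sketch under that assumption: `FakeSpawnNode` and `pool_max` are hypothetical stand-ins for the loaded spawn nodes and for the body of `PoolNode.run`.

```python
from dataclasses import dataclass


@dataclass
class FakeSpawnNode:
    """Hypothetical stand-in for a loaded SpawnNode; only carries its result."""

    name: str
    result: float


def pool_max(spawn_nodes):
    """Return the largest result of all spawn nodes, or None if there are none."""
    return max((node.result for node in spawn_nodes), default=None)


# mimic the four spawned nodes from above, each holding iterator ** 2
nodes = [
    FakeSpawnNode(name=f"SpawnNode_iterator_{index}", result=value**2)
    for index, value in enumerate([1.0, 2.0, 3.0, 4.0], start=1)
]
pooled = pool_max(nodes)
```

Unlike the loop in `PoolNode.run`, which starts from 0, `max` also returns the correct value when all results are negative.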

SamTov commented 3 years ago

I think it looks great. Is it possible to put a spawn=True into the decorator? That way it defines the type of the node in the node declaration. It could maybe also check that the iterator exists or something, I am not sure.

PythonFZ commented 3 years ago

Yes, that would absolutely be possible. It would be more of a check, because the definition itself would come from the zn.iterable definition. It would therefore be a little redundant, but I agree that, if it is desired, the decorator would be a better place for it than the __init__.
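Such a check could look roughly like the sketch below. It is not the ZnTrack implementation: `Iterable` is a hypothetical marker class standing in for `zn.iterable()`, and `node` is a simplified stand-in for the `Node` decorator that only performs the consistency check.

```python
class Iterable:
    """Hypothetical marker standing in for zn.iterable()."""


def node(spawn=None):
    """Simplified stand-in for the Node decorator with an optional spawn check."""

    def decorator(cls):
        # the iterable attributes define whether the class is a spawn node
        has_iterable = any(
            isinstance(value, Iterable) for value in vars(cls).values()
        )
        # spawn=True/False is only verified against that definition
        if spawn is True and not has_iterable:
            raise TypeError(f"{cls.__name__}: spawn=True requires an iterable")
        if spawn is False and has_iterable:
            raise TypeError(f"{cls.__name__}: spawn=False but an iterable is defined")
        cls.is_spawn = has_iterable
        return cls

    return decorator


@node(spawn=True)
class SpawnNode:
    iterator = Iterable()


@node()
class PlainNode:
    value = 42
```

With this design, omitting `spawn` keeps the automatic detection, while passing it explicitly makes a mismatch fail at class definition time.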