m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License

compress info tree #192

Closed vaaaaanquish closed 3 years ago

vaaaaanquish commented 3 years ago

fix: #89

For example:

import luigi
import gokart

class TaskA(gokart.TaskOnKart):
    param = luigi.Parameter()
    def run(self):
        self.dump(f'{self.param}')

class TaskB(gokart.TaskOnKart):
    task = gokart.TaskInstanceParameter()
    def run(self):
        task = self.load('task')
        self.dump(task + ' taskB')

class TaskC(gokart.TaskOnKart):
    task = gokart.TaskInstanceParameter()
    def run(self):
        task = self.load('task')
        self.dump(task + ' taskC')

class TaskD(gokart.TaskOnKart):
    task1 = gokart.TaskInstanceParameter()
    task2 = gokart.TaskInstanceParameter()
    def run(self):
        task = [self.load('task1'), self.load('task2')]
        self.dump(','.join(task))

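The more dependencies a pipeline has, the harder the tree is to read, because a task that is required in several places is printed in full, with its whole subtree, every time it appears.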

task = TaskD(
    task1=TaskD(
        task1=TaskD(task1=TaskC(task=TaskA(param='foo')), task2=TaskC(task=TaskB(task=TaskA(param='bar')))),  # same task
        task2=TaskD(task1=TaskC(task=TaskA(param='foo')), task2=TaskC(task=TaskB(task=TaskA(param='bar'))))   # same task
    ),
    task2=TaskD(
        task1=TaskD(task1=TaskC(task=TaskA(param='foo')), task2=TaskC(task=TaskB(task=TaskA(param='bar')))),  # same task
        task2=TaskD(task1=TaskC(task=TaskA(param='foo')), task2=TaskC(task=TaskB(task=TaskA(param='bar'))))   # same task
    )
)
print(gokart.make_tree_info(task))

Output:

└─-(PENDING) TaskD[187ff82158671283e127e2e1f7c9c095]
   |--(PENDING) TaskD[ca9e943ce049e992b371898c0578784e]
   |  |--(PENDING) TaskD[1cc9f9fc54a56614f3adef74398684f4]
   |  |  |--(PENDING) TaskC[dce3d8e7acaf1bb9731fb4f2ae94e473]
   |  |  |  └─-(PENDING) TaskA[be65508b556dd3752359b4246791413d]
   |  |  └─-(PENDING) TaskC[de39593d31490aba3cdca3c650432504]
   |  |     └─-(PENDING) TaskB[bc2f7d6cdd6521cc116c35f0f144eed3]
   |  |        └─-(PENDING) TaskA[5a824f7d232eb69d46f0ac6bbd93b565]
   |  └─-(PENDING) TaskD[1cc9f9fc54a56614f3adef74398684f4]
   |     |--(PENDING) TaskC[dce3d8e7acaf1bb9731fb4f2ae94e473]
   |     |  └─-(PENDING) TaskA[be65508b556dd3752359b4246791413d]
   |     └─-(PENDING) TaskC[de39593d31490aba3cdca3c650432504]
   |        └─-(PENDING) TaskB[bc2f7d6cdd6521cc116c35f0f144eed3]
   |           └─-(PENDING) TaskA[5a824f7d232eb69d46f0ac6bbd93b565]
   └─-(PENDING) TaskD[ca9e943ce049e992b371898c0578784e]
      |--(PENDING) TaskD[1cc9f9fc54a56614f3adef74398684f4]
      |  |--(PENDING) TaskC[dce3d8e7acaf1bb9731fb4f2ae94e473]
      |  |  └─-(PENDING) TaskA[be65508b556dd3752359b4246791413d]
      |  └─-(PENDING) TaskC[de39593d31490aba3cdca3c650432504]
      |     └─-(PENDING) TaskB[bc2f7d6cdd6521cc116c35f0f144eed3]
      |        └─-(PENDING) TaskA[5a824f7d232eb69d46f0ac6bbd93b565]
      └─-(PENDING) TaskD[1cc9f9fc54a56614f3adef74398684f4]
         |--(PENDING) TaskC[dce3d8e7acaf1bb9731fb4f2ae94e473]
         |  └─-(PENDING) TaskA[be65508b556dd3752359b4246791413d]
         └─-(PENDING) TaskC[de39593d31490aba3cdca3c650432504]
            └─-(PENDING) TaskB[bc2f7d6cdd6521cc116c35f0f144eed3]
               └─-(PENDING) TaskA[5a824f7d232eb69d46f0ac6bbd93b565]

With this change, a subtree that has already been printed is shown in full only the first time; later occurrences are collapsed to "...":

└─-(PENDING) TaskD[187ff82158671283e127e2e1f7c9c095]
   |--(PENDING) TaskD[ca9e943ce049e992b371898c0578784e]
   |  |--(PENDING) TaskD[1cc9f9fc54a56614f3adef74398684f4]
   |  |  |--(PENDING) TaskC[dce3d8e7acaf1bb9731fb4f2ae94e473]
   |  |  |  └─-(PENDING) TaskA[be65508b556dd3752359b4246791413d]
   |  |  └─-(PENDING) TaskC[de39593d31490aba3cdca3c650432504]
   |  |     └─-(PENDING) TaskB[bc2f7d6cdd6521cc116c35f0f144eed3]
   |  |        └─-(PENDING) TaskA[5a824f7d232eb69d46f0ac6bbd93b565]
   |  └─-(PENDING) TaskD[1cc9f9fc54a56614f3adef74398684f4]
   |     └─- ...
   └─-(PENDING) TaskD[ca9e943ce049e992b371898c0578784e]
      └─- ...

Compression can be turned off by passing compress=False:

print(gokart.make_tree_info(task, compress=False))

vaaaaanquish commented 3 years ago

TODO

vaaaaanquish commented 3 years ago

@hirosassa @Hi-king @mski-iksm please review!

Hi-king commented 3 years ago

@vaaaaanquish Thx for your really quick improvements! Merged :)

hirosassa commented 3 years ago

@vaaaaanquish Thanks again!