m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License

Question: How to run tasks with dependencies without parameter? #253

Closed kgkgzrtk closed 2 years ago

kgkgzrtk commented 2 years ago

I cannot get tasks with dependencies to run as expected. See the following example.

import gokart
#import luigi

class TaskA(gokart.TaskOnKart):
    #param = luigi.Parameter()
    def run(self):
        self.dump("Hello")

class TaskB(gokart.TaskOnKart):
    def requires(self):
        #return TaskA(param="called by TaskB")
        return TaskA()
    def run(self):
        res = self.load()
        self.dump(res + "World!")

print(gokart.build(TaskB()))
>>> World

If I enable the parameter shown in the commented-out lines of the code, TaskA works and the output is:

>>> HelloWorld!

Is this the intended behavior?

e-mon commented 2 years ago

In my environment, your example (with the parameter lines left commented out) prints "HelloWorld!".

$ pip freeze | grep -e luigi -e gokart
gokart==1.0.5
luigi==3.0.3

Could you share a reproducible environment (such as a Dockerfile)?

kgkgzrtk commented 2 years ago

@e-mon I don't have a Dockerfile because I ran it in my local environment, in the xonsh shell (version below). Here are the details of that environment:

$ python -V
Python 3.9.6

$ pip freeze | grep -e luigi -e gokart -e xonsh
gokart==1.0.5
luigi==3.0.3
xonsh==0.9.27

$ sw_vers
ProductName:    macOS
ProductVersion: 11.2.2
BuildVersion:   20D80

Is there any other information you need?

mski-iksm commented 2 years ago

@kgkgzrtk If you previously ran a task named TaskA with different dump code, such as the following, the old cache will be reused and produce the result you saw.

class TaskA(gokart.TaskOnKart):
    def run(self):
        self.dump("")

This is because the same cache file is used even when the code changes. Gokart only looks at the parameters when determining the cache file name.
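
As a rough illustration of why this happens, here is a simplified sketch in plain Python. It is not gokart's actual implementation: the cache_name helper and its hashing scheme are hypothetical and only mimic the idea that the name depends on the task name and parameters, never on the body of run().

```python
import hashlib
import json


def cache_name(task_name: str, params: dict) -> str:
    """Hypothetical stand-in for a parameter-based cache name.

    Only the task name and its parameters are hashed; the body of
    run() is never inspected.
    """
    digest = hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()
    return f"{task_name}_{digest}"


# Two versions of TaskA with different run() bodies but no parameters.
old_name = cache_name("TaskA", {})  # old version: run() dumped ""
new_name = cache_name("TaskA", {})  # new version: run() dumps "Hello"
print(old_name == new_name)  # True: the stale cache file is reused

# Adding or changing a parameter yields a different name,
# so a fresh cache file is written instead.
with_param = cache_name("TaskA", {"param": "called by TaskB"})
print(with_param == new_name)  # False
```

This would also explain why enabling the param in the original example appeared to fix things: the parameterized TaskA maps to a different cache name, so the stale file is never read.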

There are three options to resolve this.

  1. Delete the old cache of TaskA.
  2. Add a dummy parameter to TaskA to change its unique_id. Example: __versions: int = luigi.IntParameter(default=1)
  3. Use gokart>=1.0.6 and set serialized_task_definition_check=True. This makes gokart take code changes into account.

kgkgzrtk commented 2 years ago

@e-mon @mski-iksm It works with gokart>=1.0.6 and serialized_task_definition_check=True set. Thank you :)

Hi-king commented 2 years ago

🆒 Thx for reporting!