m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for machine learning pipelines.
https://gokart.readthedocs.io/en/latest/
MIT License
305 stars 57 forks

Feature Request: gokart.build dump to tmpfile #244

Open vaaaaanquish opened 3 years ago

vaaaaanquish commented 3 years ago

I would like to see a function for use in tests, where you don't want to leave a pkl file behind.

Current behavior:

df = gokart.build(Foo())    # dump pkl ./resource/...

Proposed behavior:

df = gokart.build(Foo(), not_dump=True)    # to tmpfile and rm

However, whether this should be implemented in build is a matter of debate.

mski-iksm commented 3 years ago

@vaaaaanquish I agree with implementing this feature.

However, I thought it would be nice to have this option in TaskOnKart, since there are cases where you don't want to keep a dump file (e.g. the task is very simple and quick to run, but the dumped data is so large that it drives up storage costs).

vaaaaanquish commented 3 years ago

@mski-iksm All right.

I think that if this is implemented in TaskOnKart, dealing with intermediate dependent tasks is going to be complicated. 🤔 My idea is to dump to a tmpfile. This is probably too simple.

Do you have any other good ideas?
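
For readers unfamiliar with the "dump to a tmpfile, then remove it" idea, here is a minimal sketch using only the Python standard library. It is not gokart code: `run_without_leaving_pickle` and `compute` are hypothetical names used for illustration, and `compute` stands in for whatever a task would produce.

```python
import os
import pickle
import tempfile


def run_without_leaving_pickle(compute):
    """Dump compute()'s result into a temporary directory, load it back, then clean up.

    Hypothetical helper for illustration only; it is not part of gokart.
    """
    with tempfile.TemporaryDirectory() as workdir:
        dump_path = os.path.join(workdir, "result.pkl")
        with open(dump_path, "wb") as f:
            pickle.dump(compute(), f)
        with open(dump_path, "rb") as f:
            result = pickle.load(f)
    # The directory and the pickle inside it are removed when the with-block exits.
    return result, dump_path


result, dump_path = run_without_leaving_pickle(lambda: {"rows": 3})
```

The caller still gets the result, but nothing remains on disk afterwards, which is the behavior the request asks for in tests.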

Hi-king commented 3 years ago

I think we only need to inject the workspace_directory of the final task, as follows:

import tempfile

import gokart


class TaskA(gokart.TaskOnKart):
    def run(self):
        # Show where this task's output is written.
        print(self.output().path())
        self.dump("")


class TaskB(TaskA):
    def requires(self):
        return TaskA()


# Only the final task gets the temporary workspace_directory.
with tempfile.TemporaryDirectory() as d:
    print(d)
    gokart.build(TaskB(workspace_directory=d), verbose=True)

hirosassa commented 3 years ago

I agree with @Hi-king's idea.
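
The pattern suggested in the thread could be wrapped in a small test helper. The sketch below is an assumption, not an existing gokart API: `build_in_temporary_workspace`, `FakeTask` and `fake_build` are hypothetical names. The build function is passed in as a parameter so the example runs without gokart installed.

```python
import os
import tempfile


def build_in_temporary_workspace(task_factory, build):
    """Build the final task with a throwaway workspace_directory.

    task_factory: callable accepting workspace_directory=... and returning the
                  final task (with gokart, presumably a TaskOnKart subclass).
    build:        callable that runs the task and returns its output
                  (with gokart, presumably gokart.build).
    """
    with tempfile.TemporaryDirectory() as workspace:
        # Only the final task receives the temporary directory, as in the thread.
        return build(task_factory(workspace_directory=workspace))


# Stand-ins so the sketch runs without gokart (both are hypothetical).
class FakeTask:
    def __init__(self, workspace_directory):
        self.workspace_directory = workspace_directory


def fake_build(task):
    # Pretend to dump into the workspace and report where the dump went.
    path = os.path.join(task.workspace_directory, "output.pkl")
    with open(path, "wb") as f:
        f.write(b"dummy")
    return path


dumped_path = build_in_temporary_workspace(FakeTask, fake_build)
```

With gokart installed, the equivalent call would presumably be `build_in_temporary_workspace(TaskB, gokart.build)`. As in the thread's example, upstream tasks would still dump to their default workspace.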