m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License

[Feature Request] Don't save Task #258

Open vaaaaanquish opened 2 years ago

vaaaaanquish commented 2 years ago

I'd like to create a task, something like a Function, whose result won't be saved.

For example:

import gokart

class Pipeline(gokart.TaskOnKart):
    def requires(self):
        data = LoadData()
        features = [MakeFeatureA(data=data), MakeFeatureB(data=data), MakeFeatureC(data=data)]

        # `Flatten` is a Task, but we don't want to dump result because the data will be too large :(
        feature = Flatten(features=features, axis=1)

        model = TrainModel(feature=feature)
        return model
vaaaaanquish commented 2 years ago

I'm thinking about making gokart.Function

import pandas as pd
import gokart

class FlattenFunction(gokart.Function):
    def process(self):
        df_list = self.load()
        df = pd.concat(df_list, axis=1)
        return df

A Function's result would not be dumped to TASK_WORKSPACE; it would only be stored temporarily in a tmp file. On a second run there is no file, but the check for whether the task has already been executed would be skipped.
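
To make the tmp-file part concrete, here is a rough sketch in plain Python, without gokart itself. `TempFunction`, `run`, `cleanup`, and `FlattenLists` are hypothetical names for illustration only; none of them are gokart API, and the real design may differ.

```python
import os
import pickle
import tempfile


class TempFunction:
    """Hypothetical stand-in for the proposed gokart.Function (not a gokart API)."""

    def __init__(self, *inputs):
        self.inputs = inputs
        self._tmp_path = None

    def process(self, *values):
        raise NotImplementedError

    def run(self):
        # Compute the result and keep it only in a temporary file,
        # not in a persistent workspace directory.
        result = self.process(*self.inputs)
        fd, self._tmp_path = tempfile.mkstemp(suffix=".pkl")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(result, f)
        return self._tmp_path

    def load(self):
        # Downstream code reads the result back from the temporary file.
        with open(self._tmp_path, "rb") as f:
            return pickle.load(f)

    def cleanup(self):
        # Remove the temporary file once downstream code has consumed it.
        if self._tmp_path and os.path.exists(self._tmp_path):
            os.remove(self._tmp_path)
        self._tmp_path = None


class FlattenLists(TempFunction):
    def process(self, *values):
        # Concatenate several lists into one flat list.
        return [x for chunk in values for x in chunk]


flatten = FlattenLists([1, 2], [3], [4, 5])
tmp_path = flatten.run()
flattened = flatten.load()
flatten.cleanup()
file_left_behind = os.path.exists(tmp_path)
```

After `cleanup()` nothing remains on disk, which is why a second run could not rely on an output file to decide whether the Function has already been executed.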

vaaaaanquish commented 2 years ago

This is still just an idea. Please comment :)