Feature Request: Tasks hash ids update when run function definition changes

m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.

https://gokart.readthedocs.io/en/latest/

MIT License

305 stars 57 forks source link

Feature Request: Tasks hash ids update when run function definition changes #216

Closed TaylerUva closed 2 years ago

TaylerUva commented 3 years ago

When the code signature of the run function changes in a task, this should update the hash value since the hash is different than its last run.

class TaskA(gokart.TaskOnKart):

    param = luigi.IntParameter()

    def run():
        sum = self. param + 3
        self.dump(sum)

class TaskA(gokart.TaskOnKart):

    param = luigi.IntParameter()

    def run():
        sum = self. param + 2
        self.dump(sum)

These should have different hash ids because the internal logic has changed.

vaaaaanquish commented 3 years ago

It's difficult issue to solve because of It has side effects.

With the following code, the user will have to think about whether the hash value will change when the version of numpy changes.

import numpy

class TaskA(gokart.TaskOnKart):

    param = luigi.IntParameter()

    def run():
        self.dump(np.ndarray([1]))

We also have to choose how much of the internal changes of class and decorator you want to include in the hash. This is because gokart is used in a variety of settings, including experiments, competitions, productions and more.

How far do you think we should look?

Hi-king commented 2 years ago

@TaylerUva

From gokart 1.0.6, serialized_task_definition_check has been introduced by @ujiuji1259 As shown as follows, I believe this issue has been solved :)

class TaskA(gokart.TaskOnKart):
    param = luigi.IntParameter()
    serialized_task_definition_check = True
    def run():
        sum = self. param + 3
        self.dump(sum)

print(TaskA(param=1).make_unique_id())

class TaskA(gokart.TaskOnKart):
    param = luigi.IntParameter()
    serialized_task_definition_check = True

    def run():
        sum = self. param + 2
        self.dump(sum)
print(TaskA(param=1).make_unique_id())

9adc132c5b79f126c877aa25f2c02502
3372e1a315893c54f47da707de7031bb