m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License
305 stars 57 forks source link

serialized_task_definition_check generates different hashes for same class #207

Closed hiro-o918 closed 3 years ago

hiro-o918 commented 3 years ago

I found serialized_task_definition_check (introduced by #205) option creates different hashes if a task depends on imported object.

To reproduce the bug, run this scripts many times.

import gokart

from pkg import func

class Example(gokart.TaskOnKart):
    def run(self):
        print(func())
        self.dump('Hello, world!')

gokart.build(Example(serialized_task_definition_check=True)) 

I really apologize for the PR (#205), and if you do not mind, please revert the PR or drop the tag.

Hashing tasks (or any classes) is a harder problem than I thought, because classes depends on some packages and the packages also. So your workaround with version tags is very reasonable for the issue I mentioned in the PR

I think inspect.getsource can be a solution for this since we do not need serialized task but just hash values. However, I do not have a good idea to retrieve codes recursively to reflect all the dependencies.

Anyway, the current implementation is invalid and I am sorry again for the inconvenience

Hi-king commented 3 years ago
$python -c “import time;print(hash(time.sleep))”
6917529027924698041

$python -c “import time;print(hash(time.sleep))”
6917529027918942137

Wow. I didn't know this behaviour :0 We're sorry for overlooking for this problem.

And thanks for your trying for introducing the new feature to gokart, again!

hirosassa commented 3 years ago

Thank you for the report! This behavior is very surprising to me, too.

We'll revert the PR and yank PyPI release.