Open eladsegal opened 2 years ago
Hi! Thanks for reporting. Indeed, this is a current limitation of how we use `dill` in `datasets`. I'd suggest you use your workaround for now until we find a way to fix this. Maybe functions that come from a module not installed with pip should be dumped completely, rather than only taking their locations into account.
I agree. It sounds like a solution for it would be pretty dirty; even cloudpickle doesn't help in this case. In the meantime, I think that adding a warning and the workaround somewhere in the documentation would be helpful.
For anyone interested, I see that with `dill==0.3.6` the workaround I suggested doesn't work anymore.
I opened an issue about it: https://github.com/uqfoundation/dill/issues/572.
Describe the bug
When `.map` is used with a mapping function that is imported, the cache is reused even if the mapping function has been modified. The reason for this is that `dill`, which is used for creating the fingerprint, pickles imported functions by reference.
I guess it is not a widespread case, but it can still lead to unwanted results that go unnoticed.
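To illustrate what "pickled by reference" means here, the following is a minimal sketch rather than the reproduction from this issue. It assumes the standard library `pickle` as a stand-in for `dill` (it always pickles module-level functions by reference), and it builds a hypothetical in-memory module named `fake_mapping_module` instead of importing a real file. The function `add_id` and the helper `load_version` are likewise made up for the illustration.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_mapping_module"  # hypothetical module name, for illustration only


def load_version(id_length: int):
    """Build a fresh in-memory module whose add_id depends on ID_LENGTH."""
    source = (
        f"ID_LENGTH = {id_length}\n"
        "\n"
        "def add_id(example):\n"
        "    example['id'] = 'x' * ID_LENGTH\n"
        "    return example\n"
    )
    module = types.ModuleType(MODULE_NAME)
    exec(source, module.__dict__)
    # Register the module so pickle can resolve the function by its import path.
    sys.modules[MODULE_NAME] = module
    return module.add_id


# "Before the edit": ID_LENGTH is 13.
first = load_version(13)
first_payload = pickle.dumps(first)

# "After the edit": ID_LENGTH is 20, so the function behaves differently.
second = load_version(20)
second_payload = pickle.dumps(second)

print(first({})["id"] != second({})["id"])  # True: the behavior changed
print(first_payload == second_payload)      # True: the pickled bytes did not
```

The payload only records the module name and the function name, so any hash computed from it stays the same when the body of the function, or a global such as `ID_LENGTH`, changes.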
Steps to reproduce the bug
Create files `a.py` and `b.py`:
Run `python b.py` twice: in the first run you will see tqdm bars showing that the data is processed, and in the second run you will see "Loading cached processed dataset at...". Now change `ID_LENGTH` to another number in order to change the mapping function, and run `python b.py` again. You'll see that `.map` loads the result of the previous mapping function from the cache.
Expected results
Run `python a.py` twice: in the first run you will see tqdm bars showing that the data is processed, and in the second run you will see "Loading cached processed dataset at...". Now change `ID_LENGTH` to another number in order to change the mapping function, and run `python a.py` again. You'll see that the dataset is being processed and that there's no reuse of the previous mapping function result.
Workaround
Put the mapping function inside a dummy class as a static method:
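The original snippet is not preserved in this copy of the issue. As a rough sketch of the shape such a workaround could take, the class name `MapFuncs` and the body of `add_id` below are assumptions made for illustration, not the author's code. Note also the comment above reporting that this workaround stopped working with `dill==0.3.6`.

```python
import secrets
import string

ID_LENGTH = 13  # the constant the reproduction steps ask you to change


class MapFuncs:
    """Dummy holder class (hypothetical name) for the mapping function."""

    @staticmethod
    def add_id(example):
        # Illustrative body: attach a random ID of ID_LENGTH characters.
        alphabet = string.ascii_uppercase + string.digits
        example["id"] = "".join(secrets.choice(alphabet) for _ in range(ID_LENGTH))
        return example


# The importing script would then pass the static method to .map, e.g.:
#     dataset = dataset.map(MapFuncs.add_id)
```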
Environment info
`datasets` version: 1.15.1