joblib / joblib

Computing with Python functions.
http://joblib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Memory: output pickle file stored with wrong name, cache miss every time #1046

Open antoine-gallix opened 4 years ago

antoine-gallix commented 4 years ago

My cache setup recently stopped working: my cached function is executed on every single call. I explored many possible mistakes on my side, and finally stepped through the joblib memory module with the debugger. It appears that the module decides to call the cached function because it doesn't find the output.pkl file that would contain the cached data. In the cache directory, the directory that corresponds to my function exists, and so does the subdirectory for the argument id. Inside it I find the following:

metadata.json
output.pkl.thread-140339490531728-pid-11915
output.pkl.thread-139846468229520-pid-10613 
output.pkl.thread-140659720259856-pid-10444
output.pkl.thread-139952457197840-pid-10286

I've called the cached function 4 times without success since the last cache flush. It seems that when the data is saved after a function call, the filename gets this strange suffix made of thread and pid appended to it, which prevents it from being found by subsequent calls.

My code doesn't make explicit use of threads. The cached function sends a query to a remote server, and its only input is the query as a string. Nothing too exotic.

Note also that caching a dummy one-liner function does work correctly. So something in my function happens to break the caching.

My joblib version is 0.14.1.

antoine-gallix commented 4 years ago

Following the code path even deeper, I finally found that a pickling error happened and got silenced. The error comes from the output of my function that is not picklable. So the error was mine. Nevertheless, it cost me a lot of effort to find it out. Is it a desirable behaviour to have such errors silenced?
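A quick way to rule out this kind of problem up front is to try pickling the function's return value directly. The sketch below uses only the standard library; `is_picklable` and `FakeResponse` are hypothetical names made up for illustration, and the lock attribute is just one assumed example of something that cannot be pickled.

```python
import pickle
import threading


def is_picklable(value):
    """Return (True, None) if pickle.dumps succeeds, else (False, error)."""
    try:
        pickle.dumps(value)
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        # Depending on the object and Python version, pickle may raise
        # any of these for an unpicklable value.
        return False, exc
    return True, None


# Hypothetical stand-in for a query result that wraps a live resource.
class FakeResponse:
    def __init__(self, text):
        self.text = text
        self._lock = threading.Lock()  # lock objects cannot be pickled


ok_plain, _ = is_picklable({"rows": [1, 2, 3]})
ok_response, error = is_picklable(FakeResponse("payload"))

print("plain dict picklable:", ok_plain)
print("response picklable:", ok_response, "-", error)
```

If the check fails, returning plain data (for example the response text rather than the response object) from the cached function is usually enough to make the result cacheable.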

GaelVaroquaux commented 4 years ago

> Is it a desirable behaviour to have such errors silenced?

The goal of joblib is to avoid crashing: it should be safe to sprinkle code with joblib.

However, a warning would be useful. I think that a pull request would be gladly accepted.

Thanks!

antoine-gallix commented 4 years ago

> The goal of joblib is to avoid crashing: it should be safe to sprinkle code with joblib.

That makes sense.

Would catching PicklingError at dump time and sending a warning with the standard Python warnings module do the job? Or should it just print a message, like the other messages from Memory? If so, at which verbosity level?

GaelVaroquaux commented 4 years ago

> Would catching PicklingError at dump time and sending a warning with the standard Python warnings module do the job?

Sounds good to me, with a well-chosen warning message.
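The idea discussed above could look roughly like the following. This is a minimal standard-library sketch, not joblib's actual implementation: `dump_or_warn` is a hypothetical helper, the temporary-file suffix merely imitates the filenames listed in the report, and the assumption is that the suffixed file is a temporary that gets renamed only after a successful write.

```python
import os
import pickle
import tempfile
import threading
import warnings


def dump_or_warn(value, path):
    """Pickle value to path; on failure, warn and return False instead of raising."""
    # Write to a temporary name first (suffix format copied from the report).
    tmp_path = "{}.thread-{}-pid-{}".format(path, threading.get_ident(), os.getpid())
    try:
        with open(tmp_path, "wb") as f:
            pickle.dump(value, f)
    except (pickle.PicklingError, TypeError, AttributeError) as exc:
        # Clean up the partial file so stray temporaries do not accumulate.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        warnings.warn(
            "Could not cache result of type {}: {!r}".format(type(value).__name__, exc),
            UserWarning,
            stacklevel=2,
        )
        return False
    # Only a fully written file ever becomes visible under the final name.
    os.replace(tmp_path, path)
    return True


cache_dir = tempfile.mkdtemp()
target = os.path.join(cache_dir, "output.pkl")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stored_bad = dump_or_warn(threading.Lock(), target)  # a lock cannot be pickled

stored_good = dump_or_warn([1, 2, 3], target)
```

With this shape the caller never crashes, which matches the stated goal, but the user still sees why the result was not cached.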