Open AdrianSosic opened 4 years ago
Hi Adrian, thanks for the issue and glad xyzpy is being useful! It should be straightforward and seems useful to add a picklelib arg or something to Crop. I think the only functions called are dumps and loads.
Just as a quick first check you could try switching this line at the top of batch.py:
from joblib.externals import cloudpickle
# to -->
import dill as cloudpickle
and see if everything runs for you?
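A minimal sketch of what such a picklelib option might look like, assuming the chosen library only needs to expose dumps and loads. PickleBackend is a hypothetical helper for illustration, not part of xyzpy:

```python
import importlib


class PickleBackend:
    """Hypothetical helper (not xyzpy API): wraps any module exposing
    ``dumps`` and ``loads`` so the same library is used in both directions."""

    def __init__(self, picklelib="pickle"):
        # Accept either a module name (e.g. "pickle", "dill") or a module object.
        if isinstance(picklelib, str):
            picklelib = importlib.import_module(picklelib)
        self.dumps = picklelib.dumps
        self.loads = picklelib.loads


# Use the standard library here; "dill" could be passed instead if installed.
backend = PickleBackend("pickle")
payload = backend.dumps({"a": 1})
restored = backend.loads(payload)
```

Routing both dumps and loads through one object also avoids pickling with one library and unpickling with another.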
Hi @jcmgray, thanks for getting in touch. Unfortunately, your suggested change did not resolve the issue but instead raised the following error:
File "/Users/M280152/Downloads/xyzpy/xyzpy/gen/farming.py", line 631, in Crop
num_batches=num_batches)
File "/Users/M280152/Downloads/xyzpy/xyzpy/gen/batch.py", line 226, in __init__
self._sync_info_from_disk()
File "/Users/M280152/Downloads/xyzpy/xyzpy/gen/batch.py", line 333, in _sync_info_from_disk
farmer = None if farmer_pkl is None else pickle.loads(farmer_pkl)
ModuleNotFoundError: No module named '__builtin__'
Any thoughts on this?
OK that seems to be a separate problem - the farmer_pkl currently is pickled and unpickled by different libraries, which I am surprised currently works. That can be easily fixed.
The main problem is in fact not to do with pickling the function (what cloudpickle is currently used for), but with using joblib.dump to write the result inside the grow function, since I had assumed the result would always be numeric types, arrays, etc.
As an easier workaround than your current one, you could simply pickle the return value yourself:
return dill.dumps(botorch.models.SingleTaskGP(x, y))
then unpickle on the other end.
And it might be nice to have this as a separate picklelib option as well.
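The workaround above can be sketched end to end as follows. This is only an illustration under stated assumptions: fit_model and load_model are hypothetical names, a plain dict stands in for the botorch model, and the standard pickle module is used as a fallback if dill is not installed:

```python
try:
    import dill as pickler  # preferred in the thread: handles more object types
except ImportError:
    import pickle as pickler  # stdlib fallback so the sketch still runs


def fit_model(x, y):
    # Hypothetical stand-in for the real function; in the thread the
    # returned object is botorch.models.SingleTaskGP(x, y).
    model = {"x": list(x), "y": list(y)}
    # Serialize the result ourselves, so whatever stores the return
    # value only ever has to handle plain bytes.
    return pickler.dumps(model)


def load_model(payload):
    # "The other end": turn the stored bytes back into the object.
    return pickler.loads(payload)


payload = fit_model([0.0, 1.0], [1.0, 3.0])
restored = load_model(payload)
```

Since the function now returns bytes, the code that writes results to disk never sees the problematic object itself.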
Hi @jcmgray, I see. Is there a particular reason why you are using both cloudpickle and joblib instead of only one of them, i.e. would it be possible to also use dill (e.g. via setting an option) for the grow function?
In any case, I am using your suggested solution at the moment as a workaround, which is indeed much smarter than simply throwing away the objects ;-)
Thanks a lot for your help! Much appreciated!
The reasoning was I think as follows:
This logic might not be necessary anymore, & I defo agree it would be nice to be able to customize which picklers are used.
I can try and add this at some point (unless you want to!), but it might not be immediately.
Hi @jcmgray, I'm currently using your awesome package to automate my experiments and noticed a problem related to pickling certain data types. While the cloudpickle backend of joblib should work fine to handle, for example, lambda functions, I get an error when working with certain modules based on torch.
Here is a minimal example:
It produces the following error:
Tested with Python 3.7.3 and
After a short search, I found this related post: https://github.com/cornellius-gp/gpytorch/issues/907 A potential solution seems to be using dill instead of pickle. Do you think this option can be added to xyzpy?
For now, my workaround is to remove all problematic variables from the object returned by the function to be evaluated after all internal computations have been completed. However, it would be much nicer, of course, if the objects could be naturally handled by xyzpy.
Kind regards, Adrian