uqfoundation / dill

serialize all of Python
http://dill.rtfd.io

NumPy vectorized functions cannot be dilled after being used #487

Open arielshulman opened 2 years ago

arielshulman commented 2 years ago

I encountered strange behavior when I used scikit-learn pipelines with an np.vectorize function and pickled them using dill.

I've managed to narrow the situation down to this: when I try to pickle a simple function that has been vectorized with an otype such as object or str, dill works fine if I pickle it before running it, but once I've used it, dill fails with a PicklingError.

For example -

import numpy as np
import dill

def f(x):
    return x
vf = np.vectorize(f,otypes=[object])

arr = np.asarray(["a","b","c"])

dill.detect.trace(True)

dill.dumps(vf)

out = vf(arr)
print(out)

dill.dumps(vf)

and the output is -

T4: <class 'numpy.vectorize'>
# T4
D2: <dict object at 0x000001BB56AF1F00>
F1: <function f at 0x000001BB56E5A430>
F2: <function _create_function at 0x000001BB67356430>
# F2
Co: <code object f at 0x000001BB56B659D0, file "C:\Users\USER\dill_vect.py", line 4>
F2: <function _create_code at 0x000001BB673564C0>
# F2
# Co
D1: <dict object at 0x000001BB56AF1CC0>
# D1
D2: <dict object at 0x000001BB67255E80>
# D2
# F1
D2: <dict object at 0x000001BB6711CB80>
# D2
# D2
['a' 'b' 'c']
T4: <class 'numpy.vectorize'>
# T4
D2: <dict object at 0x000001BB56AF1F00>
F1: <function f at 0x000001BB56E5A430>
F2: <function _create_function at 0x000001BB67356430>
# F2
Co: <code object f at 0x000001BB56B659D0, file "C:\Users\USER\dill_vect.py", line 4>
F2: <function _create_code at 0x000001BB673564C0>
# F2
# Co
D1: <dict object at 0x000001BB56AF1CC0>
# D1
D2: <dict object at 0x000001BB67255E80>
# D2
# F1
D2: <dict object at 0x000001BB6711CB80>
Traceback (most recent call last):
  File "C:\Users\USER\dill_vect.py", line 17, in <module>
    dill.dumps(vf)
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 304, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 276, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 498, in dump
    StockPickler.dump(self, obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 487, in dump
    self.save(obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 717, in save_reduce
    save(state)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 997, in _batch_setitems
    save(v)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\site-packages\dill\_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 1002, in _batch_setitems
    save(v)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 589, in save
    self.save_global(obj, rv)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 1070, in save_global
    raise PicklingError(
_pickle.PicklingError: Can't pickle <ufunc 'f (vectorized)'>: it's not found as __main__.f (vectorized)

This test ran on Windows, but I've tested it on Linux as well and the same problem occurs. Package versions used for the test: numpy==1.22.4 with dill==0.3.5.1, and also with dill==0.3.4.

anivegesana commented 2 years ago

Caused by this: https://github.com/uqfoundation/dill/pull/443#issuecomment-1003214107

Removing said lines from dill will fix the issue. I need to find an upper bound on the numpy version that didn't support ufunc pickling, but this is an easy issue to fix.

The lower bound of the search is at numpy 1.15.0 because that is the oldest version of numpy that will work with the latest version of dill.

arielshulman commented 2 years ago

Interesting, thanks for the super quick reply 🙏

I tried your suggestion locally in a clean virtual environment with Python 3.8.13, numpy 1.22.4, and dill installed in editable mode from a clone of the master branch (on my Windows machine 🙄). I deleted the lines you pointed to, but unfortunately the error persists.

I'm not sure if this helps, but I also tried numpy==1.18 and numpy==1.19 with dill==0.3.5.1, and there everything works great. Starting with numpy==1.20.0 (I didn't try patch versions), the error starts occurring.
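
Since the failure depends on the numpy version, a small check like the one below could be run in each environment. It is only a sketch under a few assumptions: it uses the standard-library `pickle` instead of dill (the next comment notes that plain pickle produces the same message), it uses `str.upper` in place of the issue's `f` so the wrapped function itself pickles by reference wherever the snippet runs, and `can_pickle` is a hypothetical helper. Per a later comment, a successful `dumps` on numpy < 1.20 did not guarantee a working round trip, so `after=True` on an old version is not proof that unpickling works.

```python
import pickle

import numpy as np


def can_pickle(obj):
    """Hypothetical helper: True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except Exception:  # PicklingError, TypeError, AttributeError, ...
        return False
    return True


# str.upper stands in for the issue's identity function f: as a builtin
# method it is pickled by reference no matter which module runs this.
vf = np.vectorize(str.upper, otypes=[object])
arr = np.asarray(["a", "b", "c"])

before = can_pickle(vf)  # nothing has been lazily created yet
out = vf(arr)            # first call
after = can_pickle(vf)   # the same object, after one use
print(f"numpy {np.__version__}: before call={before}, after call={after}")
```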

anivegesana commented 2 years ago

As far as I can tell, dill has legacy support for pickling numpy arrays because, once upon a time, numpy didn't support pickling. Modern numpy versions picked up the feature on their end, so dill doesn't need to support it anymore. Since this also fails when you swap dill for pickle, and produces the same message, I would file the issue under numpy/numpy.
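
The point that plain `pickle` fails the same way can be shown without dill at all. In this minimal sketch (again with `str.upper` standing in for `f`, and `roundtrip` a hypothetical helper), a built-in ufunc such as `np.add` comes back as the very same object, which is consistent with ufuncs being pickled by reference (by name). A ufunc produced by `np.frompyfunc` instead carries a name like `'upper (vectorized)'` that no module defines, so that lookup fails.

```python
import pickle

import numpy as np


def roundtrip(obj):
    """Hypothetical helper: serialize and deserialize with the standard-library pickle."""
    return pickle.loads(pickle.dumps(obj))


# A built-in ufunc is looked up by name on load and resolves to the same object.
add_again = roundtrip(np.add)

# A ufunc made by np.frompyfunc (what np.vectorize builds on its first call)
# is named '<function name> (vectorized)', which no module defines.
uf = np.frompyfunc(str.upper, 1, 1)
try:
    pickle.dumps(uf)
    error = None
except Exception as exc:  # the PicklingError reported in this issue
    error = exc

print(add_again is np.add, uf.__name__, type(error).__name__)
```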

Correct me if I am wrong @mmckerns.

mmckerns commented 2 years ago

In terms of the history, basically, yes. numpy is an important enough case that pickling of its core objects should be supported. I'm assuming the issue is that the function's internal objects that are created on the first call (i.e. during lazy evaluation) became non-serializable due to a change in numpy. I don't know if numpy promises that ufuncs, or functions created by vectorize, are serializable; if the answer is yes, then they will fix it in numpy. If not, then we will need to see what is possible in dill. I'd go ahead and open a ticket with numpy, and reference this one. Note that dill just passes the ufunc to pickle, which tells me that numpy intends for all ufuncs to be serializable.
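
The lazy-evaluation hypothesis can be checked directly. The sketch below uses a hypothetical helper, `find_ufuncs`, and `str.upper` in place of `f`. It searches the vectorize object's instance dictionary, and any dictionaries nested inside it, for `np.ufunc` instances before and after the first call. It searches rather than naming a specific attribute because where numpy keeps this cache is a private implementation detail. The nested dicts in the trace above (a `D2` inside a `D2` just before the failing `save_global`) suggest a cache dictionary inside the instance `__dict__`.

```python
import numpy as np


def find_ufuncs(mapping, path="vars(vf)"):
    """Hypothetical helper: recursively collect (path, ufunc) pairs from nested dicts."""
    found = []
    for key, value in mapping.items():
        here = f"{path}[{key!r}]"
        if isinstance(value, np.ufunc):
            found.append((here, value))
        elif isinstance(value, dict):
            found.extend(find_ufuncs(value, here))
    return found


vf = np.vectorize(str.upper, otypes=[object])
arr = np.asarray(["a", "b", "c"])

before = find_ufuncs(vars(vf))  # no ufunc exists yet
vf(arr)                         # first call triggers np.frompyfunc
after = find_ufuncs(vars(vf))   # the cached ufunc is now in the instance state

print("before first call:", before)
for where, uf in after:
    print("after first call:", where, "->", uf)
```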

arielshulman commented 2 years ago

I understand that this issue is not really about dill, but since it's still open I'd like to share what I've found so far. I've done a little research trying to find the origin of this problem.

1. As @mmckerns said, the fact that np.vectorize objects can't be dilled after their first use is caused by lazy evaluation, which leads back to np.frompyfunc, which creates a ufunc.
    
    import numpy as np
    import dill

    def f(x):
        return x
    uf = np.frompyfunc(f,1,1)
    dill.detect.baditems(uf)

[<ufunc 'f (vectorized)'>]

2. So I tried to understand what changed in `numpy==1.20.0` that made pickling stop working, and that led me [here](https://github.com/numpy/numpy/pull/17289).
It seems that the way `ufunc`s are pickled has changed.
As I said earlier, pickling seemed to work fine prior to `1.20.0`, but it only seemed that way. If I pickled the object and then tried to unpickle it in a fresh kernel, it didn't really work, and this is what I got when unpickling:

PicklingError: Can't pickle <ufunc '? (vectorized)'>: it's not found as __main__.? (vectorized)


3. I don't fully understand the serialization process, so I'm a bit clueless about how a `ufunc` (which is implemented in C) can be pickled other than by reference (which, if I understand correctly, is how all numpy and scipy ufuncs are serialized). I would love to hear your thoughts.
4. In the meantime, I've worked around this issue with a callable wrapper class that stores the function and creates the `ufunc` using `np.vectorize` on the fly.

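
The workaround in item 4 could look roughly like the sketch below. It is not the author's actual code: the class name `VectorizeOnCall` is hypothetical, `str.upper` stands in for `f`, and only `otypes` is forwarded to `np.vectorize` for brevity. The wrapper's own state is just the function and the otypes, both picklable, and the `np.vectorize` object (with its lazily created ufunc) exists only for the duration of each call.

```python
import pickle

import numpy as np


class VectorizeOnCall:
    """Hypothetical wrapper: keeps only picklable settings, builds np.vectorize per call."""

    def __init__(self, pyfunc, otypes=None):
        self.pyfunc = pyfunc
        self.otypes = otypes

    def __call__(self, *args, **kwargs):
        # The lazily created ufunc ends up in this temporary np.vectorize,
        # never in self.__dict__, so the wrapper stays picklable after use.
        vectorized = np.vectorize(self.pyfunc, otypes=self.otypes)
        return vectorized(*args, **kwargs)


vf = VectorizeOnCall(str.upper, otypes=[object])
arr = np.asarray(["a", "b", "c"])

first = vf(arr)                            # use it once, as in the issue
restored = pickle.loads(pickle.dumps(vf))  # still serializable afterwards
second = restored(arr)
print(first, second)
```

The trade-off is that the frompyfunc ufunc is rebuilt on every call. If that overhead matters, another option would be to keep a cached `np.vectorize` on the instance and leave it out of the pickled state via `__getstate__`.
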
mmckerns commented 2 years ago

@arielshulman: thanks for looking into this a bit more, and for the lead on the numpy PR that updated the serialization of ufuncs. If there's a simple workaround like the one you used, then we can patch it in dill. With regard to your (3): if a class is implemented in C but has a `__reduce__` (or similar) method that enables pickling, and that method is exposed to Python, then the class will be serializable. Essentially the same information can instead be passed to `copyreg` (`copy_reg` in Python 2) as an alternate way to register how the object serializes. The former ensures the object will be serializable everywhere; the latter extends a particular serializer so it knows how to serialize the target object.
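
The two routes described above can be illustrated with pure-Python stand-ins. This is an assumption-laden sketch: `SelfReducing` and `Unaware` are hypothetical classes playing the role of a C type, and a `threading.Lock` is used only because it cannot be pickled directly. The first class carries its own `__reduce__` recipe, so any pickler can handle it. The second has no recipe, so the default pickler fails, but a `pickle.Pickler` whose private `dispatch_table` knows the external recipe succeeds. Calling `copyreg.pickle(Unaware, reduce_unaware)` would instead register the recipe globally for the standard-library picklers.

```python
import copyreg
import io
import pickle
import threading


class SelfReducing:
    """Hypothetical stand-in for a type that exposes __reduce__."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()  # locks cannot be pickled directly

    def __reduce__(self):
        # Recipe: call SelfReducing(value) on load; a fresh lock is created.
        return (SelfReducing, (self.value,))


class Unaware:
    """Hypothetical stand-in for a type with no pickling support of its own."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()


def reduce_unaware(obj):
    """External recipe for Unaware, equivalent to a __reduce__ method."""
    return (Unaware, (obj.value,))


def dumps_with_recipe(obj):
    """Pickle obj with a Pickler whose private dispatch table knows Unaware."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf)
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[Unaware] = reduce_unaware
    pickler.dump(obj)
    return buf.getvalue()


# 1) The object carries its own recipe, so any pickler can handle it.
a = pickle.loads(pickle.dumps(SelfReducing(1)))

# 2) Without a recipe, the default pickler fails on the lock...
try:
    pickle.dumps(Unaware(2))
    plain_failed = False
except Exception:
    plain_failed = True

# ...but a pickler extended with the external recipe succeeds.
b = pickle.loads(dumps_with_recipe(Unaware(2)))
print(a.value, plain_failed, b.value)
```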