irmen / Pyrolite

Java and .NET client interface for Pyro5 protocol
MIT License

net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct) #39

woshilaiceshide closed this issue 8 years ago

woshilaiceshide commented 8 years ago

I've encountered the same problem as described on http://stackoverflow.com/questions/21794750/read-python-pickle-data-stream-in-android : "I ran tests using the various pickle protocols (0-3) and found that it fails for 0 and 1, but succeeds for 2 and 3..."

Could you help me, please?

woshilaiceshide commented 8 years ago

Sorry, it just needs a customized IObjectConstructor.

Closing it.

timematcher commented 6 years ago

Could you shed some light on what you mean by a customized IObjectConstructor? I am trying to do the same in .NET but am getting the same error. Here is my code:

            var stream = new FileStream(filePkl, FileMode.Open);
            Unpickler unpickler = new Unpickler();
            Object data = unpickler.load(stream);

I would appreciate a detailed response on this. Thanks.

irmen commented 6 years ago

@timematcher are you using numpy as well? The easiest option is probably to not send numpy arrays across Pyro at all, and to first convert them to regular Python arrays or bytearrays. Those will be unpickled just fine without changes to your .NET code.
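A minimal sketch of that conversion on the Python side, for illustration only. The helper names `to_plain` and `dumps_plain` are hypothetical, and the code assumes array-like objects expose a `tolist()` method (as numpy arrays and numpy scalars do), so it does not import numpy itself. The default `protocol=2` is also an assumption; adjust it to whatever the receiving side accepts.

```python
import pickle


def to_plain(obj):
    """Recursively replace array-like objects with built-in Python types."""
    # Assumption: anything with a tolist() method (e.g. a numpy array or
    # numpy scalar) can be represented by the value tolist() returns.
    if hasattr(obj, "tolist"):
        return to_plain(obj.tolist())
    if isinstance(obj, dict):
        # Keys are left untouched so they stay hashable.
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_plain(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(to_plain(item) for item in obj)
    return obj


def dumps_plain(obj, protocol=2):
    """Pickle obj after converting it to built-in types only."""
    # protocol=2 is an illustrative default, not a Pyrolite requirement.
    return pickle.dumps(to_plain(obj), protocol=protocol)
```

The resulting pickle then contains only lists, tuples, dicts and scalars, which is the kind of data the comment above says unpickles without extra work.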

Otherwise you can use the Unpickler.registerConstructor method to teach Pyrolite about how to deserialize classes it doesn't know about. Look in the source code for examples of how to use this.

timematcher commented 6 years ago

@irmen I am working with a .PKL file that was generated by a machine learning training program written in Python. It was provided to me and I do not know Python, just .NET and a little bit of Java and C++.

I have asked the provider to regenerate the PKL with custom options so that the output file uses regular arrays instead of numpy arrays. However, I am not sure the provider has much control over the generation process.

Here is a snapshot of what the PKL file looks like internally. [image]

Having numpy array support would be a great thing indeed. I have seen people try to use Pyrolite and then give up because it does not support numpy. I myself don't know what a numpy array is or how it works, and I suppose it's difficult to implement and that is why it was left out, maybe?

Specifically, I tried to do this in C#:

            var pklBytes = System.IO.File.ReadAllBytes(filePkl);
            var serializer = PickleSerializer.GetFor(Config.SerializerType.pickle);
            var deSerialized = serializer.deserializeData(pklBytes);

And here is the error I get:

PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)

Here is a screenshot of the stack trace and error details. [image]

irmen commented 6 years ago

Dumping the binary pickle file is not very helpful. Pyrolite itself doesn't know how to deserialize arbitrary numpy arrays. It only knows about Python's built-in types and data structures (and a couple of custom types from the Pyro library). Numpy arrays are a complex beast (https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.html).
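As a more readable alternative to a raw binary dump, the standard-library `pickletools` module can list which classes and functions a pickle refers to, without importing them and without executing anything from the file. The sketch below is a heuristic and not part of Pyrolite; the function name `referenced_globals` is hypothetical, and the handling of `STACK_GLOBAL` (protocol 4 and later) assumes the module and name were pushed as the two most recent string arguments, which may not hold for every pickle.

```python
import pickletools


def referenced_globals(data):
    """Return a sorted list of 'module.name' globals a pickle refers to."""
    found = set()
    recent_strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _position in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols: arg is "module name", separated by one space.
            module, name = arg.split(" ", 1)
            found.add(module + "." + name)
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: heuristic, see the assumption stated above.
            if len(recent_strings) >= 2:
                found.add(recent_strings[-2] + "." + recent_strings[-1])
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return sorted(found)
```

Called on the bytes of the file from this thread, the result would presumably include `numpy.core.multiarray._reconstruct`, the name in the exception message, along with any other types that need a registered constructor or a conversion step.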

Surely you know SOMETHING about what data you're getting?

If the provider isn't able to convert their data to basic data types first, you're going to have to do it yourself. I think the easiest way is to learn a bit of Python and use Python itself (+numpy) to read the pickle file you have, convert it into something simpler (basic arrays or lists), and send that over to your .NET program. The other option is to write a custom class constructor and register it with Pyrolite, as pointed out above. For that, you're going to have to dive deeply into the data format of pickled numpy arrays and into what is inside your dataset. Either way, it's probably very useful to learn a bit of Python, to know what you're dealing with and to play with the data dump yourself.

timematcher commented 6 years ago

@irmen I just found out that the PKL files are actual machine learning models generated by scikit-learn. I might need to look into scikit-learn and see how I can generate PKL files without numpy arrays.

irmen commented 6 years ago

Tbh, it sounds like you really should invest in getting to know a bit of Python (+scikit-learn) to (pre)process those files. But that's just my view.