tcalmant / python-javaobj

Extended fork of python-javaobj from http://code.google.com/p/python-javaobj/
Apache License 2.0
81 stars 19 forks source link

Request: optionally allow remaining bytes in stream #2

Closed voetsjoeba closed 8 years ago

voetsjoeba commented 8 years ago

The JavaObjectUnmarshaller.readObject method currently emits a warning when there are remaining bytes present beyond the serialized Java object. It would be useful to have a way to suppress that warning in cases where further data in the input stream is expected and normal.

Motivating use case is unserializing SecretKey objects from JCEKS keystores, as done in the pyjks project (https://github.com/doublereedkurt/pyjks). The JCEKS keystore format is almost entirely custom structured binary data, with the exception of SecretKey entries which are stored as wholesale serialized Java objects embedded right in the middle of the file.

Keystore entries can appear in any order and mix of types, and there are always raw signature bytes present at the end of a keystore, so it is fully expected for there to be trailing data after a call to readObject. There is no way of knowing in advance what the size of a serialized Java object is either, so we also can't get around it by feeding a subset of the stream to javaobj.

So my request is: could we have a flag to indicate that trailing data is expected, in which case no warning is emitted?

tcalmant commented 8 years ago

This could be a log level flag, to log either a warning, or a debug trace.

What do you think ?

voetsjoeba commented 8 years ago

I think the current behaviour of emitting the warning should be the default, indeed as you say to detect unintended or unexpected behaviour. So I was thinking maybe we could add a trailing_data_ok=False kwarg to the readObject method (and I guess also to load/loads?) that can be set to True if desired. Added bonus is that when reading through code, the kwarg argument will make the intended behaviour very clear.

P.S. woops, hit "Close and comment" instead of just "Comment" :o)

tcalmant commented 8 years ago

I've added an ignore_remaining_data flag to load(), loads() and readObject(). It must be explicitly set to True to disable the "trailing data" messages.

I've also updated the code a bit to be closer to the original project. The basic tests are passing, but I'll try to push a bit more to see if it works in all cases.

voetsjoeba commented 8 years ago

Great, ty! Will try it out later tonight when I get home. Any chance we could have a version bump as well, so we can depend on this new feature from pyjks?

tcalmant commented 8 years ago

A new version has been released (0.1.3) !

voetsjoeba commented 8 years ago

Syntax error on these lines in python 2.7 :(

def load(file_object, *transformers, ignore_remaining_data=False):
def loads(string, *transformers, ignore_remaining_data=False):

Putting the *transformers at the end of the argument list resolves it.

tcalmant commented 8 years ago

I don't have hands on a Python 2.7 interpreter :/ Does the following version works ?

def load(file_object, *transformers, **kwargs):
    ignore_remaining_data = kwargs.get("ignore_remaining_data", False)
voetsjoeba commented 8 years ago

Yep, that works.

Running into to some other errors beyond that when parsing keystores now -- should I continue here or do you prefer that I open separate issues?

tcalmant commented 8 years ago

Yes, a new issue would be better. Also, if you could send me some test files, that would be great :)

voetsjoeba commented 8 years ago

Ok, give me a moment to extract these serialized objects from the keystore I'm testing with and I'll create a separate case.