tcalmant / python-javaobj

Extended fork of python-javaobj from http://code.google.com/p/python-javaobj/
Apache License 2.0

Unable to parse java.util.LinkedHashMap #23

Closed guywithface closed 5 years ago

guywithface commented 5 years ago

I keep on getting this error when parsing a Java object file:

Traceback (most recent call last):
  File "src/github.com/python-javaobj/javaobj.py", line 623, in _read_and_exec_opcode
    handler = self.opmap[opid]
KeyError: 8

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./test.py", line 48, in <module>
    obj = javaobj.loads(data)
  File "src/github.com/python-javaobj/javaobj.py", line 206, in loads
    ignore_remaining_data=ignore_remaining_data)
  File "src/github.com/python-javaobj/javaobj.py", line 187, in load
    return marshaller.readObject(ignore_remaining_data=ignore_remaining_data)
  File "src/github.com/python-javaobj/javaobj.py", line 564, in readObject
    _, res = self._read_and_exec_opcode(ident=0)
  File "src/github.com/python-javaobj/javaobj.py", line 629, in _read_and_exec_opcode
    return opid, handler(ident=ident)
  File "src/github.com/python-javaobj/javaobj.py", line 913, in do_object
    opcode, obj = self._read_and_exec_opcode(ident=ident + 1)
  File "src/github.com/python-javaobj/javaobj.py", line 627, in _read_and_exec_opcode
    .format(opid, position))
RuntimeError: Unknown OpCode in the stream: 0x8 (at offset 0x7C)

I think this happens because the decoder isn't parsing the following portion correctly as a TC_BLOCKDATA section:

006D 68 6F 6C 64 78 70 3F 40 00 00 00 00 00 0C 77 08 holdxp?@......w.
007D 00 00 00 10 00 00 00 0A 74 00 1C 24 31 35 35 30 ........t..$1550

The byte being rejected is the 0x08 at the end of the first line. The proper opcode is the byte before it, 0x77, which marks the start of a TC_BLOCKDATA section.
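To make the byte layout concrete, here is a minimal sketch that reads the dump above the way the Java serialization stream format lays out a short block: the opcode 0x77, then a one-byte length, then that many payload bytes. The helper name `read_blockdata` is hypothetical and is not part of javaobj; interpreting the payload as two big-endian integers is an assumption based on how `java.util.HashMap` writes its bucket count and size.

```python
import struct

TC_BLOCKDATA = 0x77  # opcode for a block of up to 255 bytes
TC_STRING = 0x74     # opcode for a new string

# The 32 bytes from the hex dump; the first byte sits at stream offset 0x6D.
BASE_OFFSET = 0x6D
dump = bytes.fromhex(
    "68 6F 6C 64 78 70 3F 40 00 00 00 00 00 0C 77 08 "
    "00 00 00 10 00 00 00 0A 74 00 1C 24 31 35 35 30"
)


def read_blockdata(buf, pos):
    """Hypothetical helper: read one TC_BLOCKDATA record starting at pos.

    Returns the payload and the position of the next opcode.
    """
    if buf[pos] != TC_BLOCKDATA:
        raise ValueError("expected TC_BLOCKDATA at position %d" % pos)
    length = buf[pos + 1]      # one unsigned length byte follows the opcode
    start = pos + 2
    return buf[start:start + length], start + length


# 0x77 is at stream offset 0x7B, one byte before the rejected 0x08 at 0x7C.
payload, next_pos = read_blockdata(dump, 0x7B - BASE_OFFSET)

# Assumption: the payload holds two big-endian 32-bit integers.
first_int, second_int = struct.unpack(">ii", payload)
```

Read this way, the 0x08 is the length of the block rather than an opcode, and the next opcode after the payload is 0x74 (TC_STRING).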

Please let me know if there's anything else I can do to help debug.

tcalmant commented 5 years ago

Could you try with the latest version from the git repository? I've added an __extra_loading__ method that does extra work once the object has been read from the file. LinkedList and LinkedHashMap should now behave as intended.

guywithface commented 5 years ago

@tcalmant That worked pretty well, and thanks for the quick response! I got past the linked hash map, but now I hit the next exception:

  File "./test.py", line 22
    obj = javaobj.loads(data)
  File "src/github.com/python-javaobj/javaobj.py", line 206, in loads
    ignore_remaining_data=ignore_remaining_data)
  File "src/github.com/python-javaobj/javaobj.py", line 187, in load
    return marshaller.readObject(ignore_remaining_data=ignore_remaining_data)
  File "src/github.com/python-javaobj/javaobj.py", line 564, in readObject
    _, res = self._read_and_exec_opcode(ident=0)
  File "src/github.com/python-javaobj/javaobj.py", line 629, in _read_and_exec_opcode
    return opid, handler(ident=ident)
  File "src/github.com/python-javaobj/javaobj.py", line 901, in do_object
    res = self._read_value(field_type, ident, name=field_name)
  File "src/github.com/python-javaobj/javaobj.py", line 1115, in _read_value
    _, res = self._read_and_exec_opcode(ident=ident + 1)
  File "src/github.com/python-javaobj/javaobj.py", line 629, in _read_and_exec_opcode
    return opid, handler(ident=ident)
  File "src/github.com/python-javaobj/javaobj.py", line 927, in do_object
    java_object.__extra_loading__(self, ident)
  File "src/github.com/python-javaobj/javaobj.py", line 1652, in __extra_loading__
    raise ValueError("Start of block data not found")
ValueError: Start of block data not found

Debug output:

DEBUG:root:java.util.HashMap
DEBUG:root:---
DEBUG:root:>>> java_object: {}
DEBUG:root: ## New reference handle 0x7E0009: JavaMap -> {}
DEBUG:root: Constructing class...
DEBUG:root: Class: java.util.HashMap
DEBUG:root: F loadFactor - I threshold
DEBUG:root: Values count: 2
DEBUG:root: Prepared list of values: ['loadFactor', 'threshold']
DEBUG:root: Prepared list of types: ['F', 'I']
DEBUG:root:Reading field: F - loadFactor
DEBUG:root: F loadFactor: 0.75
DEBUG:root:Reading field: I - threshold
DEBUG:root: I threshold: 0
DEBUG:root: java_object.annotations before: []
DEBUG:root: OpCode: 0x77 -- TC_BLOCKDATA (at offset 0x183)
DEBUG:root: [blockdata]
DEBUG:root: objectAnnotation value:
DEBUG:root: OpCode: 0x74 -- TC_STRING (at offset 0x18D)
DEBUG:root: [string]
DEBUG:root: ## New reference handle 0x7E000A: JavaString -> testing
DEBUG:root: objectAnnotation value: testing
DEBUG:root: OpCode: 0x74 -- TC_STRING (at offset 0x197)
DEBUG:root: [string]
DEBUG:root: ## New reference handle 0x7E000B: JavaString -> Helloworld!
DEBUG:root: objectAnnotation value: Helloworld!
DEBUG:root: OpCode: 0x78 -- TC_ENDBLOCKDATA (at offset 0x1C5)
DEBUG:root: objectAnnotation value: None
DEBUG:root: java_object.annotations after: ['\x00\x00\x00\x01\x00\x00\x00\x01', 'testing', 'Helloworld!']
DEBUG:root:Java object has extra loading capability.
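For readers following along, the final annotations list in the debug output can be turned into a Python dict by hand. The sketch below is only an illustration and is not javaobj's implementation: the function name `hashmap_from_annotations` is hypothetical, and it assumes the layout that `java.util.HashMap` writes after its default fields, namely an 8-byte block holding two big-endian integers (bucket count and entry count) followed by alternating keys and values. The block is given as `bytes` here, whereas the log above shows it as a Python 2 string.

```python
import struct


def hashmap_from_annotations(annotations):
    """Hypothetical helper: rebuild a dict from a HashMap's annotations.

    Assumes annotations[0] is the 8-byte block (bucket count, entry count)
    and the remaining items alternate key, value, key, value, ...
    """
    header, entries = annotations[0], annotations[1:]
    buckets, size = struct.unpack(">ii", header)
    if len(entries) != 2 * size:
        raise ValueError(
            "expected %d keys and values, found %d items" % (size, len(entries))
        )
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(entries[0::2], entries[1::2]))


# The annotations from the debug output above, with the block as bytes.
result = hashmap_from_annotations(
    [b"\x00\x00\x00\x01\x00\x00\x00\x01", "testing", "Helloworld!"]
)
```

With the values from the log, the block decodes to one bucket and one entry, and the single entry maps 'testing' to 'Helloworld!'.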

tcalmant commented 5 years ago

Okay, it seems that the LinkedHashMap and HashMap serializations are not compatible. I've just uploaded a version which should handle both.

guywithface commented 5 years ago

WOO IT WORKS! Thank you so much! On a side note, I noticed that Java internally uses a modified, non-standard UTF-8 encoding for strings when serializing objects. Python's standard UTF-8 decoder handles most of these strings but sometimes fails. I managed to jam Java's modified UTF-8 codec into your library to handle this, if you're interested in a pull request.

https://bugs.python.org/issue2857
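As a rough illustration of the difference, and not the codec from the pull request: Java's modified UTF-8 encodes U+0000 as the two bytes C0 80 and encodes characters outside the Basic Multilingual Plane as a surrogate pair, each half written as its own three-byte sequence. Python's strict UTF-8 decoder rejects both forms. The sketch below, with the hypothetical name `decode_modified_utf8`, works around this using only standard codecs and assumes the input is well-formed modified UTF-8.

```python
def decode_modified_utf8(data):
    """Hypothetical helper: decode Java's modified UTF-8 into a str.

    Assumes well-formed input; this is a sketch, not a full codec.
    """
    # 1. Java writes U+0000 as C0 80; restore the plain NUL byte.
    #    C0 never appears as a continuation byte, so this cannot
    #    match inside another multi-byte sequence.
    data = data.replace(b"\xc0\x80", b"\x00")
    # 2. Let surrogate halves through instead of raising an error.
    text = data.decode("utf-8", "surrogatepass")
    # 3. Round-trip through UTF-16 so each surrogate pair is merged
    #    into the single character it represents.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


nul_example = decode_modified_utf8(b"A\xc0\x80B")
emoji_example = decode_modified_utf8(b"\xed\xa0\xbd\xed\xb8\x80")
plain_example = decode_modified_utf8(b"testing")
```

Here `nul_example` contains an embedded NUL character, and `emoji_example` is the single character U+1F600, which strict UTF-8 decoding of the same bytes would reject.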

tcalmant commented 5 years ago

You're welcome 👍 And yes, it would be very nice to have some handling of the Java encoding :)

guywithface commented 5 years ago

Opened the PR! Thanks again for the help.

huettenhain commented 4 years ago

I am experiencing the same issue with the following serialized data blob: OssePatterned.jser.zip. I have tried the most recent version from the repository, but it still fails. Unfortunately, I know little about the Java serialization format, but the parser also complains about the byte 0x08 at offset 0x7C, and there is a byte with value 0x77 right in front of it, so my guess is that this issue is somehow related. Cheers and thanks a lot in advance!

tcalmant commented 4 years ago

Hi, I can reproduce the bug, but I haven't found a solution yet. Could you open a new ticket and describe what is supposed to be read?