jd-boyd / python-lzo

Python bindings for the LZO data compression library
GNU General Public License v2.0

Crash when decompressing bad input #87

Open meme-lord opened 2 months ago

meme-lord commented 2 months ago

I'm using the library to try to detect whether compression is being used on arbitrary data, but some inputs cause segfaults or aborts. This example produces a crash:

import lzo, base64
data = base64.b64decode("/wEdADavVXD1oYELeveMr0vHCmYPTPoxqw4HegY2nqHVRtv9df2xsVmz8UULJZKAHD9808mtKmk39dGImf+K2mB/hZP3HE0kGCxETaBN4mH23t7ZT+OeBFB+cqgI1F882YEWCREFc2UQXIfbuPtxCO+mmI8UVUDob17rdF4Wul4ziIpuOQ+TTKmEHUtTTtV5Z19JvC35U5mILuXN0yrVOCmQEXdMPisHGcr0roFHeyuURhMcn7h0bfGWl3unZ++6x051IydsKu0=")
lzo.decompress(data,False,500,algorithm='LZO1A')

$ python x.py
free(): corrupted unsorted chunks
Aborted (core dumped)

meme-lord commented 2 months ago

Here's another example:

import lzo, base64
data = base64.b64decode("9gAmgnQOMXth6xu9R+bKgGQ5zwFM2ZIMjBzA7bTN5Bsl+Yv4NQkvR8pXKX9ROY9oy0G4bnNO9KnXF+uYwoPvwl4jdLKFsClVsXpbb+hmgrFjfjCRAVL9H9emWqkA/6tiHRvx5/Sk3cwQhpdqA/unYOlmuHUcmSmR7BtnkHGZ4VCH5G+ylOKenAO9T3VVkldrAy+uVGSlnmqzINy9LvyJlpUwniLdXqw6jRz1XOepJ4RD40i0v2aPMcBsiRv2mlXlMfoS2mgseBLzVhBvLMi8g8yzHATvowUZKDGNXDDtNesCowoE2SY7dL32SrgDrkB0vaFVCsznXLSzN7sFowSISbWWmbaeEIt+LV56yI3INiD7TKZufC7qvsgMoB05T4vMw8vdo8jUth0pNP68J0BU4G3jg7iYf8f0cTwmCQPjWgpcvwBgTnCyGfplNdXq7wSt+yTcmqOOSDoOC7kLYq0Kn48gwGCBub6gvTghnW6knEyB9uTKYvQ341FEw6uc6nuzuBkw91vCTRf1uMktvqPPQv4GExKdWS0VcS6Lu1R52sfUaL2anFUZrT3bD6EO/ubnUfGU7SYVcaXiz+FK9LwOnE4kk/yCikaAW3Tkkra2D/HvdVocD6DiUZvvFt24hOxwNacP4XE9QW5ljLNLNMH+vDO6y+SEXFmSQcsv3ks5+Lvdt1CjGraTadMYVAA5g1sx9pXMI0hPGGsJDoSKvQGYfdNSp+9L1/5enCWvmTROb2iFipPtEb3+FV3HyMx5fZX9SPfidtz3eG6M2BZQ3ig+vPJruwBXV4mdDJnMDfPfAp7OaaL+LQRR8U+5yzQTRNkx7KlMqzta3lWc6WdvgkWoQAsKu6BkyMsvoAgwVMVWMuUErLXT5oCAT7C8k1b7bXPIbVWYCYyBU3QCa+WQSf6Qj/D1qdlCMqEjiryL7w==")
#data = data[:-25]
print(base64.b64encode(data))
x = lzo.decompress(data,False,5000,algorithm='LZO1')
print(x)

This gives:

free(): invalid next size (normal)
Aborted (core dumped)

If you uncomment the line that trims the last 25 bytes of the input, the output contains strings from process memory, such as b'be\nmapped to (when map01 is not None, the digit 0 is always mapped to\nthe letter O). For ', which comes from the base64 module.

jd-boyd commented 2 months ago

Thank you for reporting this and for the easy reproducers. I'll try to take a look at it at some point. It may be difficult to fix, though, if the crash is in the underlying liblzo library.
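
In the meantime, for the detection use case described in the report, one possible workaround is to run each decompression attempt in a separate Python process, so that an abort or segfault only kills the child. The following is a minimal sketch, not a fix: `run_isolated` and `LZO_PROBE` are hypothetical names introduced here, and the `lzo.decompress` arguments simply mirror the first reproducer above.

```python
import subprocess
import sys

# Child script (hypothetical): reads candidate bytes from stdin and tries to
# decompress them. The lzo.decompress call mirrors the first reproducer.
LZO_PROBE = """
import sys, lzo
data = sys.stdin.buffer.read()
out = lzo.decompress(data, False, 500, algorithm='LZO1A')
sys.stdout.buffer.write(out)
"""


def run_isolated(script, payload, timeout=10.0):
    """Run `script` in a fresh interpreter, feeding `payload` on stdin.

    Returns (ok, stdout_bytes). ok is False if the child exited with a
    non-zero status (uncaught exception, abort, segfault) or timed out.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            input=payload,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, b""
    return proc.returncode == 0, proc.stdout
```

A call such as `ok, out = run_isolated(LZO_PROBE, data)` would then report `ok == False` for inputs that crash the child instead of taking down the calling script. This only contains the crash. As the truncated-input example shows, a call can also return without crashing and still produce bytes that did not come from the input, so output from untrusted data should be treated with caution even when `ok` is true.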

meme-lord commented 2 months ago

Another example but with metadata set to True:

lzo.decompress(b'\xf0\x1e\xc4\x07^\xf6\x03\x9e8\xe0]C\xf3A{\xe3m\xbc\xdb^\xf4\xeb\x8eE',True,algorithm='LZO1')