matteomattei / PySquashfsImage

Python library to read Squashfs image files.
GNU General Public License v3.0

LZO compression and holes #25

Closed xloem closed 1 year ago

xloem commented 1 year ago

Just a note that it doesn't seem able to open files with compression method 3. This is listed as lzo in the source code.

I was trying to open snaps but it didn't work :)

AT0myks commented 1 year ago

What's the exception or behavior you're seeing? What Python version? What PySquashfsImage version? If you could give a link to the exact image(s) you tried to open that would be helpful to replicate.

xloem commented 1 year ago

I’m not at the system, but basically snaps use LZO and there’s no LZO decompressor in PySquashFS, so it outputs an error regarding an unsupported compression method. Here’s an example image, warning it has some exotic short blocks too that are stored uncompressed: https://api.snapcraft.io/api/v1/snaps/download/XKEcBqPM06H1Z7zGOdG5fbICuf8NWK5R_2465.snap

EDIT: snaps are squashfs filesystems and can be renamed to .squashfs
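For anyone wanting to check which compressor an image uses before opening it, here is a minimal sketch that reads the compressor id straight from the superblock. It assumes a little-endian squashfs 4.x image, where the 16-bit compressor id sits at byte offset 20. The function name `squashfs_compressor` is made up for illustration and is not part of PySquashfsImage.

```python
import struct

SQUASHFS_MAGIC = 0x73717368  # the bytes b"hsqs" read as a little-endian u32
# Compressor ids as defined by squashfs 4.x; 3 is the LZO case discussed here.
COMPRESSORS = {1: "gzip", 2: "lzma", 3: "lzo", 4: "xz", 5: "lz4", 6: "zstd"}


def squashfs_compressor(header: bytes) -> str:
    """Return the compressor name from the first bytes of a squashfs image."""
    (magic,) = struct.unpack_from("<I", header, 0)
    if magic != SQUASHFS_MAGIC:
        raise ValueError("not a little-endian squashfs 4.x image")
    # Layout assumption: five u32 fields (20 bytes) precede the u16 compressor id.
    (comp_id,) = struct.unpack_from("<H", header, 20)
    return COMPRESSORS.get(comp_id, f"unknown ({comp_id})")


# Usage (hypothetical local file name):
# with open("image.snap", "rb") as f:
#     print(squashfs_compressor(f.read(32)))
```

Only the first 22 bytes are needed, so this can be run on a partial download as well.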

AT0myks commented 1 year ago

Sorry, I didn't realise what was going on. I thought you were talking about an issue with a currently supported compression method, and I should have checked before answering. That's why I asked about version info. I thought all compression methods that squashfs supports were already handled in PySquashfsImage, but indeed that's not the case. Thank you for providing this image. I used it to add LZO support, which should be coming soon, but it might require a few changes to the way decompression is done. It also helped me discover a bug in something I've been working on (related to extended inodes, which I had not seen in an image before), so thanks for that too.

xloem commented 1 year ago

Thanks for looking!

dissect.squashfs also does not handle these images ( https://github.com/fox-it/dissect.squashfs/issues/10 ) but no reply from them yet. This is the other one I tried that failed differently but maybe for the same reasons: https://api.snapcraft.io/api/v1/snaps/download/XKEcBqPM06H1Z7zGOdG5fbICuf8NWK5R_1862.snap

AT0myks commented 1 year ago

I'm not seeing any issues with this image either. I'm gonna try to push the commits pretty soon.

AT0myks commented 1 year ago

Just pushed the commits. If you install from source (pip install git+https://github.com/matteomattei/PySquashfsImage) and have python-lzo installed you should be able to run

import requests
from PySquashfsImage import SquashFsImage

with SquashFsImage.from_bytes(requests.get("...").content) as image:
    file = image.select("/usr/share/groff/1.22.4/font/devlatin1/DESC")
    print(file.read_bytes())  # or read_text()

without errors. Please tell me whether it works for you.

xloem commented 1 year ago

This is looking really great. It does look like some files might be read inconsistently:

from PySquashfsImage import SquashFsImage
image = SquashFsImage("XKEcBqPM06H1Z7zGOdG5fbICuf8NWK5R_1862.snap") # https://api.snapcraft.io/api/v1/snaps/download/XKEcBqPM06H1Z7zGOdG5fbICuf8NWK5R_1862.snap
file = image.select("/usr/lib/i386-linux-gnu/dri/iHD_drv_video.so")
read_size = len(file.read_bytes())
assert file.size == read_size # 4.5 megabytes missing for me

AT0myks commented 1 year ago

I just spent five hours trying to debug this. This is the only file in that image that is extracted incorrectly and I couldn't figure out why. I almost thought it was related to python-lzo. Turns out it's because it has 2 holes for a total of 4456448 bytes, and holes are not handled in PySquashfsImage yet. I have a fix for this specific file but I have to look into it a bit more before I can push something.
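To illustrate why unhandled holes make a file come out short, here is a minimal sketch of reassembling a file from a squashfs block list. It is not the code used in PySquashfsImage. The helper `read_file_blocks` and its parameters are hypothetical, fragments are ignored, and zlib stands in for the real compressor. The two format facts it relies on are that a block list entry of 0 denotes a hole (a block of zeros with nothing stored on disk) and that bit 24 of an entry marks a block stored uncompressed.

```python
import zlib

UNCOMPRESSED_BIT = 1 << 24  # set in a block list entry when the block is stored as-is


def read_file_blocks(entries, block_size, file_size, read_at, start,
                     decompress=zlib.decompress):
    """Rebuild file contents from block list entries, filling holes with zeros.

    read_at(offset, length) is a caller-supplied function returning raw bytes.
    """
    out = bytearray()
    offset = start
    for entry in entries:
        if entry == 0:
            # Hole: nothing to read, but the file still spans this block.
            out += bytes(min(block_size, file_size - len(out)))
            continue
        on_disk = entry & ~UNCOMPRESSED_BIT
        raw = read_at(offset, on_disk)
        offset += on_disk
        out += raw if entry & UNCOMPRESSED_BIT else decompress(raw)
    return bytes(out)
```

A reader that skips zero entries instead of emitting zeros returns fewer bytes than `file.size`, which matches the symptom above. If that image uses 128 KiB blocks (an assumption, not something stated in this thread), 4456448 bytes would be exactly 34 such blocks.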

AT0myks commented 1 year ago

Done, the file should be read correctly now.

xloem commented 1 year ago

Thank you for working on this. It seems it might be worthwhile to check against many snaps some day. If it’s helpful, I got the chromium snap URLs from the url fields of requests.get(f'http://api.snapcraft.io/v2/snaps/info/{"chromium"}', headers={'Snap-Device-Series':'16'}).json(). All snaps are squashfs images.
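A rough sketch of collecting those url fields for testing against many snaps could look like the following. It makes no assumption about the exact shape of the JSON response and simply walks it, gathering every string stored under a key named "url". The function names are made up for illustration, and urllib is used here instead of requests to keep the example dependency-free.

```python
import json
import urllib.request


def collect_urls(node):
    """Recursively gather every string stored under a key named "url"."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_urls(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_urls(item))
    return found


def snap_urls(name):
    """Query the snap info endpoint quoted above and return its url fields."""
    req = urllib.request.Request(
        f"http://api.snapcraft.io/v2/snaps/info/{name}",
        headers={"Snap-Device-Series": "16"},
    )
    with urllib.request.urlopen(req) as resp:
        return collect_urls(json.load(resp))
```

Calling `snap_urls("chromium")` would then give a list of URLs to feed into a test loop. Filtering for entries ending in `.snap` may be needed if the response carries other kinds of URLs.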