libyal / libpff

Library and tools to access the Personal Folder File (PFF) and the Offline Folder File (OFF) format
GNU Lesser General Public License v3.0

Attachment data ranges (possible `libfdata` bug?) #103

Open ztravis opened 3 years ago

ztravis commented 3 years ago

After a recent libpff rebuild, I've run into an unusual bug where I get incorrect data when reading attachment ranges, in particular for "stored" attachments backed by OLE data. I'm using the Python bindings, so here is an approximate demo:

item = ...  # get a PST message item (elided)
# Assume this is an OLE/stored attachment (i.e. attachment type 6)
attachment = item.get_attachment(0)
# Bug: short reads don't advance the offset
first_read = attachment.read_buffer(10)
wrong_second_read = attachment.read_buffer(10)  # returns the same bytes as first_read
# Reading everything in a single call returns the correct data
attachment.seek_offset(0)
correct_data = attachment.read_buffer(attachment.get_size())
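To narrow down which read sizes are affected, a small check can compare one full read against sequential chunked reads. This is a minimal sketch that assumes only the three methods used in the demo above (read_buffer, seek_offset, get_size). first_mismatch is a hypothetical helper, not part of the bindings, and InMemoryAttachment and NonAdvancingAttachment are hypothetical stand-ins so the check can run without a PST file.

```python
class InMemoryAttachment:
    """Hypothetical stand-in that reads the way a correct attachment should."""

    def __init__(self, data):
        self._data = data
        self._offset = 0

    def get_size(self):
        return len(self._data)

    def seek_offset(self, offset):
        self._offset = offset

    def read_buffer(self, size):
        data = self._data[self._offset:self._offset + size]
        self._offset += len(data)  # a correct reader advances by what it returned
        return data


class NonAdvancingAttachment(InMemoryAttachment):
    """Hypothetical stand-in reproducing the reported symptom."""

    def read_buffer(self, size):
        return self._data[self._offset:self._offset + size]  # offset never moves


def first_mismatch(attachment, chunk_size):
    """Return the first offset where chunked reads differ from one full read, or None."""
    size = attachment.get_size()

    # Reference: read everything in a single call.
    attachment.seek_offset(0)
    reference = attachment.read_buffer(size)

    # Re-read sequentially in fixed-size chunks.
    attachment.seek_offset(0)
    chunks = []
    remaining = size
    while remaining > 0:
        data = attachment.read_buffer(min(chunk_size, remaining))
        if not data:
            break  # unexpected end of data; compare what was read so far
        chunks.append(data)
        remaining -= len(data)
    chunked = b"".join(chunks)

    for offset, (expected, actual) in enumerate(zip(reference, chunked)):
        if expected != actual:
            return offset
    if len(reference) != len(chunked):
        return min(len(reference), len(chunked))
    return None
```

For example, first_mismatch(NonAdvancingAttachment(bytes(range(256)) * 4), 10) returns 10, the offset where the second chunk starts repeating the first. Against a real attachment object, running the check with several chunk sizes (e.g. 10 and 8192) would show whether only some read sizes diverge.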

After this, first_read and wrong_second_read are always identical even when the underlying data differs, whereas reading everything in one call returns the correct data. Read sizes in between (e.g. 8K blocks) return a strange mix of data, possibly because offsets inside segments are handled incorrectly.
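One way to picture the suspected problem: a stream built from several segments has to translate each logical offset into a segment plus an offset inside that segment, and a read that starts mid-segment must begin at that inner offset. The sketch below is a hypothetical, in-memory illustration of that bookkeeping; SegmentedStream and its methods are assumptions made for illustration and do not reflect libfdata's actual data structures or API.

```python
import bisect
from itertools import accumulate


class SegmentedStream:
    """Hypothetical reader over a list of byte segments (not libfdata's API)."""

    def __init__(self, segments):
        self._segments = list(segments)
        # _starts[i] is the logical offset at which segment i begins.
        self._starts = [0] + list(accumulate(len(s) for s in self._segments))[:-1]
        self._size = sum(len(s) for s in self._segments)
        self._offset = 0

    def get_size(self):
        return self._size

    def seek_offset(self, offset):
        self._offset = offset

    def _locate(self, offset):
        """Map a logical offset to (segment index, offset within that segment)."""
        index = bisect.bisect_right(self._starts, offset) - 1
        return index, offset - self._starts[index]

    def read_buffer(self, size):
        out = bytearray()
        while size > 0 and self._offset < self._size:
            index, inner = self._locate(self._offset)
            # Start at the inner offset, not at the segment's first byte,
            # and stop at the end of the current segment.
            piece = self._segments[index][inner:inner + size]
            if not piece:
                break  # defensive: avoid looping forever on an empty piece
            out += piece
            self._offset += len(piece)
            size -= len(piece)
        return bytes(out)
```

With SegmentedStream([b"0123", b"456789", b"ab"]), two read_buffer(3) calls return b"012" and then b"345", the second one crossing a segment boundary. A reader that restarted at a segment's first byte instead of at the inner offset would produce the kind of mixed data described above.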

I've found some related commits in libfdata, in particular new conditionals around seeking to the requested offset (commit 04389fc2d, e.g. line 1666); the problem goes away if I always bypass these new conditionals. I'm not sure whether this is an error in that change, in how libpff builds its embedded object streams, or something else.
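The libfdata change is described as adding conditionals around seeking to the requested offset. The following is a hypothetical sketch, not libfdata's or libpff's code, of one way a "skip the seek if already there" guard can produce exactly the reported symptom: the guard trusts a cached position, but the underlying handle is replaced (here, reopened at offset 0 before every read) without the cache being reset. GuardedReader, reopen_each_read and invalidate_on_reopen are illustrative names only.

```python
import io


class GuardedReader:
    """Hypothetical wrapper that skips redundant seeks (not libfdata's code).

    `open_handle` returns a file-like object for the underlying data. With
    reopen_each_read=True, a fresh handle positioned at 0 is used for every
    read, mimicking a backend whose real position changes behind the wrapper.
    """

    def __init__(self, open_handle, reopen_each_read=False, invalidate_on_reopen=True):
        self._open_handle = open_handle
        self._reopen = reopen_each_read
        self._invalidate = invalidate_on_reopen
        self._handle = open_handle()
        self._cached_position = 0  # where the wrapper *believes* the handle is
        self.offset = 0            # logical read offset

    def seek_offset(self, offset):
        self.offset = offset

    def read_buffer(self, size):
        if self._reopen:
            self._handle = self._open_handle()
            if self._invalidate:
                # The new handle starts at 0, so the cached position must follow.
                self._cached_position = 0
        # The guard: only seek when the cached position differs from the target.
        if self._cached_position != self.offset:
            self._handle.seek(self.offset)
            self._cached_position = self.offset
        data = self._handle.read(size)
        self._cached_position += len(data)
        self.offset += len(data)
        return data
```

With invalidate_on_reopen=False, two consecutive read_buffer(10) calls both return the first 10 bytes, while seek_offset(0) followed by one full read is correct, because the stale cache then differs from the target and the seek does happen. Always seeking would also hide the problem, which is consistent with bypassing the conditionals, but this sketch does not establish that libfdata fails for this reason.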

I will try to provide more information (and a test case, if I can find or create clean data with an OLE/stored attachment) as I continue to debug, but I wanted to raise this sooner rather than later in case you have any thoughts.

joachimmetz commented 3 years ago

Per https://github.com/libyal/libpff/issues/2, the Python bindings are not finished yet. I'll take a look when time permits.

ztravis commented 3 years ago

I don't think the bug is in the Python bindings, though. This example works correctly with that one change to the libfdata C library. I will try to provide you with a C-only demo of the issue.