libyal / libpff

Library and tools to access the Personal Folder File (PFF) and the Offline Folder File (OFF) format
GNU Lesser General Public License v3.0

libpff_data_array_read_entries: invalid number of array entries value out of bounds when reading from a compressed OST #70

Closed nelley closed 2 years ago

nelley commented 6 years ago
for message in folder.sub_messages:
IOError: pypff_folder_get_sub_messages: unable to retrieve number of sub messages. libpff_data_array_read_entries: invalid number of array entries value out of bounds. libpff_data_array_read: unable to read data array entries. libpff_io_handle_read_descriptor_data_list: unable to read data array. libpff_table_read: unable to read descriptor: 2292334 data: 85339494 list. libpff_item_values_read: unable to read table. libpff_folder_determine_sub_messages: unable to read descriptor identifier: 2292334. libpff_folder_get_number_of_sub_messages: unable to determine sub messages.
-------------------------------------------

I got an error when iterating over the sub messages. I noticed that only folders containing a large number of messages trigger this error (in my case, over 20,000 messages in one folder).

praveenmaniyan commented 4 years ago

I am getting the above error message for my OST file, which has more than 40,000 messages. May I know why this issue was closed, or is there a workaround? Is this a limitation of some array size in the code?

joachimmetz commented 4 years ago

@praveenmaniyan the original reporter closed the issue, so I have no idea.

Can you provide the full error you are encountering?

praveenmaniyan commented 4 years ago

OSError: pypff_folder_get_number_of_sub_messages: unable to retrieve number of sub messages. libpff_data_array_read_entries: invalid number of array entries value out of bounds. libpff_data_array_read: unable to read data array entries. libpff_io_handle_read_descriptor_data_list: unable to read data array. libpff_table_read: unable to read descriptor: 8590 data: 39871842 list. libpff_item_values_read: unable to read table. libpff_folder_determine_sub_messages: unable to read descriptor identifier: 8590. libpff_folder_get_number_of_sub_messages: unable to determine sub messages.

The above is the error encountered. I have more than 40,000 messages in one of my OST/PST folders, and the error occurs at that particular folder.

Please let me know of a fix so that I can use this library.

joachimmetz commented 4 years ago

Which version of libpff/pypff are you using? Note that work on pypff is not yet completed, see https://github.com/libyal/libpff/issues/2.

praveenmaniyan commented 4 years ago

I am using the latest release "libpff-experimental-20180714" version.

Yes, I understand the work on pypff is not yet complete, however, the count_of_items in a folder should be basic functionality, right?

praveenmaniyan commented 4 years ago

@joachimmetz , one update on this issue. I exported the OST to a PST, and the folder had 38714 messages. This got successfully exported when the input file was PST. However, the same script failed for OST input file. So, this error seems to be happening only for OST files.

joachimmetz commented 4 years ago

@praveenmaniyan

praveenmaniyan commented 4 years ago

I ran the pffinfo command on the OST, please find below the output.

pffinfo 20120802

Personal Folder File information:
        File size:              9440403456 bytes
        File content type:      Offline Storage Tables (OST)
        File type:              64-bit
        Encryption type:        none

Message store:

The message/folder type is "Inbox", please find below the complete error of the script run. For some folders like "Deleted Items" and some other Personal folders it is running fine. Please let me know for any further information, also any specific command you want me to run on the OST file.

Folder Name ----> IPM_SUBTREE
Number of messages is ---> 0
Folder Name ----> Organization Forms
Number of messages is ---> 0
Folder Name ----> EFORMS REGISTRY
Number of messages is ---> 0
Folder Name ----> NON_IPM_SUBTREE
Number of messages is ---> 0
Folder Name ----> Root - Public
Number of messages is ---> 0
Folder Name ----> Common Views
Number of messages is ---> 0
Folder Name ----> Reminders
Number of messages is ---> 0
Folder Name ----> To-Do Search
Number of messages is ---> 0
Folder Name ----> Missed Conversations
Number of messages is ---> 0
Folder Name ----> Voice Mail
Number of messages is ---> 0
Folder Name ----> Inbox
Traceback (most recent call last):
File "pst_traverse_reporter.py", line 412, in <module>
main(args.PST_FILE, args.title)
File "pst_traverse_reporter.py", line 37, in main
folderTraverse(root)
File "pst_traverse_reporter.py", line 74, in folderTraverse
folderTraverse(folder) # Call new folder to traverse:
File "pst_traverse_reporter.py", line 74, in folderTraverse
folderTraverse(folder) # Call new folder to traverse:
File "pst_traverse_reporter.py", line 76, in folderTraverse
checkForMessages(folder)
File "pst_traverse_reporter.py", line 90, in checkForMessages
print("Number of messages is --->", folder.number_of_sub_messages)
OSError: pypff_folder_get_number_of_sub_messages: unable to retrieve number of sub messages. libpff_data_array_read_entries: invalid number of array entries value out of bounds. libpff_data_array_read: unable to read data array entries. libpff_io_handle_read_descriptor_data_list: unable to read data array. libpff_table_read: unable to read descriptor: 8590 data: 49142534 list. libpff_item_values_read: unable to read table. libpff_folder_determine_sub_messages: unable to read descriptor identifier: 8590. libpff_folder_get_number_of_sub_messages: unable to determine sub messages.
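
A traversal like the one in the traceback stops at the first folder whose message count cannot be read. A minimal sketch of a more tolerant walk follows. It assumes folder objects expose `name`, `sub_folders` and `number_of_sub_messages` the way pypff folders appear to in this thread. The helper name `count_messages` is hypothetical and not part of pypff. This only isolates the failing folders, it does not fix the underlying read error.

```python
def count_messages(folder, failures=None):
    """Recursively count messages, recording folders that cannot be read.

    `folder` is assumed to behave like a pypff folder (assumption):
    it has `name`, `sub_folders` and `number_of_sub_messages`.
    """
    if failures is None:
        failures = []

    total = 0
    try:
        # In the reports above, reading this property is what raises.
        total += folder.number_of_sub_messages
    except (IOError, OSError) as exception:
        # Remember the folder and keep walking instead of aborting.
        failures.append((folder.name, str(exception)))

    for sub_folder in folder.sub_folders:
        sub_total, _ = count_messages(sub_folder, failures)
        total += sub_total

    return total, failures
```

With pypff this could be called as `count_messages(pff_file.get_root_folder())` after opening the file, which leaves a list of the folders (for example "Inbox") that still need attention.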
joachimmetz commented 4 years ago

@praveenmaniyan based on your output it looks like you are using libpff 20120802 but you indicated earlier that you were using 20180714? Do you have multiple versions on your system?
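
One way to spot such a mismatch is to compare the version in the tool banner (for example `pffinfo 20120802`) with the version reported by the Python binding. The sketch below only does the string comparison. The function names are hypothetical, and obtaining the library version (for instance via `pypff.get_version()`, which I assume is available as in other libyal bindings) is left to the caller.

```python
import re


def parse_tool_version(banner):
    """Extract the 8-digit release date from a banner such as 'pffinfo 20120802'."""
    match = re.search(r"\b(\d{8})\b", banner)
    return match.group(1) if match else None


def versions_match(tool_banner, library_version):
    """Return True if the tool banner and the library report the same release."""
    tool_version = parse_tool_version(tool_banner)
    return tool_version is not None and tool_version == library_version
```

A `False` result, as with the banner and the library version mentioned in this thread, suggests the command-line tools and the Python module were installed from different sources.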

praveenmaniyan commented 4 years ago

@joachimmetz, I installed pffinfo only after you asked for the OST file information, on 24-Feb-2020. That pffinfo executable was installed with apt install. For running the code and extracting messages from the OST, however, I have been using the 20180714 library code.

Can you please point me to the C/C++ code/module where this array is initialized and where its size is calculated, so that I can try to debug it on my side?

joachimmetz commented 4 years ago

thx for the clarification

The error contains the location:

OSError: pypff_folder_get_number_of_sub_messages: unable to retrieve number of sub messages.
libpff_data_array_read_entries: invalid number of array entries value out of bounds.
libpff_data_array_read: unable to read data array entries.
libpff_io_handle_read_descriptor_data_list: unable to read data array.
libpff_table_read: unable to read descriptor: 8590 data: 49142534 list.
libpff_item_values_read: unable to read table.
libpff_folder_determine_sub_messages: unable to read descriptor identifier: 8590.
libpff_folder_get_number_of_sub_messages: unable to determine sub messages.

https://github.com/libyal/libpff/blob/34c48fb28e0d28f829d801d7bce646ef8eebe6d4/libpff/libpff_data_array.c#L520

It might help to get some format debug output also see https://github.com/libyal/libpff/wiki/Troubleshooting#verbose-and-debug-output
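
The error text is a chain of `function: message` entries, so the originating function can be pulled out mechanically. A small sketch, assuming every entry starts with a `pypff_` or `libpff_` function name as in the errors quoted here; the helper names are hypothetical:

```python
import re

# Matches the function name that starts each entry in the chain.
FRAME_PATTERN = re.compile(r"\b((?:pypff|libpff)_\w+): ")


def split_error_chain(message):
    """Split an error string into (function, text) pairs, in reported order."""
    parts = FRAME_PATTERN.split(message)
    # parts[0] is any prefix such as "OSError: "; after that the list
    # alternates between function name and message text.
    frames = []
    for index in range(1, len(parts) - 1, 2):
        frames.append((parts[index], parts[index + 1].strip()))
    return frames


def origin(frames):
    """Return the first libpff entry, which is where the error was raised."""
    for function, text in frames:
        if function.startswith("libpff_"):
            return function, text
    return None
```

For the error above, `origin()` returns the `libpff_data_array_read_entries` entry, which matches the source location linked in this comment.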

LastTechBender commented 4 years ago

Hi @joachimmetz @praveenmaniyan

I am getting a similar error when trying to read a large OST file. Have there been any updates or solutions to this issue?

Best wishes

ftang001 commented 3 years ago

Hi @joachimmetz I had similar errors with a large (16 Gb) OST file, using both the 2018 version and the current source. With the help of your docs (many thanks!) I eventually wrote my own crude reader and successfully extracted the full file, with no corruption. I wasn't able to confirm it, but I suspect the problem with libpff may be related to compressed internal node blocks (indexes), which appear in large folders and messages; my less-than-elegant approach to the compression issue was to run a first pass over the file and inflate all compressed blocks into a new file before walking the folder structure.
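
The "inflate first" pass described above could look roughly like this for a single block. This is a sketch under assumptions, not ftang001's code: it assumes compressed blocks are plain zlib (DEFLATE) streams and that the expected uncompressed size is known from the block's index entry. Both points are assumptions and are not confirmed in this thread. The function name is hypothetical.

```python
import zlib


def inflate_block(stored_data, uncompressed_size):
    """Return the block contents, inflating them if they were stored compressed.

    Assumption: a block whose stored size equals its uncompressed size
    was stored as-is; any other block is treated as a zlib stream.
    """
    if len(stored_data) == uncompressed_size:
        return stored_data

    data = zlib.decompress(stored_data)
    if len(data) != uncompressed_size:
        raise ValueError(
            "inflated size %d does not match expected size %d"
            % (len(data), uncompressed_size)
        )
    return data
```

Checking the inflated length against the expected size catches blocks that were misidentified as compressed before they are written to the new file.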

joachimmetz commented 3 years ago

@ftang001 can you share the test data? Otherwise this issue is unlikely to be addressed soon.

Also making such claims without your source being public is not helping anyone.

shmulikipod1151 commented 3 years ago

@joachimmetz Thank you so much for all of this awesome work, it is unbelievable! I encounter the same problem when handling folders with a large number of items (above 4000 or so) in the .ost format. Did you have a chance to take a look at it? Thanks in advance!

joachimmetz commented 3 years ago

Did you have a chance to take a look at it?

@shmulikipod1151 per previous comments: can you share the test data? Otherwise this issue is unlikely to be addressed soon.

ftang001 commented 3 years ago

Hi again Joachim and shmulikipod1151

As suggested I have subscribed to O365 but I haven't managed to generate enough test data to repeat the behaviour.

My offer to share the code stands - I'm just not sure how best to present it, as it is embarrassingly rough and I really don't want to publish it and distract others from your work. But I can write a few notes and send you a copy if you'd like to see it.

Regards Jim.

joachimmetz commented 3 years ago

Source is not test data, it will NOT help address the issue quickly, especially if it is "rough". Also if the source is not FOSS it cannot be used.

shmulikipod1151 commented 3 years ago

@joachimmetz Hi, sorry for not responding, I was on vacation for a few days. My OST is pretty big; would it be possible to share the script that can produce one instead? Basically, it's a Python script that sends the same mail many times (about 4000 times makes it happen), so the outbox becomes a folder with many items and reproduces the issue. If that's not convenient, I will create a fresh .ost for this purpose.
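
The script itself was not posted in this thread. A rough sketch of the idea, with the addresses, subject and helper names all hypothetical, is to build the same message many times and hand each copy to a sender supplied by the caller:

```python
from email.message import EmailMessage


def build_messages(count, sender, recipient):
    """Build `count` identical plain-text messages."""
    messages = []
    for _ in range(count):
        message = EmailMessage()
        message["From"] = sender
        message["To"] = recipient
        message["Subject"] = "libpff large folder test"
        message.set_content("Identical test message used to fill one folder.")
        messages.append(message)
    return messages


def send_all(messages, send):
    """Pass every message to `send` and return how many were handed over."""
    sent = 0
    for message in messages:
        send(message)
        sent += 1
    return sent
```

Here `send` could be the `send_message` method of a connected `smtplib.SMTP` object. The OST only appears once Outlook has synchronised the target mailbox, which this sketch does not cover.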

joachimmetz commented 3 years ago

@shmulikipod1151 excellent yes that is already a step in the right direction, I'm doing similar things with other formats on https://github.com/dfirlabs. Happy to create a repo if you're fine sharing your script under a FOSS license.

The annoying thing with PST/OST is that one needs Microsoft Outlook, I would need to dig that up, and that might take a while, since it is not one of the data formats currently on the top of my list.

shmulikipod1151 commented 3 years ago

https://github.com/dfirlabs this is awesome work, thanks! Let me try and save you some time by generating a minimal OST that reproduces the issue. If that won't work (the OST will be huge) I will share the script.

shmulikipod1151 commented 3 years ago

@joachimmetz, I've managed to create an OST that reproduces the issue. I just sent myself the same email 15000 times (I wrote a PowerShell script for it) and got this 1 GB file. How can I share it with you? Can I send you a link by mail?

joachimmetz commented 3 years ago

Excellent, much appreciated for the 1G file, upload to Google Drive or equiv and mail me the link, that should work.

If you want to share the powershell script I can create https://github.com/dfirlabs/pst-specimens (or equiv) and you send a PR to that project. So that the script is linked to you as contributor as well.

joachimmetz commented 2 years ago

Looks like the compressed size of the descriptor is passed to libpff_data_array_read_entries instead of the uncompressed size of the data block.
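
To illustrate why passing the wrong size produces this particular error, here is a simplified bounds check in Python. It is not the libpff implementation: the header size, entry size and the numbers in the note below are made up for illustration, and the function name is hypothetical.

```python
def check_array_entries(number_of_entries, entry_size, data_size, header_size=8):
    """Verify that the declared entries fit inside a buffer of `data_size` bytes.

    Returns the number of bytes the header plus entries occupy.
    """
    required_size = header_size + number_of_entries * entry_size
    if required_size > data_size:
        raise ValueError("invalid number of array entries value out of bounds")
    return required_size
```

With 500 entries of 8 bytes each, the check passes against an uncompressed block of 4096 bytes but fails if the caller supplies a compressed size of, say, 1200 bytes, even though the data is valid. That would also be consistent with the error only showing up in compressed OST files.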

joachimmetz commented 2 years ago

With thanks to @shmulikipod1151 I was able to reproduce the issue, which is limited to compressed OST files.

Note that this could have been solved in 2018 if people would provide test data or debug output https://github.com/libyal/libpff/wiki/Troubleshooting#verbose-and-debug-output.

joachimmetz commented 2 years ago

Changes in https://github.com/libyal/libpff/commit/9d6a3f08379abe05c332a83f299cb5fab07348cc