Closed nelley closed 2 years ago
I am getting the above error message for my OST file, which has more than 40,000 messages. May I know why this issue was closed, or is there a workaround? Is this a limitation of some array size in the code?
@praveenmaniyan the original reporter closed the issue, so I have no idea.
Can you provide the full error you are encountering?
OSError: pypff_folder_get_number_of_sub_messages: unable to retrieve number of sub messages. libpff_data_array_read_entries: invalid number of array entries value out of bounds. libpff_data_array_read: unable to read data array entries. libpff_io_handle_read_descriptor_data_list: unable to read data array. libpff_table_read: unable to read descriptor: 8590 data: 39871842 list. libpff_item_values_read: unable to read table. libpff_folder_determine_sub_messages: unable to read descriptor identifier: 8590. libpff_folder_get_number_of_sub_messages: unable to determine sub messages.
The above is the error encountered. I have more than 40,000 messages in one of my OST/PST folders, and the error occurs at that particular folder.
Please let me know of a fix so that I can use this library.
Which version of libpff/pypff are you using? Note that work on pypff is not yet completed https://github.com/libyal/libpff/issues/2.
I am using the latest release "libpff-experimental-20180714" version.
Yes, I understand the work on pypff is not yet complete, however, the count_of_items in a folder should be basic functionality, right?
@joachimmetz , one update on this issue. I exported the OST to a PST, and the folder had 38714 messages. This got successfully exported when the input file was PST. However, the same script failed for OST input file. So, this error seems to be happening only for OST files.
@praveenmaniyan
I ran the pffinfo command on the OST, please find below the output.
pffinfo 20120802
Personal Folder File information:
File size: 9440403456 bytes
File content type: Offline Storage Tables (OST)
File type: 64-bit
Encryption type: none
Message store:
The message/folder type is "Inbox"; the complete error from the script run is below. For some folders, such as "Deleted Items" and some other personal folders, it runs fine. Please let me know if you need any further information or want me to run a specific command on the OST file.
Folder Name ----> IPM_SUBTREE
Number of messages is ---> 0
Folder Name ----> Organization Forms
Number of messages is ---> 0
Folder Name ----> EFORMS REGISTRY
Number of messages is ---> 0
Folder Name ----> NON_IPM_SUBTREE
Number of messages is ---> 0
Folder Name ----> Root - Public
Number of messages is ---> 0
Folder Name ----> Common Views
Number of messages is ---> 0
Folder Name ----> Reminders
Number of messages is ---> 0
Folder Name ----> To-Do Search
Number of messages is ---> 0
Folder Name ----> Missed Conversations
Number of messages is ---> 0
Folder Name ----> Voice Mail
Number of messages is ---> 0
Folder Name ----> Inbox
Traceback (most recent call last):
File "pst_traverse_reporter.py", line 412, in <module>
main(args.PST_FILE, args.title)
File "pst_traverse_reporter.py", line 37, in main
folderTraverse(root)
File "pst_traverse_reporter.py", line 74, in folderTraverse
folderTraverse(folder) # Call new folder to traverse:
File "pst_traverse_reporter.py", line 74, in folderTraverse
folderTraverse(folder) # Call new folder to traverse:
File "pst_traverse_reporter.py", line 76, in folderTraverse
checkForMessages(folder)
File "pst_traverse_reporter.py", line 90, in checkForMessages
print("Number of messages is --->", folder.number_of_sub_messages)
OSError: pypff_folder_get_number_of_sub_messages: unable to retrieve number of sub messages. libpff_data_array_read_entries: invalid number of array entries value out of bounds. libpff_data_array_read: unable to read data array entries. libpff_io_handle_read_descriptor_data_list: unable to read data array. libpff_table_read: unable to read descriptor: 8590 data: 49142534 list. libpff_item_values_read: unable to read table. libpff_folder_determine_sub_messages: unable to read descriptor identifier: 8590. libpff_folder_get_number_of_sub_messages: unable to determine sub messages.
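Until the underlying read error is fixed, a traversal script can at least avoid aborting at the first unreadable folder. The following is a minimal sketch, not the reporter's `pst_traverse_reporter.py`: the function name `count_messages` is hypothetical, and it assumes folder objects that expose `name`, `number_of_sub_messages` and `sub_folders`, as pypff folder objects are understood to do.

```python
def count_messages(folder, failures=None):
    """Walk a folder tree and sum message counts.

    Folders whose count cannot be read are recorded in `failures`
    instead of stopping the whole traversal.
    """
    if failures is None:
        failures = []
    total = 0
    try:
        total += folder.number_of_sub_messages
    except OSError as error:
        # Remember which folder failed and why, then keep going.
        failures.append((folder.name, str(error)))
    for sub_folder in folder.sub_folders:
        sub_total, _ = count_messages(sub_folder, failures)
        total += sub_total
    return total, failures
```

The returned total then only covers the folders that could be read, and `failures` lists the folders (here it would include "Inbox") that need a closer look.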
@praveenmaniyan based on your output it looks like you are using libpff 20120802 but you indicated earlier that you were using 20180714? Do you have multiple versions on your system?
@joachimmetz, I installed pffinfo only after you asked for the OST file information, i.e., on 24-Feb-2020. The pffinfo executable I used was installed with apt install. However, for running the code and extracting messages from the OST I have been using the 20180714 library/code.
Can you please direct me to the C/C++ code/module where the array is initialized and where the size calculation is done, so that I can try to debug it from my side?
thx for the clarification
The error contains the location:
OSError: pypff_folder_get_number_of_sub_messages: unable to retrieve number of sub messages.
libpff_data_array_read_entries: invalid number of array entries value out of bounds.
libpff_data_array_read: unable to read data array entries.
libpff_io_handle_read_descriptor_data_list: unable to read data array.
libpff_table_read: unable to read descriptor: 8590 data: 49142534 list.
libpff_item_values_read: unable to read table.
libpff_folder_determine_sub_messages: unable to read descriptor identifier: 8590.
libpff_folder_get_number_of_sub_messages: unable to determine sub messages.
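To make that location easier to read, the single-line error string can be split into one entry per function. This is a small sketch that assumes every entry starts with a `pypff_` or `libpff_` function name followed by a colon, as in the message above; `split_error_chain` is a hypothetical helper, not part of pypff.

```python
import re

# A function name such as "libpff_table_read" followed by ": ".
FRAME_PATTERN = re.compile(r"\b((?:pypff|libpff)_\w+): ")


def split_error_chain(message):
    """Split a libpff-style error string into (function, detail) pairs."""
    parts = FRAME_PATTERN.split(message)
    # parts[0] is any text before the first function name (e.g. "OSError: ");
    # after that, names and details alternate.
    names = parts[1::2]
    details = [detail.strip() for detail in parts[2::2]]
    return list(zip(names, details))
```

In the trace above, the first `libpff_` entry (`libpff_data_array_read_entries`) appears to be where the failure originates, with the remaining entries being the callers that passed the error up.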
It might help to get some format debug output; also see https://github.com/libyal/libpff/wiki/Troubleshooting#verbose-and-debug-output
Hi @joachimmetz @praveenmaniyan
I am getting a similar error when trying to read a large OST file. Have there been any updates or solutions to this issue?
Best wishes
Hi @joachimmetz I had similar errors with a large (16 GB) OST file, using both the 2018 version and the current source. With the help of your docs (many thanks!) I eventually wrote my own crude reader and successfully extracted the full file, with no corruption. I wasn't able to confirm it, but I suspect the problem with libpff may be related to compressed internal node blocks (indexes), which appear in large folders and messages; my less-than-elegant approach to the compression issue was to run a first pass over the file and inflate all compressed blocks into a new file before walking the folder structure.
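The reader described above was not shared, so the following is only a rough illustration of what an "inflate first" pass could look like, not that implementation. It assumes zlib-compressed payloads and a hypothetical in-memory list of `(payload, uncompressed_size)` pairs; locating blocks and their sizes in a real OST file is out of scope here.

```python
import zlib


def inflate_blocks(blocks):
    """Return the payloads of `blocks`, inflating the compressed ones.

    Each block is a hypothetical (payload, uncompressed_size) pair. A payload
    whose length already equals uncompressed_size is treated as stored raw.
    """
    inflated = []
    for payload, uncompressed_size in blocks:
        if len(payload) != uncompressed_size:
            payload = zlib.decompress(payload)
            if len(payload) != uncompressed_size:
                raise ValueError("inflated size does not match the expected size")
        inflated.append(payload)
    return inflated
```

After such a pass every block has its uncompressed length, so later code never has to distinguish between stored and uncompressed sizes.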
@ftang001 can you share the test data? Otherwise this issue is unlikely to be addressed soon.
Also making such claims without your source being public is not helping anyone.
@joachimmetz Thank you so much for all of this awesome work, it is unbelievable! I encounter the same problem when handling folders with a large number of items (above 4000 or so) in an .ost file. Did you have a chance to take a look at it? Thanks in advance!
@shmulikipod1151 per previous comments: can you share the test data? Otherwise this issue is unlikely to be addressed soon.
Hi again Joachim and shmulikipod1151
As suggested I have subscribed to O365 but I haven't managed to generate enough test data to repeat the behaviour.
My offer to share the code stands. I'm just not sure how best to present it, as it is embarrassingly rough and I really don't want to publish it and distract others from your work. But I can write a few notes and send you a copy if you'd like to see it.
Regards Jim.
Source is not test data, it will NOT help address the issue quickly, especially if it is "rough". Also if the source is not FOSS it cannot be used.
@joachimmetz Hi, sorry for not responding, I was on vacation for a few days. My OST is pretty big; would it be possible to share the script that can produce one instead? Basically, it's a Python script that sends the same mail many times (about 4000 times makes it happen), so the outbox becomes a folder with many items and reproduces the issue. If that's not convenient, I will create a fresh .ost for this purpose.
@shmulikipod1151 excellent yes that is already a step in the right direction, I'm doing similar things with other formats on https://github.com/dfirlabs. Happy to create a repo if you're fine sharing your script under a FOSS license.
The annoying thing with PST/OST is that one needs Microsoft Outlook. I would need to dig that up, and that might take a while, since it is not one of the data formats currently at the top of my list.
https://github.com/dfirlabs this is awesome work, thanks! Let me try and save you some time by generating a minimal OST that reproduces the issue. If that won't work (the OST will be huge) I will share the script.
@joachimmetz, I've managed to create an OST that reproduces the issue. I just sent myself the same email 15000 times (I wrote a PowerShell script for it) and got this 1 GB file. How can I share it with you? Can I send you a link by mail?
Excellent, much appreciated for the 1G file, upload to Google Drive or equiv and mail me the link, that should work.
If you want to share the powershell script I can create https://github.com/dfirlabs/pst-specimens (or equiv) and you send a PR to that project. So that the script is linked to you as contributor as well.
It looks like the compressed size of the descriptor is passed to libpff_data_array_read_entries instead of the uncompressed size of the data block.
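A toy model shows why mixing up the two sizes would produce exactly an "out of bounds" error. This is not the libpff code: `ENTRY_SIZE`, `read_entries` and the bounds check are hypothetical stand-ins, and zlib is used only to obtain a compressed size that is smaller than the uncompressed one.

```python
import zlib

ENTRY_SIZE = 16  # hypothetical fixed entry size, for illustration only


def read_entries(data, declared_entries, size_for_bounds_check):
    """Return `declared_entries` entries after a size-based sanity check."""
    max_entries = size_for_bounds_check // ENTRY_SIZE
    if declared_entries > max_entries:
        raise ValueError("invalid number of array entries value out of bounds")
    return [
        data[index * ENTRY_SIZE:(index + 1) * ENTRY_SIZE]
        for index in range(declared_entries)
    ]


uncompressed = bytes(100 * ENTRY_SIZE)       # 100 entries of zero bytes
compressed_size = len(zlib.compress(uncompressed))

# Correct: bounds are derived from the uncompressed size.
entries = read_entries(uncompressed, 100, len(uncompressed))

# Incorrect: bounds are derived from the (much smaller) compressed size.
try:
    read_entries(uncompressed, 100, compressed_size)
except ValueError as error:
    print(error)
```

The data itself is fine in both calls; only the size used for the check differs, which would match the observation that the same mailbox reads correctly once exported to an uncompressed PST.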
With thanks to @shmulikipod1151 I was able to reproduce the issue, which is limited to compressed OST files.
Note that this could have been solved in 2018 if people would provide test data or debug output https://github.com/libyal/libpff/wiki/Troubleshooting#verbose-and-debug-output.
I got an error when iterating the sub messages. I noticed that only folders containing a large number of messages trigger this error. P.S. There are over 20,000 messages in one folder.