Open tballison opened 2 months ago
I'm including the original .pst, the mbox, the .msg, the .eml, and the debug file.
Separately, we noticed that we're getting non-deterministic output when we select the .msg option. Sometimes we get 7 files and sometimes we get 8.
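For anyone who wants to check the 7-versus-8 behaviour on their own machine, here is a rough sketch of how the run-to-run variation could be tallied. The helper names (`count_output_files`, `readpst_msg`) are made up for this comment, and the `readpst -m -o <dir>` invocation is my assumption about the flags for .msg output; adjust it to whatever command line you actually use.

```python
import collections
import pathlib
import subprocess
import tempfile


def count_output_files(run_once, runs=10):
    """Call run_once(out_dir) `runs` times, each time in a fresh temporary
    directory, and tally how many files each run produced."""
    tally = collections.Counter()
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as tmp:
            out_dir = pathlib.Path(tmp)
            run_once(out_dir)
            # Count regular files anywhere under the output directory.
            n_files = sum(1 for p in out_dir.rglob("*") if p.is_file())
            tally[n_files] += 1
    return tally


def readpst_msg(pst_path):
    """Return a runner that extracts pst_path with .msg output enabled.

    Assumption: `-m` selects .msg output and `-o` sets the output directory.
    """
    def run_once(out_dir):
        subprocess.run(
            ["readpst", "-m", "-o", str(out_dir), str(pst_path)],
            check=True,
            stdout=subprocess.DEVNULL,
        )
    return run_once
```

Calling `count_output_files(readpst_msg("testPST.pst"))` returns a counter keyed by file count. A deterministic extraction gives a single key; more than one key means the output varies between runs.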
To be clear, the libpst library has a long history with many contributors. The current maintainers didn't create the library, but they try to merge patches promptly and work on it when they are able to.
Thanks for the report and the test files; we'll take a look when we can.
The issue with non-deterministic output is known and has a workaround in git master. Please comment on the issue if you still see it with the latest commit:
Thank you so much for an awesome library. While writing a wrapper for readpst for Apache Tika, we noticed a small number of cases where there were fewer attachments when selecting the .msg output option. Tika's jira issue: https://issues.apache.org/jira/browse/TIKA-4250
We were able to reproduce this with a test file we have in our unit tests: https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testPST.pst
The last email "8" is an email with an embedded email, and inside that embedded email is a docx file.
This is processed correctly with rfc822 and mbox output. However, there is no msg attachment within the 8.msg file.
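To show what "processed correctly" means for the rfc822 output, here is a small sketch that lists the attachments in an .eml file, including those nested inside an embedded email. It uses only Python's standard `email` package; the function names and the synthetic sample message are mine, not anything from libpst or Tika. It does not read .msg files, which are OLE containers and need a separate parser.

```python
import email
from email import policy
from email.message import EmailMessage


def list_attachments(raw_bytes):
    """Return (content_type, filename) for every embedded email and every
    named part, at any nesting depth."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    found = []
    # walk() also descends into message/rfc822 parts, so attachments
    # inside an embedded email are visited too.
    for part in msg.walk():
        ctype = part.get_content_type()
        name = part.get_filename()
        if name or ctype == "message/rfc822":
            found.append((ctype, name))
    return found


def build_sample():
    """Build a synthetic email that embeds another email carrying a .docx."""
    inner = EmailMessage()
    inner["Subject"] = "inner"
    inner.set_content("inner body")
    inner.add_attachment(
        b"placeholder bytes, not a real docx",
        maintype="application",
        subtype="vnd.openxmlformats-officedocument.wordprocessingml.document",
        filename="doc.docx",
    )
    outer = EmailMessage()
    outer["Subject"] = "outer"
    outer.set_content("outer body")
    outer.add_attachment(inner)  # attached as message/rfc822
    return outer.as_bytes()
```

Running `list_attachments` on the bytes of the extracted `8.eml` should report both the embedded message and the docx inside it, which is the structure we expected to find in `8.msg` as well.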