Closed (tballison closed this 4 months ago)
I'm including the original .pst, the mbox, the .msg, the .eml, and the debug file.
Separately, we noticed non-deterministic behavior. Sometimes 7 files were extracted, and sometimes 8.
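One way to confirm the 7-versus-8 variation is to run the extraction repeatedly into fresh directories and tally how many files each run produces. The following is only a rough sketch under some assumptions: `tally_runs`, `count_files`, and `run_readpst` are hypothetical helper names, and the `readpst -m -o <dir> <file>` invocation is what I believe selects .msg output into a given directory, so please check it against `man readpst` for your version.

```python
import subprocess
import tempfile
from collections import Counter
from pathlib import Path
from typing import Callable


def count_files(root: Path) -> int:
    """Count regular files anywhere under root."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def run_readpst(pst: Path, out_dir: Path) -> None:
    # Assumed invocation: -m to also write .msg files, -o for the output
    # directory. Verify the flags against your installed readpst.
    subprocess.run(
        ["readpst", "-m", "-o", str(out_dir), str(pst)],
        check=True,
        stdout=subprocess.DEVNULL,
    )


def tally_runs(
    pst: Path,
    runs: int = 20,
    extract: Callable[[Path, Path], None] = run_readpst,
) -> Counter:
    """Extract `pst` `runs` times and tally the file count of each run."""
    counts: Counter = Counter()
    for _ in range(runs):
        # A fresh temporary directory per run keeps runs independent.
        with tempfile.TemporaryDirectory() as tmp:
            out = Path(tmp)
            extract(pst, out)
            counts[count_files(out)] += 1
    return counts
```

Calling `tally_runs(Path("testPST.pst"))` returns a `Counter` keyed by file count; more than one key means the output varied between runs. The `extract` parameter is there so a different command line (or a stub) can be swapped in without touching the loop.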
Facepalm: wrong project. I'm so sorry.
Thank you so much for an awesome library. While writing a wrapper for readpst for Apache Tika, we noticed a small number of cases where fewer attachments were extracted when selecting the .msg output option. Tika's Jira issue: https://issues.apache.org/jira/browse/TIKA-4250
We were able to reproduce this with a test file we have in our unit tests: https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testPST.pst
The last email, "8", contains an embedded email, and that embedded email contains a docx file.
This is processed correctly with rfc822 and mbox output. With .msg output, however, the 8.msg file contains no embedded .msg attachment.