libyal / libpff

Library and tools to access the Personal Folder File (PFF) and the Offline Folder File (OFF) format
GNU Lesser General Public License v3.0

Handling email embedded within email in .msg output #128

Closed by tballison 4 months ago

tballison commented 4 months ago

Thank you so much for an awesome library. While writing a wrapper for readpst for Apache Tika, we noticed a small number of cases where fewer attachments were extracted with the .msg output option than with the other output options. Tika's Jira issue: https://issues.apache.org/jira/browse/TIKA-4250

We were able to reproduce this with a test file we have in our unit tests: https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testPST.pst

The last email "8" is an email with an embedded email, and inside that embedded email is a docx file.

This is processed correctly with rfc822 and mbox output. With .msg output, however, the 8.msg file contains no embedded msg attachment.

tballison commented 4 months ago

test-pst.zip

I'm including the original .pst, the mbox, the .msg, the .eml, and the debug file.

tballison commented 4 months ago

Separately, we noticed non-deterministic behavior. Sometimes 7 files were extracted, and sometimes 8.
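For anyone trying to confirm the 7-versus-8 behavior, here is a minimal sketch of how the runs could be compared. It is not part of Tika or readpst: `count_extracted_files`, `file_count_distribution`, and the `build_command` callable are hypothetical names, and the sketch assumes the extractor writes all of its output under a single directory.

```python
import subprocess
import tempfile
from collections import Counter
from pathlib import Path


def count_extracted_files(out_dir):
    """Count regular files anywhere under out_dir."""
    return sum(1 for p in Path(out_dir).rglob("*") if p.is_file())


def file_count_distribution(build_command, runs=10):
    """Run the extraction `runs` times, each into a fresh directory.

    build_command is a hypothetical callable: it receives the output
    directory and returns the argv list to execute (assumption: the
    tool writes everything it extracts under that directory).
    Returns a Counter mapping file count -> number of runs.
    """
    counts = Counter()
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as out_dir:
            # check=True raises if the extractor exits with an error
            subprocess.run(build_command(out_dir), check=True)
            counts[count_extracted_files(out_dir)] += 1
    return counts
```

`build_command` would return the same readpst invocation used in the wrapper, pointed at the given directory. A deterministic extraction yields a Counter with a single key; more than one key reproduces the behavior described above.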


tballison commented 4 months ago

facepalm-- wrong project. I'm so sorry.