pst-format / libpst

library for reading Microsoft Outlook PST files
GNU General Public License v2.0

Invalid separator generated for nested messages #3

Closed: brychcy closed this issue 2 years ago

brychcy commented 2 years ago

When using readpst -e, the resulting .eml files that contain nested messages cannot be parsed properly with Python's email package. The problem is that the content of each nested message starts with an unexpected line like:

From "foo@bar.com" Tue Oct 5 14:30:28 2021

This is the mbox separator line that write_normal_email() in readpst.c writes when it is called for attachments with mode=MODE_NORMAL. For nested messages, however, the top-level mode should be respected, but it is shadowed by the function's mode parameter.
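A quick way to spot the symptom in existing output is to scan a generated .eml file for a "From " line that directly follows the MIME headers of a message/rfc822 part. The sketch below is a line-based heuristic, assuming the nested message starts on the line right after the blank line that ends the attachment's headers; find_nested_separators is a hypothetical helper, not part of libpst or readpst.

```python
from typing import List


def find_nested_separators(eml: bytes) -> List[int]:
    """Return 1-based line numbers of "From " lines that open a nested message.

    Hypothetical heuristic: after a "Content-Type: message/rfc822" header, the
    blank line ends that part's MIME headers and the next line should be the
    first header of the nested message. RFC 5322 header names are followed
    directly by a colon, so a line starting with "From " (with a space) is an
    mbox envelope line, not a header. Folded Content-Type headers are not handled.
    """
    hits = []
    in_rfc822_headers = False    # inside the MIME headers of a message/rfc822 part
    expect_nested_start = False  # next line is the first line of the nested message
    for lineno, line in enumerate(eml.splitlines(), start=1):
        if expect_nested_start:
            if line.startswith(b"From "):
                hits.append(lineno)
            expect_nested_start = False
        if in_rfc822_headers and line == b"":
            # Blank line ends the attachment's headers.
            in_rfc822_headers = False
            expect_nested_start = True
        elif line.lower().startswith(b"content-type: message/rfc822"):
            in_rfc822_headers = True
    return hits
```

Reading a file with open(path, "rb") and passing its bytes should yield an empty list when no stray separator is present.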

The problem can be fixed by renaming the mode parameter and using it only at the other place in the function where it is needed (to decide whether attachments should go to their own files).
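To illustrate the shadowing, here is a toy Python model of the separator decision. It is not readpst's C code: only write_normal_email() and MODE_NORMAL come from the report, while MODE_SEPARATE, TOP_LEVEL_MODE, render_message, and the assumption that -e selects a one-file-per-message mode are hypothetical stand-ins.

```python
MODE_NORMAL = "normal"      # mbox-style output: each message preceded by a "From " line
MODE_SEPARATE = "separate"  # assumed: one file per message, no "From " line

# Hypothetical stand-in for the top-level mode chosen on the command line
# (assumed to be the one-file-per-message mode when -e is given).
TOP_LEVEL_MODE = MODE_SEPARATE


def render_message(sender, date, body, mode, fixed):
    """Toy model of writing one message; returns the output lines."""
    # Buggy: the `mode` parameter shadows the top-level mode, so a nested
    # message rendered with mode=MODE_NORMAL gets a separator line.
    # Proposed fix: the separator decision consults the top-level mode; the
    # renamed parameter would only be used elsewhere (not modeled here).
    effective_mode = TOP_LEVEL_MODE if fixed else mode
    lines = []
    if effective_mode != MODE_SEPARATE:
        lines.append('From "{}" {}'.format(sender, date))
    lines.append("From: {}".format(sender))
    lines.append("")
    lines.append(body)
    return lines


if __name__ == "__main__":
    for fixed in (False, True):
        nested = render_message("foo@bar.com", "Tue Oct 5 14:30:28 2021",
                                "Hello", MODE_NORMAL, fixed)
        print("fixed" if fixed else "buggy", "->", nested[0])
```

Run as a script, the buggy variant's first line is the mbox separator, while the fixed variant starts directly with the From: header.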

pabs3 commented 2 years ago

Thanks for the report.

Could you attach an example PST file? I will need it to verify the current incorrect behaviour and review the changes that are made to the behaviour by your patch.

Please also attach an example script using the Python email library.

-- bye, pabs

https://bonedaddy.net/pabs3/

brychcy commented 2 years ago

libpst3.pst.gz: an example PST file. It contains one email with a nested email that was forwarded as an attachment. The nested email contains an attachment called "HelloWorld.pdf".

brychcy commented 2 years ago

Python script for testing (the same script as in #1). It should output "found: HelloWorld.pdf".

#!/usr/bin/env python3
# use python 3.6 or later

import email
import email.policy

# point this to the file generated with "readpst -e ..."
filename = "/Users/till/opensource/libpst/test/patched/Outlook-Datendatei1/libpst3/1.eml"

with open(filename, "rb") as f:
    msg = email.message_from_binary_file(f, policy=email.policy.default)

# uncomment the following lines to print the structure
# from email.iterators import _structure
# _structure(msg)

found = False
for part in msg.walk():
    if part.is_attachment() and part.get_content_type() == 'application/pdf':
        # use a separate name so the input path in `filename` is not overwritten
        pdf_name = part.get_filename(failobj="")
        found = True
        print("found: " + pdf_name)

if not found:
    print("no pdf found!")
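For comparison without a PST file, the sketch below uses only the standard library to build, in memory, the structure the example is expected to contain (an outer mail forwarding a nested mail that carries HelloWorld.pdf) and runs the same walk over it. The message contents, the dummy PDF bytes, and the helper names build_sample and find_pdfs are made up for illustration; this shows what correctly nested output should parse to, not what readpst produces.

```python
import email
import email.policy
from email.message import EmailMessage
from typing import List


def build_sample() -> bytes:
    """Build an outer mail that forwards a nested mail with a PDF attachment."""
    # Nested mail: plain-text body plus a PDF attachment (dummy bytes).
    inner = EmailMessage()
    inner["From"] = "foo@bar.com"
    inner["Subject"] = "nested"
    inner.set_content("Hello")
    inner.add_attachment(b"%PDF-1.4\n", maintype="application",
                         subtype="pdf", filename="HelloWorld.pdf")

    # Outer mail: attaches the nested mail as a message/rfc822 part.
    outer = EmailMessage()
    outer["From"] = "foo@bar.com"
    outer["Subject"] = "outer"
    outer.set_content("See attached.")
    outer.add_attachment(inner)
    return outer.as_bytes()


def find_pdfs(raw: bytes) -> List[str]:
    """Same check as the script above: filenames of PDF attachments at any depth."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    return [part.get_filename(failobj="")
            for part in msg.walk()
            if part.is_attachment() and part.get_content_type() == "application/pdf"]


if __name__ == "__main__":
    for name in find_pdfs(build_sample()):
        print("found: " + name)
```

Because walk() descends into message/rfc822 parts, the PDF inside the nested mail is reached without any special handling.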