koodaamo / tnefparse

a TNEF decoding library written in python, without external dependencies
GNU Lesser General Public License v3.0

Made it so long filenames work in Python 3 #19

Closed: ataylor32 closed this 5 years ago

ataylor32 commented 6 years ago

There might be a more elegant way to do this.

petri commented 5 years ago

Please explain or show what the problem is.

ataylor32 commented 5 years ago

When I try to get a list of long filenames within a winmail.dat file, I get this:

[DEBUG] mapi-decode File "/var/www/example.com/venv-example/lib/python3.6/site-packages/tnefparse/mapi.py", line 101: decode_mapi Exception ord() expected string of length 1, but int found

And the result ends up being: b'PRIVAC~1.HTM'

But with this pull request, that exception doesn't occur and the result ends up being: privacypolicy.htm
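For readers unfamiliar with this error message: it is what Python 3 raises when ord() is called on an element of a bytes object, because indexing or iterating bytes yields ints there, whereas Python 2 yields one-character strings. The sketch below only illustrates that general behavior. It is not the code at mapi.py line 101, and the helper name byte_values is hypothetical.

```python
def byte_values(data):
    """Return the integer value of each byte in `data` on Python 2 and 3.

    Hypothetical helper for illustration; not part of tnefparse.
    """
    values = []
    for item in data:
        # Python 3: iterating bytes yields ints, so ord() must be skipped.
        # Python 2: iterating str yields 1-character strings, so ord() is needed.
        values.append(item if isinstance(item, int) else ord(item))
    return values


raw = b"PRIVAC~1.HTM"

# On Python 3, raw[0] is already an int, so ord(raw[0]) raises TypeError.
# On Python 2, raw[0] is a 1-character str and ord() succeeds.
try:
    ord(raw[0])
    ord_error = None
except TypeError as exc:
    ord_error = str(exc)

print(ord_error)
print(byte_values(raw))
```

Code that has to run on both versions can avoid the problem by checking the element type as above, or by wrapping the data in bytearray(), which yields ints on both Python 2 and Python 3.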

petri commented 5 years ago

@ataylor32 can you share a winmail.dat file that exhibits this issue?

ataylor32 commented 5 years ago

I'm pretty sure I've only seen it happen on one winmail.dat file. Unfortunately, it's on my work email account, so I can't share it. Is there something else I can do to help besides sharing the file with you?

petri commented 5 years ago

@ataylor32 there have been some changes that may have fixed your issue. Can you try out the current master and let me know whether the issue still exists?

ataylor32 commented 5 years ago

The master branch mostly fixes my issue. This is the test script I'm using:

import logging

from tnefparse import TNEF

logging.basicConfig()

with open('winmail.dat', 'rb') as tfile:
    t = TNEF(tfile.read())
    print(t.attachments[0].long_filename())

Here's what the above script outputs with Python 2:

WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'TNEF Version'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'OEM Codepage'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Message Class'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Owner Appointment ID'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Response Requested'>
privacypolicy.htm

And here's what the above script outputs with Python 3:

WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'TNEF Version'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'OEM Codepage'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Message Class'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Owner Appointment ID'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Response Requested'>
b'privacypolicy.htm'

So the full filename is there in both, which is an improvement. Previously, Python 2 had privacypolicy.htm and Python 3 had b'PRIVAC~1.HTM'. However, I think it would be nice if tnefparse always returned the long filename as unicode, regardless of the Python version. Perhaps that's a separate issue, though (#7?).

Thanks!

petri commented 5 years ago

OK, thanks for the confirmation. Yes, the encoding is best handled in #7.
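Until the encoding question is settled in #7, a caller could normalize the result on their own side. The following is a minimal sketch, not tnefparse's API: the helper name to_text is hypothetical, and the default encoding of "utf-8" is an assumption. The correct codec probably depends on the codepage recorded in the TNEF file.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text (unicode on Python 2, str on Python 3).

    Hypothetical caller-side helper; the default encoding is an assumption.
    """
    if isinstance(value, bytes):
        # Undecodable bytes are replaced rather than raising, so a
        # filename is always returned.
        return value.decode(encoding, "replace")
    # Already text: return unchanged.
    return value


print(to_text(b"privacypolicy.htm"))
print(to_text(u"privacypolicy.htm"))
```

With the test script from this thread, the call would become print(to_text(t.attachments[0].long_filename())), which should print the name without the b'' wrapper on Python 3.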