koodaamo / tnefparse

A TNEF decoding library written in Python, without external dependencies
GNU Lesser General Public License v3.0

UnicodeDecodeError: 'gbk' codec can't decode byte 0xba in position 14: illegal multibyte sequence #120

Open: aniude opened this issue 2 years ago

aniude commented 2 years ago

I ran tnefparse from the command line in debug mode: tnefparse winmail.dat -l DEBUG

INFO:tnefparse:Skipping checksum for performance
DEBUG:tnefparse:Attribute type: 0x001e
DEBUG:tnefparse:Attribute name: 0x1008 (MAPI_RTF_SYNC_BODY_TAG)
ERROR:tnefparse:decode_mapi exception: 'gbk' codec can't decode byte 0xba in position 14: illegal multibyte sequence
DEBUG:tnefparse:exception details:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/tnefparse/mapi.py", line 99, in decode_mapi
    attr_data, offset = parse_property(data, offset, attr_name, attr_type, codepage, num_mv_properties)
  File "/usr/local/lib/python3.9/site-packages/tnefparse/mapi.py", line 155, in parse_property
    item = item.decode(codepage)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xba in position 14: illegal multibyte sequence

It seems the charset used for decoding is wrong. Is there any way to pass the charset as a parameter?
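The traceback ends at `item.decode(codepage)` in `parse_property`. A minimal plain-Python reproduction, assuming (hypothetically; the real file was not shared) that the property's bytes were actually Windows-1252 text where byte 0xba is "º", shows how the same bytes decode under cp1252 but fail under GBK:

```python
# Hypothetical property payload: Windows-1252 text in which byte 0xba is "º".
payload = b"Meeting room N\xba 5"

# Decoding with the (assumed) intended single-byte codepage succeeds.
print(payload.decode("cp1252"))  # Meeting room Nº 5

# GBK reads 0xba as the lead byte of a two-byte character; the following
# byte (0x20, a space) is not a valid trail byte, so decoding fails at 14.
try:
    payload.decode("gbk")
except UnicodeDecodeError as exc:
    print(exc.start, exc.reason)  # 14 illegal multibyte sequence
```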

jrideout commented 2 years ago

Different parts of the document can be encoded with different codecs. We could probably add a universal override, but that would likely cause other problems.
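One way a single override can go wrong, sketched with two hypothetical property payloads that use different codepages in the same message: forcing GBK breaks the Windows-1252 property, while forcing cp1252 raises no error at all but silently garbles the GBK property.

```python
# Two hypothetical properties from the same message, each in its own codepage.
chinese_subject = "中文".encode("gbk")  # GBK-encoded bytes
latin_note = b"Room N\xba 5"            # Windows-1252 bytes (0xba is "º")

# Override everything with GBK: the Windows-1252 property fails to decode.
try:
    latin_note.decode("gbk")
    gbk_ok = True
except UnicodeDecodeError:
    gbk_ok = False

# Override everything with cp1252: no exception, but the GBK property turns
# into unrelated Latin characters (mojibake) without any warning.
garbled = chinese_subject.decode("cp1252")

print(gbk_ok)             # False
print(garbled == "中文")  # False
```

The second failure mode is arguably worse, since nothing signals that the decoded text is wrong.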

What about returning the raw bytes instead of raising an exception when decoding fails?
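A minimal sketch of that proposal, using a hypothetical helper `decode_or_raw` (not part of tnefparse's API): it tries the property's codepage and hands back the undecoded bytes on failure, so the caller can choose a fallback.

```python
from typing import Union


def decode_or_raw(data: bytes, codepage: str) -> Union[str, bytes]:
    """Hypothetical helper, not tnefparse API: decode with the given
    codepage, or return the original bytes if decoding fails."""
    try:
        return data.decode(codepage)
    except UnicodeDecodeError:
        # Keep the raw payload so the caller can retry with another codec.
        return data


# Usage: the caller detects the bytes result and picks its own fallback.
value = decode_or_raw(b"Meeting room N\xba 5", "gbk")
if isinstance(value, bytes):
    value = value.decode("cp1252")  # assumed fallback codepage
print(value)  # Meeting room Nº 5
```

The trade-off is that callers must handle both str and bytes results; decoding with errors="replace" would keep the type uniform but discards the original bytes.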

jrideout commented 2 years ago

@aniude, are you able to share an example TNEF file that triggers this error?