pulse-mind opened this issue 4 years ago.
What encoding does the e-mail you are having problems with use? (Is it UTF-8?)
That code comes from:
https://github.com/martinrusev/imbox/commit/ba913fe31dd6146f9500583916d4332edce1c481
https://github.com/martinrusev/imbox/pull/78
Yes, I was receiving an e-mail in UTF-8. It was sent by another server (WooCommerce).
I have the same problem.
In my e-mail the header says charset=utf-8, while the body actually contains latin-1 characters.
Example:
b'\xe4\xf6\xfc\xc4\xd6\xdc\xdf'
which should decode (as latin-1) to:
'äöüÄÖÜß'
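A quick check in Python 3 illustrates the difference. This is a minimal sketch using only the built-in `bytes.decode`; the variable names are illustrative. The same seven bytes decode cleanly as latin-1, are rejected by a strict UTF-8 decode, and disappear entirely under a lenient UTF-8 decode.

```python
raw = b'\xe4\xf6\xfc\xc4\xd6\xdc\xdf'

# latin-1 maps each byte to exactly one character.
latin = raw.decode('latin-1')
print(latin)  # äöüÄÖÜß

# Strict UTF-8 rejects these bytes: none of them can start a valid sequence here.
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print('utf-8 failed:', exc.reason)

# A lenient decode with errors='ignore' silently drops every byte.
lossy = raw.decode('utf-8', 'ignore')
print(repr(lossy))  # ''
```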
Imbox reads the charset=utf-8 declaration from the raw message and uses it to decode the text, so the latin-1 characters are lost.
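To reproduce this without imbox, here is a minimal sketch with the standard `email` package. The message is hypothetical, and the final `decode(charset, 'ignore')` is an assumption about how the parser turns the payload into text; with a different error handler the characters would be replaced instead of dropped, but they would still be lost.

```python
import email

# Hypothetical message: the header declares utf-8, but the body bytes are latin-1.
raw_message = (
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"\xe4\xf6\xfc\xc4\xd6\xdc\xdf\n"
)

message = email.message_from_bytes(raw_message)
content = message.get_payload(decode=True)      # raw body bytes
charset = message.get_content_charset('utf-8')  # 'utf-8', read from the header

# Assumed decoding step: lenient decode with the declared charset.
text = content.decode(charset, 'ignore')
print(repr(text.strip()))  # '' : all seven characters are gone

# The same bytes decoded as latin-1 are intact.
print(content.decode('latin-1').strip())  # äöüÄÖÜß
```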
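As a hack, I changed line 129 in parser.py to the following code:

```python
# Single latin-1 bytes for äöüÄÖÜß
latinchars = [b'\xe4', b'\xf6', b'\xfc', b'\xc4', b'\xd6', b'\xdc', b'\xdf']
if any(s in content for s in latinchars):
    # One of these bytes appears, so assume latin-1 despite the declared charset.
    charset = 'latin-1'
else:
    charset = message.get_content_charset('utf-8')
```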
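One caveat with this byte check: bytes such as `\xe4` and `\xc4` are also lead bytes of ordinary multi-byte UTF-8 sequences ('ą' is b'\xc4\x85', '中' is b'\xe4\xb8\xad'), so genuine UTF-8 mail containing such characters would be misdecoded as latin-1. A common alternative heuristic, sketched below, is to try a strict decode with the declared charset and fall back to latin-1 only if that fails. This is not what imbox does; `decode_body` is a hypothetical helper name.

```python
def decode_body(content: bytes, declared: str = 'utf-8') -> str:
    """Hypothetical helper: trust the declared charset only if the bytes
    really decode with it, otherwise fall back to latin-1."""
    try:
        # Strict decoding raises UnicodeDecodeError on invalid bytes
        # instead of silently dropping them.
        return content.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # latin-1 maps every byte 0x00-0xFF to a character, so this cannot fail.
        return content.decode('latin-1')


latinchars = [b'\xe4', b'\xf6', b'\xfc', b'\xc4', b'\xd6', b'\xdc', b'\xdf']
genuine_utf8 = 'ą中'.encode('utf-8')  # b'\xc4\x85\xe4\xb8\xad'

# The byte check from the hack fires on valid UTF-8 ...
print(any(s in genuine_utf8 for s in latinchars))  # True
# ... while the fallback keeps valid UTF-8 and still rescues latin-1 bytes.
print(decode_body(genuine_utf8))                    # ą中
print(decode_body(b'\xe4\xf6\xfc\xc4\xd6\xdc\xdf'))  # äöüÄÖÜß
```

The fallback is still a guess: latin-1 text that happens to form valid UTF-8 (for example 'Ã¤', bytes b'\xc3\xa4') would be read as 'ä'.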
The byte values of other characters can be found here, or in Python with 'ä'.encode('latin-1') (which returns b'\xe4').
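Rather than typing the byte values by hand, the list from the hack can be built from the characters themselves. A small sketch; `extra` is just an illustrative name for the bytes of ç and é, which are mentioned in the original report.

```python
# Encode each character individually to get its single latin-1 byte.
latinchars = [c.encode('latin-1') for c in 'äöüÄÖÜß']
print(latinchars)
# [b'\xe4', b'\xf6', b'\xfc', b'\xc4', b'\xd6', b'\xdc', b'\xdf']

extra = [c.encode('latin-1') for c in 'çé']
print(extra)  # [b'\xe7', b'\xe9']
```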
Edit:
Simply switching to message.get_payload(decode=False) will lead to problems if the e-mail really is encoded in UTF-8.
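One concrete way `decode=False` can go wrong, shown with a hypothetical message (not necessarily the exact failure the edit refers to): it returns the payload without undoing the Content-Transfer-Encoding, so a UTF-8 body sent as quoted-printable comes back as escape sequences rather than text.

```python
import email

# Hypothetical UTF-8 message using quoted-printable transfer encoding.
raw_message = (
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"=C3=A4=C3=B6=C3=BC\n"
)
message = email.message_from_bytes(raw_message)

undecoded = message.get_payload(decode=False)  # str, still quoted-printable
decoded = message.get_payload(decode=True)     # bytes, quoted-printable undone

print(undecoded.strip())                  # =C3=A4=C3=B6=C3=BC
print(decoded.decode('utf-8').strip())    # äöü
```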
Another edit:
On my computer, Thunderbird sends latin-1 characters while declaring charset=utf-8.
In parser.py, line 125,
content = message.get_payload(decode=True)
removes some special characters such as ç or é. It works fine with message.get_payload(decode=False), like this: