mailgun / flanker

Python email address and Mime parsing library
http://www.mailgun.com
Apache License 2.0

Use charset in case of header decoding #203

Closed veulkehc closed 6 years ago

veulkehc commented 6 years ago

An encoded header can be represented as multiple 'encoded-word's. For instance, the string Тест длинного дисплей нейма is encoded as =?utf-8?b?0KLQtdGB0YIg0LTQu9C40L3QvdC+0LPQviDQtNC40YHQv9C70LXQuSDQvdC10LnQ?=\n =?utf-8?b?vNCw?=. In this example, the first encoded-word ends in the middle of a UTF-8 character. The method mime_to_unicode decoded the two words separately, so the result was Тест длинного дисплей ней�МаА. The fix concatenates 'encoded-word's that have the same charset and converts only the whole string to unicode.
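A minimal sketch of the difference, using only the Python 3 standard library. This is not flanker's mime_to_unicode; the names ENCODED_WORD, decode_each_word and decode_joined are hypothetical, and only base64 ("b") encoded-words are handled.

```python
import base64
import re

# Matches a base64 encoded-word: =?charset?b?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")

# The header from the example above, folded across two lines.
HEADER = (
    "=?utf-8?b?0KLQtdGB0YIg0LTQu9C40L3QvdC+0LPQviDQtNC40YHQv9C70LXQuSDQvdC10LnQ?=\n"
    " =?utf-8?b?vNCw?="
)


def decode_each_word(header):
    """Decode every encoded-word to text on its own (the problematic approach)."""
    parts = []
    for charset, payload in ENCODED_WORD.findall(header):
        raw = base64.b64decode(payload)
        # A word that ends mid-character cannot be decoded cleanly by itself.
        parts.append(raw.decode(charset, errors="replace"))
    return "".join(parts)


def decode_joined(header):
    """Concatenate the bytes of consecutive same-charset words, then decode once."""
    runs = []  # each entry is [charset, bytearray of raw bytes]
    for charset, payload in ENCODED_WORD.findall(header):
        raw = base64.b64decode(payload)
        charset = charset.lower()
        if runs and runs[-1][0] == charset:
            runs[-1][1] += raw
        else:
            runs.append([charset, bytearray(raw)])
    return "".join(bytes(buf).decode(cs, errors="replace") for cs, buf in runs)


separate = decode_each_word(HEADER)  # contains U+FFFD replacement characters
joined = decode_joined(HEADER)       # the original display name
```

Here the first word ends with the byte 0xD0 and the second starts with 0xBC. Together they form the letter "м", but neither byte is valid UTF-8 on its own, which is why decoding word by word loses that character.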

veulkehc commented 6 years ago

@horkhe Yes, when flanker encodes headers it doesn't comply with the RFC, but email providers work fine with such headers. The fix doesn't care whether a header was encoded according to the RFC or not. It just does its best to decode the encoded text. Of course, the encoding issue has to be fixed as well.
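For the encoding side, RFC 2047 requires each encoded-word to contain a whole number of characters and to be at most 75 characters long. A rough sketch of an encoder that respects both limits might look like the following. The helper encode_header_words is hypothetical and is not flanker's encoder; it assumes a stateless charset such as utf-8, where characters can be encoded one at a time.

```python
import base64


def encode_header_words(text, charset="utf-8", max_len=75):
    """Hypothetical helper: base64-encode text into whole-character encoded-words."""
    prefix = "=?%s?b?" % charset
    suffix = "?="
    # Base64 maps every 3 raw bytes to 4 output characters, so this is the
    # largest raw chunk that keeps one encoded-word within max_len.
    budget = (max_len - len(prefix) - len(suffix)) // 4 * 3

    def wrap(chunk):
        return prefix + base64.b64encode(chunk).decode("ascii") + suffix

    words, chunk = [], b""
    for ch in text:
        raw = ch.encode(charset)
        # Start a new word instead of splitting a multi-byte character.
        if chunk and len(chunk) + len(raw) > budget:
            words.append(wrap(chunk))
            chunk = b""
        chunk += raw
    if chunk:
        words.append(wrap(chunk))
    # Fold the header: each continuation line starts with a space.
    return "\n ".join(words)


encoded = encode_header_words("Тест длинного дисплей нейма")
```

With this splitting, every encoded-word can be decoded on its own, so a decoder that handles the words separately would also recover the original text.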

horkhe commented 6 years ago

@veulkehc it would be nice to update CHANGELOG.md.