hassanakbar4 / mailarchive-tickets


UnicodeEncodeError #304

Closed hassanakbar4 closed 2 years ago

hassanakbar4 commented 3 years ago

component_MailArchive: User Interface resolution_fixed type_defect | by rcross@amsl.com


UnicodeEncodeError at /arch/msg/mobopts/_E4pLFG7UfowwIQHjHAy437bFIE/ 'utf-8' codec can't encode character '\udc87' in position 16743: surrogates not allowed

Early investigation points to something in generator.py that is causing a problem. The raw message itself does not contain invalid codes.
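For context, a code point such as '\udc87' is what Python's `surrogateescape` error handler produces when it meets the undecodable byte 0x87, and UTF-8 refuses to encode it. The sketch below reproduces the same error message under that assumption; the sample bytes are made up, and the ticket does not confirm that this is the path taken in generator.py.

```python
# Hypothetical sample: 0x87 is not valid ASCII, so surrogateescape
# smuggles it through as the lone surrogate U+DC87.
raw = b"signature \x87 line"
text = raw.decode("ascii", errors="surrogateescape")

message = None
try:
    # Strict UTF-8 encoding rejects lone surrogates.
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
```

The original byte can still be recovered with `text.encode("ascii", errors="surrogateescape")`, which is why the raw message can look clean while the decoded string does not.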


Issue migrated from trac:3086 at 2021-09-22 16:57:05 +0500

hassanakbar4 commented 3 years ago

@hassanakbar4 commented


Another example:

UnicodeEncodeError at /arch/msg/mipshop/yB-c5urm011Bn3rVw-xWK-Ivc5I/ 'utf-8' codec can't encode character '\udc87' in position 17354: surrogates not allowed

hassanakbar4 commented 3 years ago

@hassanakbar4 changed status from new to closed

hassanakbar4 commented 3 years ago

@hassanakbar4 changed resolution from `` to `fixed`

hassanakbar4 commented 3 years ago

@hassanakbar4 commented


Both examples are messages that use UTF-7 encoding:

Content-Type: text/plain;charset="utf-7"
Content-Transfer-Encoding: 7bit

This encoding is particularly rare in the archive: only 47 messages use it, and 2 of them resulted in encoding errors. In both cases the error came from a signature line in a copied message that either contained improperly encoded data or contained data that the Python standard library does not handle properly.
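A minimal sketch of one defensive approach, assuming the goal is only to render the message without raising: decode the UTF-7 body with the standard codec, then replace any lone surrogates before the text is encoded as UTF-8. The helper name `replace_lone_surrogates` and the sample strings are hypothetical, and this is not necessarily the fix that was applied to close this ticket.

```python
def replace_lone_surrogates(text: str) -> str:
    """Return text that can always be encoded as UTF-8.

    Lone surrogates (for example '\udc87') are replaced with '?'.
    """
    return text.encode("utf-8", errors="replace").decode("utf-8")


# A well-formed UTF-7 payload decodes normally with the built-in codec.
body = b"caf+AOk-".decode("utf-7")

# A string carrying a lone surrogate is made safe for output.
safe = replace_lone_surrogates("sig \udc87 end")

print(body, safe)
```

Replacing the character loses the original byte, so this suits display code rather than anything that must round-trip the stored message.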