martinrusev / imbox

Python IMAP for Human beings
MIT License
1.18k stars 190 forks

LookupError: unknown encoding: utf-8 euao6y69zh2w #169

Open wesinator opened 5 years ago

wesinator commented 5 years ago

Email: imbox_lookuperror_unknown_encoding.eml.txt

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/imbox/parser.py", line 128, in decode_content
    return content.decode(charset, 'ignore')
LookupError: unknown encoding: utf-8  euao6y69zh2w

During handling of the above exception, another exception occurred:

    for uid, msg in unread_junk:
  File "/home/ubuntu/.local/lib/python3.6/site-packages/imbox/messages.py", line 39, in _fetch_email_list
    yield uid, self._fetch_email(uid)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/imbox/messages.py", line 28, in _fetch_email
    parser_policy=self.parser_policy)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/imbox/parser.py", line 141, in fetch_email_by_uid
    email_object = parse_email(raw_email, policy=parser_policy)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/imbox/parser.py", line 187, in parse_email
    content = decode_content(part)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/imbox/parser.py", line 130, in decode_content
    return content.decode(charset.replace("-", ""), 'ignore')
LookupError: unknown encoding: utf8  euao6y69zh2w

py-radicz commented 4 years ago

Yeah, I get the same exception. Unfortunately, the raised exception breaks the generator, so it cannot be handled properly.

What about just returning the raw message in such cases, so the generator can be consumed to exhaustion?
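
A minimal sketch of that idea, not imbox's actual code: the names `iter_messages`, `fetch_raw`, and `parse` are hypothetical stand-ins for the fetch and parse steps seen in the traceback. Catching the error per message inside the loop keeps the generator alive, and the undecoded bytes are yielded in place of the parsed message.

```python
def iter_messages(uids, fetch_raw, parse):
    """Yield (uid, message) pairs, falling back to raw bytes on a decode failure.

    fetch_raw and parse are hypothetical callables supplied by the caller.
    """
    for uid in uids:
        raw = fetch_raw(uid)
        try:
            yield uid, parse(raw)
        except LookupError:
            # The declared charset is unknown to Python: hand back the
            # undecoded bytes so iteration can continue with the next uid.
            yield uid, raw
```

The caller can then check whether it received a parsed message or raw bytes, and the loop over the mailbox still runs to the end.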

py-radicz commented 4 years ago

Or maybe this, which might also fix a broader range of cases:

def decode_content(message):
    # Decoded payload as bytes (None for multipart containers)
    content = message.get_payload(decode=True)
    charset = message.get_content_charset('utf-8')
    try:
        return content.decode(charset, 'ignore')
    except LookupError:
        # The declared charset is unknown to Python; try to detect it instead
        import chardet
        encoding = chardet.detect(content).get('encoding')
        if encoding:
            return content.decode(encoding, 'ignore')
        return content
    except AttributeError:
        # content is None, so there is nothing to decode
        return content

martinrusev commented 4 years ago

Related: https://github.com/martinrusev/imbox/issues/177

@py-radicz Can you submit a PR? I will merge it into master. My only request for the code above is to move the import chardet line to the top of the file.

py-radicz commented 4 years ago

That's a great idea, and a good chance to practice making a PR, since I have never done one. Will do.

taewookim commented 3 years ago

Getting the same error with the "windows-874" encoding on the latest pip install of this package. Any workaround, @martinrusev?

I tried @py-radicz's solution, but no luck either.
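
One possible workaround, sketched with the standard library only. It assumes both failures come from the declared charset name: trailing junk after the real name in the original report, and a name that Python may not register as an alias in the "windows-874" case (Python's codec for that code page is `cp874`). The helpers `resolve_charset` and `decode_bytes` and the `CHARSET_ALIASES` table are hypothetical and not part of imbox.

```python
import codecs

# Hypothetical alias table; extend it with other names as they turn up.
CHARSET_ALIASES = {"windows-874": "cp874"}


def resolve_charset(declared, default="utf-8"):
    """Return a codec name that Python can look up, or the default."""
    # Keep only the first whitespace-separated token,
    # so "utf-8  euao6y69zh2w" becomes "utf-8".
    tokens = (declared or "").strip().lower().split()
    name = tokens[0] if tokens else default
    name = CHARSET_ALIASES.get(name, name)
    try:
        codecs.lookup(name)
    except LookupError:
        # Still unknown after cleanup: fall back to the default codec.
        return default
    return name


def decode_bytes(content, declared):
    """Decode a bytes payload using the cleaned-up charset name."""
    return content.decode(resolve_charset(declared), "replace")
```

With this, `decode_bytes(payload, "windows-874")` decodes through `cp874`, and a charset such as `utf-8  euao6y69zh2w` is treated as plain `utf-8`. Charset detection with chardet could still be used as a further fallback where the cleaned name remains unknown.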