pazz / alot

Terminal-based Mail User Agent
GNU General Public License v3.0

please make `get_body_text` more robust #1601

Open josch opened 2 years ago

josch commented 2 years ago

Hi,

alot keeps quitting on me, especially when it encounters malformed spam email, with tracebacks like this:

  File "/usr/share/alot/alot/widgets/search.py", line 187, in <genexpr>
    lastcontent = ' '.join(m.get_body_text() for m in msgs)
  File "/usr/share/alot/alot/db/message.py", line 287, in get_body_text
    return extract_body_part(self.get_mime_part())
  File "/usr/share/alot/alot/db/utils.py", line 497, in extract_body_part
    rendered_payload = render_part(
  File "/usr/share/alot/alot/db/utils.py", line 345, in render_part
    raw_payload = remove_cte(part)
  File "/usr/share/alot/alot/db/utils.py", line 440, in remove_cte
    bp = base64.b64decode(payload)
  File "/usr/lib/python3.9/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

or

  File "/usr/share/alot/alot/widgets/search.py", line 187, in <genexpr>
    lastcontent = ' '.join(m.get_body_text() for m in msgs)
  File "/usr/share/alot/alot/db/message.py", line 287, in get_body_text
    return extract_body_part(self.get_mime_part())
  File "/usr/share/alot/alot/db/utils.py", line 497, in extract_body_part
    rendered_payload = render_part(
  File "/usr/share/alot/alot/db/utils.py", line 345, in render_part
    raw_payload = remove_cte(part)
  File "/usr/share/alot/alot/db/utils.py", line 436, in remove_cte
    bp = quopri.decodestring(payload.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 8114-8123: ordinal not in range(128)
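Both failures can be reproduced outside alot with the same standard-library calls that appear in the tracebacks. The snippet below is only a minimal sketch: the sample payloads and the `reproduce` helper are made up for illustration, not taken from a real message or from alot's code.

```python
import base64
import binascii
import quopri


def reproduce():
    """Trigger the two exceptions from the tracebacks; return their types."""
    seen = []

    # Base64 text whose length is not a multiple of 4 (the '=' is missing).
    try:
        base64.b64decode("aGVsbG8")
    except binascii.Error as e:
        seen.append(type(e))

    # A "quoted-printable" payload that contains a raw non-ASCII character,
    # so the .encode('ascii') step fails before quopri is even called.
    try:
        quopri.decodestring("caf\u00e9=21".encode("ascii"))
    except UnicodeEncodeError as e:
        seen.append(type(e))

    return seen


if __name__ == "__main__":
    for exc_type in reproduce():
        print(exc_type.__name__)
```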

I'm currently running alot with the following patch:

--- a/alot/db/message.py    2022-04-21 14:03:34.085067550 +0200
+++ b/alot/db/message.py    2022-04-21 12:17:26.415798127 +0200
@@ -284,7 +284,10 @@

     def get_body_text(self):
         """ returns bodystring extracted from this mail """
-        return extract_body_part(self.get_mime_part())
+        try:
+            return extract_body_part(self.get_mime_part())
+        except:
+            return "ERROR"

     def matches(self, querystring):
         """tests if this messages is in the resultset for `querystring`"""

This replaces the message body with ERROR, which is fine because those messages are spam anyway, and at least alot doesn't quit. If a message makes alot quit, it's quite time-consuming to find the one spam message that tripped it up. With this patch, such messages can be quickly identified and marked as spam. Certainly something more descriptive than ERROR should be returned, maybe even a traceback that helps identify the problem?
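One possible shape for a more descriptive fallback is sketched here. `safe_body_text` is a hypothetical helper, not part of alot; its `extract` argument stands in for the call `extract_body_part(self.get_mime_part())`. It catches `Exception` rather than using a bare `except:`, so `KeyboardInterrupt` and `SystemExit` still propagate, and it logs the full traceback while returning a short placeholder that names the error.

```python
import logging
import traceback


def safe_body_text(extract):
    """Return extract()'s text, or a short description of why it failed.

    `extract` is any zero-argument callable that returns the body string
    (an assumption made for this sketch).
    """
    try:
        return extract()
    except Exception as e:  # broad on purpose, but not a bare `except:`
        # Keep the full traceback in the log for later debugging.
        logging.debug("body extraction failed:\n%s", traceback.format_exc())
        # Show a compact marker in place of the body.
        return "[could not render body: {}: {}]".format(type(e).__name__, e)


if __name__ == "__main__":
    def broken():
        raise ValueError("Incorrect padding")

    print(safe_body_text(lambda: "hello"))
    print(safe_body_text(broken))
```

The placeholder still makes the offending message easy to spot in the search buffer, and the exception name hints at which decoding step failed.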

kbingham commented 2 years ago

I think I've hit the same issue here too, and I probably have something equally bad as a temporary fix.

Even if it's spam, though, it would be better to prepare and display as much of the text as possible, so I think it needs something more, perhaps within `extract_body_part()` (or `get_mime_part()`?).
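To illustrate what "as much as possible" could look like at the decoding step, here is a rough sketch of a lenient decoder. `lenient_decode` is a hypothetical function and does not mirror alot's actual `remove_cte`; it assumes the payload arrives as a `str` plus the name of its content transfer encoding. For base64 it drops characters outside the alphabet and repairs the padding; for quoted-printable it encodes with UTF-8 and `surrogateescape` instead of strict ASCII.

```python
import base64
import quopri
import re


def lenient_decode(payload, cte):
    """Decode a transfer-encoded payload, tolerating common malformations.

    Hypothetical helper for illustration only. Returns bytes.
    """
    cte = cte.lower()
    if cte == "base64":
        raw = payload.encode("ascii", errors="ignore")
        # Keep only characters from the base64 alphabet (this also drops
        # whitespace and any existing '=' padding).
        raw = re.sub(rb"[^A-Za-z0-9+/]", b"", raw)
        # A single leftover character cannot encode a full byte; drop it.
        if len(raw) % 4 == 1:
            raw = raw[:-1]
        # Restore the padding needed to reach a multiple of 4.
        raw += b"=" * (-len(raw) % 4)
        return base64.b64decode(raw)
    if cte == "quoted-printable":
        try:
            # Surrogate-escaped bytes map back to their original values;
            # other non-ASCII characters are encoded as UTF-8.
            raw = payload.encode("utf-8", errors="surrogateescape")
        except UnicodeEncodeError:
            raw = payload.encode("utf-8", errors="replace")
        return quopri.decodestring(raw)
    # Unknown or identity encodings: hand back the bytes unchanged.
    return payload.encode("utf-8", errors="replace")


if __name__ == "__main__":
    print(lenient_decode("aGVsbG8", "base64"))
    print(lenient_decode("caf\u00e9=21", "quoted-printable"))
```

Stripping `=` from the middle of the data means that several padded base64 chunks concatenated together would decode incorrectly after the first one, so this trades accuracy on such input for never raising on bad padding.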