pazz / alot

Terminal-based Mail User Agent
GNU General Public License v3.0

Content-Transfer-Encoding badly interpreted when charset in Content-Type is in quotes #1522

Open guijemont opened 4 years ago

guijemont commented 4 years ago

Describe the bug

I have a bunch of emails whose headers contain:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

In all the examples I could find, though, several other headers sat between these two. For all of these emails, non-ASCII characters are displayed incorrectly. Looking at the debug log, I see that the Content-Transfer-Encoding is misinterpreted for these messages:

DEBUG:utils:Content-Transfer-Encoding: "8bit"
DEBUG:utils:assuming Content-Transfer-Encoding: 8bit
DEBUG:utils:command: more /tmp/pb54ils2
DEBUG:utils:parms: ('text/plain=', 'charset=UTF-8')

Worth noting: I have found another quoted-printable email that is displayed correctly, with headers that look like:

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Note that, unlike in the previous case, the charset is not in quotes.

In that case, non-ASCII characters are displayed properly, and the debug log yields:

DEBUG:utils:Content-Transfer-Encoding: "quoted-printable"
DEBUG:utils:assuming Content-Transfer-Encoding: quoted-printable
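
For comparison, here is a minimal sketch of how Python's standard email module, which alot builds on, handles the two header variants. The sample messages and the helper name describe_part are made up for illustration. The snippet only exercises the stdlib parser, not alot's own rendering path, so at most it hints that the quotes by themselves are not what the parser trips over.

```python
import email

# Two hypothetical messages that differ only in whether the charset is quoted.
QUOTED = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Liebe Gr=C3=BC=C3=9Fe!\n"
)
UNQUOTED = QUOTED.replace('charset="utf-8"', "charset=utf-8")


def describe_part(source):
    """Return (charset, transfer encoding, decoded body) for a single-part mail."""
    msg = email.message_from_string(source)
    charset = msg.get_content_charset()        # quotes are stripped by the parser
    cte = msg["Content-Transfer-Encoding"]
    # decode=True undoes the transfer encoding and yields the raw body bytes.
    body = msg.get_payload(decode=True).decode(charset)
    return charset, cte, body.strip()


print(describe_part(QUOTED))
print(describe_part(UNQUOTED))
```

If both calls report the same charset and transfer encoding, the difference seen in the debug log would have to come from somewhere after parsing.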

Software Versions

To Reproduce

Steps to reproduce the behaviour:

  1. open a correctly formed email, with headers set as above, that contains non-ASCII characters (ones that would be transformed by quoted-printable)
  2. look at the accented characters and see that they are garbled

Error Log

See description of the bug.

pazz commented 4 years ago

Thanks for reporting this. Would you mind sending a few problematic anonymized mails our way? Ideally in the form of a PR that adds (failing) unit tests. Email etiquette is unfortunately not very rigorous when it comes to encoding issues and fiddly header syntax. Alot mostly relies on Python's email module in order to stay standards-compliant, but of course lots of malformed mails make the rounds.

guijemont commented 4 years ago

Would you mind sending a few problematic anonymized mails our way? Ideally in the form of a PR that adds (failing) unit tests.

I'll see if I can find time to do that over the weekend.

guijemont commented 4 years ago

OK, I experimented some more, and it turns out that my minimal test case is already in the tree: https://github.com/pazz/alot/blob/master/tests/static/mail/utf8.eml Here are a few screenshots of how it looks for me after notmuch insert:

Default view: (image)

With togglesource (note how alot seems to have converted the email to quoted-printable, though the original is not): (image)

With toggleheaders (here the headers match the original): (image)

Finally, this is how notmuch sees it:

$ notmuch show --format=raw tag:utf8test
From: lucc@github
To: tests@alot
Subject: plain utf8 8bit message
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Liebe Grüße!

guijemont commented 4 years ago

A couple additional notes on tests:

That leads me to think that either:

jonassmedegaard commented 4 years ago

I experience what seems like the same issue:

With alot 0.9 (more precisely, 0.9-2 from Debian unstable) emails were displayed correctly, but with 0.9.1 (or a git snapshot of 0.9.1 from the Debian packaging git) I get tofu characters in place of non-ASCII characters.

An example header block that fails to render correctly contains this:

Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

So this issue does not seem to be caused by a broken local configuration on @guijemont's side, and it also occurs when the source email is not UTF-8 encoded.

jonassmedegaard commented 4 years ago

...and for me, reverting https://github.com/pazz/alot/commit/b1c93c4d0c1eeacd64a195f16861bcb73910e739 also makes the issue go away.

pazz commented 4 years ago

@ryneeverett can you comment on this?

ryneeverett commented 4 years ago

If I remove the mock decorator from test_simple_utf8_file(), the test fails, which matches my experience in alot. I do not have a ~/.mailcap and /etc/mailcap is unmodified from Ubuntu 20.04.

I cannot reproduce this and wonder whether there is a mailcap elsewhere on your system. See https://docs.python.org/3/library/mailcap.html#mailcap.getcaps. Can you confirm the following, @guijemont?

$ python
>>> import mailcap
>>> mailcap.getcaps()
{}
>>>

ryneeverett commented 4 years ago

See #1526. This fixes the issues I see with utf8 and a mailcap entry, but I'm not convinced this is the same issue you're facing.

Further review of the code makes me even more convinced that you have some entry for text/plain in a mailcap, because b1c93c4 should not have changed the behavior at all if you do not.

pazz commented 4 years ago

I can confirm that I see the same issue @guijemont reports when displaying the message in tests/static/mail/utf8.eml in alot. Also, #1526 fixes that issue for me.

I do not have text/plain in my user mailcap, but in the system-wide mailcap (Debian testing) there are quite a few such entries:

$ grep "^text/plain" /etc/mailcap
text/plain; less '%s'; needsterminal
text/plain; more %s; needsterminal
text/plain; env ATOM_DISABLE_SHELLING_OUT_FOR_ENVIRONMENT=false /usr/bin/atom %s; test=test -n "$DISPLAY"
text/plain; /usr/share/code/code --no-sandbox --new-window %s; test=test -n "$DISPLAY"
text/plain; /usr/bin/emacs -nw %s; needsterminal
text/plain; /usr/bin/emacs %s; test=test -n "$DISPLAY"
text/plain; gvim -f %s; test=test -n "$DISPLAY"
text/plain; nvim %s; needsterminal
text/plain; okular %s; test=test -n "$DISPLAY"
text/plain; gedit --new-document %s; test=test -n "$DISPLAY"
text/plain; vim %s; needsterminal
text/plain; view %s; edit=vim %s; compose=vim %s; test=test -x /usr/bin/vim; needsterminal
text/plain; gview -f %s; edit=gvim -f %s; compose=gvim -f %s; test=test "$DISPLAY" != ""
text/plain; view %s; edit=vi %s; compose=vi %s; needsterminal

Accordingly, mailcap.getcaps() is not empty.
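
Since the mailcap module was removed from the standard library in Python 3.13, the same check can also be done on the file's text directly. Below is a rough sketch; entries_for is a hypothetical helper, not part of alot or the stdlib, and it deliberately ignores mailcap features such as escaped semicolons, continuation lines, wildcard types and test= conditions.

```python
def entries_for(mailcap_text, mime_type):
    """Return the view commands registered for mime_type, in file order."""
    commands = []
    for line in mailcap_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Simplification: split on every ';' (real mailcap allows escaped ones).
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 2 and fields[0].lower() == mime_type:
            commands.append(fields[1])
    return commands


# Illustrative input only; on a real system, pass the contents of /etc/mailcap.
SAMPLE = """\
# system-wide defaults
text/plain; more %s; needsterminal
text/plain; vim %s; needsterminal
text/html; w3m %s; needsterminal
"""

print(entries_for(SAMPLE, "text/plain"))
```

A non-empty result for text/plain means a mailcap entry exists for that type, which is the condition discussed above.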

guijemont commented 4 years ago

FWIW, I have the following text/plain lines in my /etc/mailcap (unmodified by me, fresh Ubuntu Docker image):

text/plain; more %s; needsterminal
text/plain; vim %s; needsterminal
text/plain; view %s; edit=vim %s; compose=vim %s; test=test -x /usr/bin/vim; needsterminal
text/plain; view %s; edit=vi %s; compose=vi %s; needsterminal

Will try #1526.

Also, I am a bit confused, not knowing the usual workflow on this project: why is this issue closed if the pull request is not merged yet?

pazz commented 4 years ago

Closing this was an accident, sorry.

pazz commented 4 years ago

I have an email that may or may not be related:

From: pazz@github
To: tests@alot
Subject: iso-8859-1 quoted-printable
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Viele Gr=FC=DFe!

This one never gets displayed correctly, neither with nor without the quotes, and not by notmuch as you show above. Is this message simply broken?
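
One way to check the body independently of alot and notmuch is to decode it by hand with the standard library. This is only a sketch with illustrative variable names; it looks at the body line alone and says nothing about how either tool picks the charset.

```python
import quopri

raw_body = b"Viele Gr=FC=DFe!"

# Step 1: undo the quoted-printable transfer encoding to get the raw octets.
octets = quopri.decodestring(raw_body)

# Step 2: interpret the octets with the charset declared in Content-Type.
as_latin1 = octets.decode("iso-8859-1")

# For contrast: the same octets read as UTF-8 are invalid and turn into
# replacement characters, which is one way to end up with tofu on screen.
as_utf8 = octets.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

If the first line prints the expected greeting, the body is well-formed for its declared charset, and a garbled display would point to the wrong charset being applied somewhere along the way.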

The strange thing is that notmuch does find it when I search for "grüße":

$ notmuch show --format=raw to:tests@alot from:pazz grüße
From: pazz@github
To: tests@alot
Subject: iso-8859-1 quoted-printable
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Viele Gr=FC=DFe!

guijemont commented 4 years ago

Just updated to latest master with #1526 merged. It does seem to fix the main issue for me (non-ascii utf-8 characters are displayed correctly), though togglesource still shows me inaccurate information. E.g. for https://github.com/pazz/alot/blob/master/tests/static/mail/utf8.eml it shows me:

lucc@github (Jan 1970)
From: lucc@github
To: tests@alot
Subject: plain utf8 8bit message
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Liebe Gr=C3=BC=C3=9Fe!

pazz commented 4 years ago

But this is to be expected: togglesource will result in alot displaying the email's source text verbatim, including not yet decoded quoted-printables.

guijemont commented 4 years ago

But this is to be expected: togglesource will result in alot displaying the email's source text verbatim, including not yet decoded quoted-printables.

In this case, the problem is that what is displayed is precisely not the verbatim source (which is not quoted-printable), which I am pasting here for completeness:

From: lucc@github
To: tests@alot
Subject: plain utf8 8bit message
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Liebe Grüße!

Notice the difference in the Content-Transfer-Encoding and in how the body is encoded.

pazz commented 4 years ago

@guijemont you are right, this is weird. I've dug into it and it seems that the email module changes the source when representing the message as string:

>>> m=ui.current_buffer.get_selected_message()
>>> e=m.get_email()
>>>
>>> # This is what alot shows (see widgets.thread.MessageTree)
>>> str(e)
'From: lucc@github\r\nTo: tests@alot\r\nSubject: plain utf8 8bit message\r\nMIME-Version: 1.0\r\nContent-Type: text/plain; charset="utf-8"\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nLiebe Gr=C3=BC=C3=9Fe!\r\n'
>>>
>>> # This is the content of the file.
>>> open(m.get_filename()).read()
'From: lucc@github\nTo: tests@alot\nSubject: plain utf8 8bit message\nMIME-Version: 1.0\nContent-Type: text/plain; charset="UTF-8"\nContent-Transfer-Encoding: 8bit\n\nLiebe Grüße!\n'

So I suggest we change https://github.com/pazz/alot/blob/master/alot/widgets/thread.py#L260 to read the source text from disk instead.
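
A rough sketch of what reading from disk could look like, assuming the message's file path is available (as via m.get_filename() above). The helper name read_source_verbatim and the UTF-8 fallback are assumptions for illustration, not alot's actual code.

```python
import os
import tempfile


def read_source_verbatim(path, fallback_encoding="utf-8"):
    """Return the file's text as stored on disk, without re-serialising it."""
    with open(path, "rb") as handle:
        raw = handle.read()
    # Assumption: undecodable bytes are replaced rather than raising, so that
    # mails in other charsets or with broken bytes can still be shown.
    return raw.decode(fallback_encoding, errors="replace")


# Demo with a stand-in for tests/static/mail/utf8.eml written to a temp file.
SAMPLE = (
    "From: lucc@github\n"
    "To: tests@alot\n"
    "Subject: plain utf8 8bit message\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="UTF-8"\n'
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Liebe Grüße!\n"
).encode("utf-8")

with tempfile.NamedTemporaryFile(suffix=".eml", delete=False) as tmp:
    tmp.write(SAMPLE)

shown = read_source_verbatim(tmp.name)
os.unlink(tmp.name)
print(shown)
```

Reading in binary mode and decoding afterwards avoids both newline translation and any re-encoding by the email module, so the 8bit header and the body stay exactly as they are in the file.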