jnphilipp / gpgmail

Encrypt/Decrypt GPG/MIME emails.
GNU General Public License v3.0

handling Content-Transfer-Encoding: "quoted-printable" #6

Closed networkjanitor closed 2 years ago

networkjanitor commented 2 years ago

As per https://github.com/infertux/zeyple/issues/39, gpgmail probably suffers from the same issue, since it doesn't use get_payload(decode=True) in https://github.com/jnphilipp/gpgmail/blob/master/gpgmail#L216 and the following lines. Is that right?

Additionally, gpgmail definitely does not delete the Content-Transfer-Encoding header from the unencrypted part of the encrypted message. If `Content-Transfer-Encoding: quoted-printable` is set, Thunderbird fails to decrypt the message.

Example: if this message is encrypted by gpgmail, Thunderbird cannot decrypt it:

Return-Path: <noreply@booking.com>
Delivered-To: mail@example.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=UTF-8
Date: Wed, 20 Feb 2019 23:23:04 +0000 (UTC)
From: "Booking.com" <noreply@booking.com>
Mime-Version: 1.0
Reply-to: noreply@booking.com
To: mail@example.com
Message-ID: <foobar@ismtpd0004p1lon1.sendgrid.net>
Subject: Example Subject

<!DOCTYPE html>
<html lang=3D"en" style=3D"">
<p>foo</p>
</html>

Thunderbird errors:

[process_pgp_source() /build/thunderbird/src/thunderbird-91.9.1/comm/third_party/rnp/src/librepgp/stream-parse.cpp:2534] not an OpenPGP data provided
console.debug: "rnp_op_verify_execute returned unexpected: 268435457"

Zeyple solves this by deleting this header: https://github.com/infertux/zeyple/blob/master/zeyple/zeyple.py#L238 - is there any reason to not do the same in gpgmail?
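
To make the difference concrete, here is a minimal sketch using only Python's standard `email` package. The message is a trimmed copy of the example above, the variable names are illustrative, and this is not gpgmail's actual code. It shows that `get_payload()` returns the body still in its quoted-printable form, while `get_payload(decode=True)` undoes the transfer encoding:

```python
from email import message_from_string

# Trimmed copy of the example mail above (illustrative only).
RAW = (
    "Content-Transfer-Encoding: quoted-printable\n"
    "Content-Type: text/html; charset=UTF-8\n"
    "Mime-Version: 1.0\n"
    "Subject: Example Subject\n"
    "\n"
    '<html lang=3D"en" style=3D"">\n'
    "<p>foo</p>\n"
    "</html>\n"
)

mail = message_from_string(RAW)

# Without decode=True the body is returned as-is, "=3D" escapes included.
raw_body = mail.get_payload()

# With decode=True the quoted-printable encoding is undone (result is bytes).
decoded_body = mail.get_payload(decode=True)

print(raw_body)
print(decoded_body.decode("utf-8"))
```

If the decoded body is copied into the inner message, the original Content-Transfer-Encoding header no longer describes it, which is one reason to drop that header rather than copy it.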

jnphilipp commented 2 years ago

I mainly use Evolution, and your example works there. If you look at lines L126-L128 and L152-L155, I'm copying the Content-Transfer-Encoding header explicitly. I had the problem that some non-ASCII symbols weren't handled correctly, which is why it is the way it is. But I'll look into fixing it.

networkjanitor commented 2 years ago

Yeah, saw L152-L155 and removed it when testing, which worked for my tests (and the automated tests).

Did you have problems with non-ASCII symbols in the subject or in the content of your mails? Maybe this is another difference between Evolution and Thunderbird in how Content-Transfer-Encoding is handled?

jnphilipp commented 2 years ago

Interestingly, mail.get_payload(decode=True) only matters when Content-Transfer-Encoding is quoted-printable or base64, and the payload then gets converted to base64. I'm going to test this version a bit to see if I run into any problems with non-ASCII symbols.

networkjanitor commented 2 years ago

Regarding the current content-transfer-encoding branch: https://github.com/jnphilipp/gpgmail/blob/666282a94ed9fb53646738697b39ca69359d5923/gpgmail#L209-L214

mail["Content-Transfer-Encoding"] is None if the header is not present in mail. If the mail body contains non-ASCII characters (like umlauts), then L209 will return a base64-encoded mail body with Content-Transfer-Encoding: base64.

L210 evaluates to True, so in L211 None is set as an additional Content-Transfer-Encoding header:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 
Content-Transfer-Encoding: base64

VGVzdCDvv73vv73vv70KCg==

Thunderbird at least can't handle this after decrypting and just displays the base64-encoded string (have not tested Evolution yet).

Example mail:

From: <mail@sender.com>
To: <mail@example.com>
Subject: Example Subject
Date: Thu, 27 Jun 2019 09:42:57 +0200
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0

Test äöü
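
The two behaviours described above can be reproduced with the standard `email` package. This is a minimal sketch, not the code from the branch: the names `mail`, `orig_msg` and `cte` are illustrative, and the guard at the end is just one possible way to avoid adding a second, empty header.

```python
from email import message_from_string
from email.message import Message

# Example mail without a Content-Transfer-Encoding header.
mail = message_from_string(
    'Content-Type: text/plain; charset="UTF-8"\n'
    "MIME-Version: 1.0\n"
    "\n"
    "Test äöü\n"
)

# Looking up a missing header returns None instead of raising KeyError.
cte = mail["Content-Transfer-Encoding"]

orig_msg = Message()
# With charset="utf-8", set_payload() base64-encodes the body and adds
# "Content-Transfer-Encoding: base64" on its own.
orig_msg.set_payload(mail.get_payload(), charset="utf-8")

# Only copy the original header when there actually is one and it is not
# an encoding that has already been handled.
if cte is not None and cte.lower() not in ("quoted-printable", "base64"):
    orig_msg["Content-Transfer-Encoding"] = cte

cte_headers = orig_msg.get_all("Content-Transfer-Encoding")
print(cte_headers)
```

Because `None not in ["quoted-printable", "base64"]` is true, a check without the explicit `is not None` test would fall through to the assignment, which matches the duplicated header shown above.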

networkjanitor commented 2 years ago

Based on my tests, the following snippet works without issues. I imported a couple hundred random old mails and checked them for encoding issues, but did not find anything wrong with them (in Thunderbird).

        if mail.is_multipart():
            for payload in mail.get_payload():
                orig_msg.attach(payload)
        else:
            if mail["Content-Transfer-Encoding"] in ["quoted-printable", "base64"]:
                orig_msg.set_payload(mail.get_payload(decode=True))
            else:
                orig_msg.set_payload(mail.get_payload(decode=False))
            if mail["Content-Transfer-Encoding"] not in ["quoted-printable", "base64", None]:
                orig_msg["Content-Transfer-Encoding"] = mail["Content-Transfer-Encoding"]
            del mail["Content-Transfer-Encoding"]

Edit: I've found a single message which Thunderbird again cannot decode, but I have not investigated it yet. It's a multipart message with Content-Transfer-Encoding: base64, including an .ics calendar attachment.

jnphilipp commented 2 years ago

I see what you mean. The question, I guess, is whether a mail with non-ASCII characters and no Content-Transfer-Encoding is properly formed or not, or whether this is a "bug" in how Python handles these mails. I shortened your code a bit and added a test with a multipart message with an .ics attachment, which works for me.

networkjanitor commented 2 years ago

Just a quick update: I found some more messages that can't be decrypted. I think the reason is that so far the Content-Transfer-Encoding header is only deleted for non-multipart messages. If a message is multipart and has Content-Transfer-Encoding: base64 or Content-Transfer-Encoding: quoted-printable, then it's not decryptable by Thunderbird (in hindsight this seems rather obvious, given the original problem).

The fix is probably to just move https://github.com/jnphilipp/gpgmail/blob/f484588051a37647571daad8024c41b8e934e98e/gpgmail#L212-L218 an indent up and outside the if mail.is_multipart(): block, but I have not tested it so far.
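
A rough sketch of that change, assuming the referenced lines are the header copy/delete logic from the earlier snippet. The function name `copy_body` and the demo message are hypothetical and not part of gpgmail:

```python
from email import message_from_string
from email.message import Message

ENCODED = ("quoted-printable", "base64")


def copy_body(mail: Message, orig_msg: Message) -> None:
    """Copy the body of `mail` into `orig_msg` (hypothetical helper)."""
    cte = mail["Content-Transfer-Encoding"]  # None if the header is absent

    if mail.is_multipart():
        # Sub-parts keep their own headers, including their own encoding.
        for part in mail.get_payload():
            orig_msg.attach(part)
    elif cte is not None and cte.lower() in ENCODED:
        # Undo the transfer encoding before copying the body.
        orig_msg.set_payload(mail.get_payload(decode=True))
    else:
        orig_msg.set_payload(mail.get_payload(decode=False))

    # Header handling now runs for multipart and non-multipart mails alike.
    if cte is not None and cte.lower() not in ENCODED:
        orig_msg["Content-Transfer-Encoding"] = cte
    del mail["Content-Transfer-Encoding"]


# Demo: a multipart mail with a top-level "base64" header.
multipart_mail = message_from_string(
    'Content-Type: multipart/mixed; boundary="b"\n'
    "Content-Transfer-Encoding: base64\n"
    "MIME-Version: 1.0\n"
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--b--\n"
)
inner = Message()
copy_body(multipart_mail, inner)
print(multipart_mail["Content-Transfer-Encoding"])  # header has been removed
```

Deleting a header that is not present is a no-op for `Message`, so the final `del` is safe for mails that never had the header.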

jnphilipp commented 2 years ago

Yeah, apparently only the actual message part is allowed to have a Content-Transfer-Encoding other than 7bit, 8bit, binary, see.
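
Python's own parser appears to apply the same rule: when a multipart message declares a Content-Transfer-Encoding other than 7bit, 8bit or binary, it records a defect on the parsed message. The sketch below uses a made-up message and illustrative variable names to check for it:

```python
from email import message_from_string
from email.errors import InvalidMultipartContentTransferEncodingDefect

# Made-up multipart mail with a top-level "base64" header.
RAW_MULTIPART = (
    'Content-Type: multipart/mixed; boundary="b"\n'
    "Content-Transfer-Encoding: base64\n"
    "MIME-Version: 1.0\n"
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--b--\n"
)

parsed = message_from_string(RAW_MULTIPART)

# The default policy does not raise; it collects problems in .defects.
has_bad_cte = any(
    isinstance(defect, InvalidMultipartContentTransferEncodingDefect)
    for defect in parsed.defects
)
print(has_bad_cte)
```

A check like this could be used to spot incoming mails that need the top-level header removed before encryption.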

networkjanitor commented 2 years ago

Looks good to me and appears to solve all mentioned issues :+1: