jstedfast / gmime

A C/C++ MIME creation and parser library with support for S/MIME, PGP, and Unix mbox spools.
GNU Lesser General Public License v2.1

v3.2.13 changes the behavior of `g_mime_message_get_addresses` when more than one `Cc` header exists. #129

Closed dkg closed 1 year ago

dkg commented 1 year ago

In the notmuch test suite, we're seeing a failure to reply correctly to a message that has multiple Cc headers. From a git bisect, i found that commit 4a80ae527df9aa36fe50fac0878207a31d4d6b72 introduces this failure:

 FAIL   Reply to a message with multiple Cc headers
    --- T220-reply.17.expected  2022-10-04 02:36:37.774311603 +0000
    +++ T220-reply.17.output    2022-10-04 02:36:37.774311603 +0000
    @@ -1,7 +1,7 @@
     From: Notmuch Test Suite <test_suite@notmuchmail.org>
     Subject: Re: wowsers!
     To: Alice <alice@example.org>, Daniel <daniel@example.org>
    -Cc: Bob <bob@example.org>, Charles <charles@example.org>
    +Cc: Charles <charles@example.org>
     In-Reply-To: <multiple-cc@example.org>
     References: <multiple-cc@example.org>

The message that is being replied to is this simple, subtly-malformed message:

    From: Alice <alice@example.org>
    To: Daniel <daniel@example.org>
    Cc: Bob <bob@example.org>
    Subject: wowsers!
    cc: Charles <charles@example.org>
    Message-Id: <multiple-cc@example.org>
    Date: Thu, 16 Jun 2016 22:14:41 -0400

Note the Cc: and cc: headers.

I note that RFC 5322 §3.6 allows at most one Cc header per message. In practice, though, notmuch has been relying on the behavior of earlier gmime versions, which collapsed multiple Cc headers and treated them as a single header.

dkg commented 1 year ago

So i see two ways that we can deal with this:

I'm not sure what the correct approach is. I don't know how many messages in existence carry this particular quirky violation of the standard. And i don't know how else to interpret such a message -- i mean, which Cc: should the implementation pick if it's faced with a message with multiple copies?

dkg commented 1 year ago

Attached are two simple test files:

If you build `demonstrate-129` and run it as:

    ./demonstrate-129 < broken-cc.eml

Then you'll see that with versions before 3.2.13 the result is to show both Bob and Charles, but as of 3.2.13 (or really, as of 4a80ae527df9aa36fe50fac0878207a31d4d6b72) the result is to show only Charles.

dkg commented 1 year ago

Hm, @ojwb points out on the #notmuch IRC channel that older versions of the specification were more flexible than RFC 5322.

RFC 5322 §4 ("Obsolete Syntax") acknowledges as much:

> Earlier versions of this specification allowed for different (usually more liberal) syntax than is allowed in this version. Also, there have been syntactic elements used in messages on the Internet whose interpretations have never been documented. Though these syntactic forms MUST NOT be generated according to the grammar in section 3, they MUST be accepted and parsed by a conformant receiver.

And in §4.5.3 ("Obsolete Destination Address Fields"), we see:

> When multiple occurrences of destination address fields occur in a message, they SHOULD be treated as if the address list in the first occurrence of the field is combined with the address lists of the subsequent occurrences by adding a comma and concatenating.

So as a parser, i think gmime 3.2.13 is doing the wrong thing here, and it should be fixed.
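To make the §4.5.3 rule concrete, here is a minimal sketch in plain C, independent of gmime's actual implementation. `combine_address_fields` is a hypothetical helper. It assumes the header lines are already unfolded and that the field name is immediately followed by a colon, and it does not parse the addresses themselves.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper: concatenate the values of every occurrence of the
 * header field `name` (matched case-insensitively) in `lines`, in order,
 * separated by ", ". Returns a malloc'd string (empty if the field never
 * occurs), or NULL if allocation fails. */
char *combine_address_fields(const char *const *lines, size_t n_lines,
                             const char *name)
{
    size_t name_len = strlen(name);

    /* Upper bound on the result: every line plus a ", " separator. */
    size_t cap = 1;
    for (size_t i = 0; i < n_lines; i++)
        cap += strlen(lines[i]) + 2;

    char *out = malloc(cap);
    if (out == NULL)
        return NULL;
    out[0] = '\0';

    for (size_t i = 0; i < n_lines; i++) {
        const char *line = lines[i];

        /* Header names are case-insensitive, so "Cc:" and "cc:" both match. */
        if (strncasecmp(line, name, name_len) != 0 || line[name_len] != ':')
            continue;

        /* Skip the colon and any leading whitespace in the value. */
        const char *value = line + name_len + 1;
        while (*value == ' ' || *value == '\t')
            value++;
        if (*value == '\0')
            continue;

        /* "adding a comma and concatenating" */
        if (out[0] != '\0')
            strcat(out, ", ");
        strcat(out, value);
    }
    return out;
}
```

Applied to the header lines of the test message with `name` set to `"Cc"`, this yields `Bob <bob@example.org>, Charles <charles@example.org>`, which is the combined list the notmuch test expects.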

jstedfast commented 1 year ago

Oops, yes, your patch looks correct. My intention was always to have GMime combine all of the Cc: addresses into a single `InternetAddressList` for convenient usage.