mnako / letters

Letters, or how to parse emails in Go
MIT License

cannot parse To header, Panic when header contains a=40a #67

Open ltybenet opened 10 months ago

ltybenet commented 10 months ago

Email header:

To: =?utf-8?Q?a=40a?= <test@example.com>

Code:

email, err := letters.ParseEmail(r)
if err != nil {
    log.Fatal(err)
}

Panic when header contains a=40a. Output:

letters.ParseEmail: cannot parse headers: letters.parsers.parseHeaders: cannot parse To header: letters.parsers.parseAddressListHeader: cannot parse address list header "=?utf-8?Q?a=40a?= <test@example.com>": mail: expected comma

mime decode test:

name := "=?utf-8?Q?=40?="
header, err := new(mime.WordDecoder).DecodeHeader(name)
if err != nil {
    panic(err)
}
fmt.Println(header)
// output: @

mnako commented 10 months ago

Hi @ltybenet,

Thank you for opening the issue.

You are correct that mime.WordDecoder can decode "=?utf-8?Q?=40?=" properly. However, mail.ParseAddressList() raises the "expected comma" error, which I believe to be the intended behaviour.

After careful review of RFC 5322, I believe that according to a strict interpretation, "@" is not allowed in the display-name part of an email address outside of double quotes. This (strict) interpretation is based on the fact that RFC 5322 does not explicitly mention the allowed characters within a display-name, but it can be inferred from Section 3.4. Address Specification.

Specifically, Section 3.4 defines display-name as a phrase. A phrase is further defined as 1*word, where a word can be either an atom or a quoted-string. An atom, defined in Section 3.2.3, only allows specific characters and does not include "@".

Therefore, I believe that erroring out on a "@" in a display-name without quotes aligns with the strict interpretation of RFC 5322.
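The atom rule from Section 3.2.3 can be sketched as a small check. This is only an illustration of the grammar, not code from Letters or net/mail; `isAtom` and `atextSpecials` are hypothetical names introduced here.

```go
package main

import "strings"

// atextSpecials lists the non-alphanumeric characters that
// RFC 5322, Section 3.2.3 allows in atext. "@" is not among them.
const atextSpecials = "!#$%&'*+-/=?^_`{|}~"

// isAtom is a hypothetical helper: it reports whether s consists
// only of atext characters, i.e. whether s could appear as an
// unquoted word in a display-name under the strict reading.
func isAtom(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
			// ALPHA and DIGIT are always allowed.
		case strings.ContainsRune(atextSpecials, r):
			// One of the permitted punctuation characters.
		default:
			return false
		}
	}
	return true
}
```

With this check, `isAtom("Alice")` is true while `isAtom("a@a")` is false, which is why the text a@a would need double quotes to appear in a display-name.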

When tested with double-quoted versions, such as To: "=?utf-8?Q?a=40a?=" <a@a.com> and To: "=?utf-8?Q?a=40a?=" <a@a.com>, "=?utf-8?Q?b=40b?=" <b@b.com>, mail.ParseAddressList() (and Letters) works as expected.
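A minimal sketch of the double-quoted case, calling mail.ParseAddressList() from the standard library directly rather than going through Letters. The function name `quotedAddresses` is hypothetical, and the sketch only inspects the parsed addresses; it makes no claim about how the display names are decoded.

```go
package main

import "net/mail"

// quotedAddresses is a hypothetical helper that parses a To header
// value whose display names are wrapped in double quotes and
// returns the address part of each entry.
func quotedAddresses() ([]string, error) {
	const header = `"=?utf-8?Q?a=40a?=" <a@a.com>, "=?utf-8?Q?b=40b?=" <b@b.com>`

	list, err := mail.ParseAddressList(header)
	if err != nil {
		return nil, err
	}

	addrs := make([]string, 0, len(list))
	for _, a := range list {
		addrs = append(addrs, a.Address)
	}
	return addrs, nil
}
```

The call is expected to return a@a.com and b@b.com without an error.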

I am happy to discuss this further if you have a different opinion.

ltybenet commented 10 months ago

Regarding To: =?utf-8?Q?a=40a?= <test@example.com>, the actual email header is:

To: =?utf-8?Q?test=40example.com?= <test@example.com>

There are many email headers like this. Many of the email headers I see in my work are not written according to the standards. I would expect "letters" to skip such non-standard headers when it encounters them, rather than fail to parse the entire email.

It would be best if "letters" could be compatible with these non-standard headers😁 Alternatively, it could return the headers that it cannot parse, so that developers can use other tools to try to parse them.
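One way to handle such headers outside of "letters" is a lenient fallback: try the strict parser first and, only if it fails, split off the angle-bracketed address and decode the display name separately. The sketch below is an assumption about how a caller might do this, not part of the Letters API; `parseAddressLenient` is a hypothetical name, and it handles a single address rather than a list.

```go
package main

import (
	"fmt"
	"mime"
	"net/mail"
	"strings"
)

// parseAddressLenient is a hypothetical helper. It first tries the
// strict net/mail parser and falls back to a manual split on the
// last "<...>" pair when that fails.
func parseAddressLenient(raw string) (*mail.Address, error) {
	if addr, err := mail.ParseAddress(raw); err == nil {
		return addr, nil
	}

	// Fallback: treat everything inside the last angle brackets as
	// the address and everything before them as the display name.
	open := strings.LastIndex(raw, "<")
	end := strings.LastIndex(raw, ">")
	if open < 0 || end < open {
		return nil, fmt.Errorf("no angle-bracketed address in %q", raw)
	}

	addr, err := mail.ParseAddress(raw[open : end+1])
	if err != nil {
		return nil, err
	}

	// Decode RFC 2047 encoded-words in the display name; if decoding
	// fails, keep the raw text rather than dropping the name.
	name := strings.Trim(strings.TrimSpace(raw[:open]), `"`)
	if decoded, err := new(mime.WordDecoder).DecodeHeader(name); err == nil {
		name = decoded
	}
	addr.Name = name
	return addr, nil
}
```

For the header value `=?utf-8?Q?test=40example.com?= <test@example.com>`, this should yield the name test@example.com and the address test@example.com. Input with no angle-bracketed address still returns an error, so the caller can decide whether to skip that header or report it.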