jstedfast / MimeKit

A .NET MIME creation and parser library with support for S/MIME, PGP, DKIM, TNEF and Unix mbox spools.
http://www.mimekit.net
MIT License

From returns blank when it contains special characters #1043

Closed lxjingGZ closed 4 months ago

lxjingGZ commented 4 months ago

I am using the latest version, and recently encountered an issue. When the From field contains special characters, it causes the From value to return as blank. Here is an example that demonstrates the issue:


Received: from xxx by fast.ezcone.com with local (Exim 4.69)
    (envelope-from <xxxx@fast.ezcone.com>)
    id 1PIMUx-0002xT-2t
    for support@xxxxx.com; Tue, 16 Nov 2012 08:27:03 -0600
From: XXXX Hunter <webmaster\@xxxxxx.net@fast.ezcone.com>

jstedfast commented 4 months ago

Which special characters, specifically? Are you talking about the \ before the @? Or are you talking about whatever characters the XXXX replaced?

mirror222 commented 4 months ago

Yes, just as you said, I also encountered this problem. Thank you.

jstedfast commented 4 months ago

Okay, the problem is that \ is not a valid atom character. It can only appear in quotes. Authors of these email programs really need to start reading and following the specifications rather than just making up syntax out of thin air ☹️

Syntax from RFC 5322:

   addr-spec       =   local-part "@" domain

   local-part      =   dot-atom / quoted-string / obs-local-part

   atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"

   atom            =   [CFWS] 1*atext [CFWS]

   dot-atom-text   =   1*atext *("." 1*atext)

   dot-atom        =   [CFWS] dot-atom-text [CFWS]

   specials        =   "(" / ")" /        ; Special characters that do
                       "<" / ">" /        ;  not appear in atext
                       "[" / "]" /
                       ":" / ";" /
                       "@" / "\" /
                       "," / "." /
                       DQUOTE

As you can see in the syntax definitions above, a local-part token that matches the dot-atom syntax cannot contain the \ character: \ is listed under specials, and specials are excluded from atext.
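
To make the grammar concrete, here is a minimal Python sketch of the atext and dot-atom-text rules quoted above. It is only an illustration of the RFC 5322 definitions, not MimeKit's parser; the names ATEXT, SPECIALS and is_dot_atom_text are made up for this example, and CFWS handling is left out.

```python
import string

# atext per RFC 5322: letters, digits, and the printable characters listed above.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")

# specials per RFC 5322 (the last character is DQUOTE).
SPECIALS = set('()<>[]:;@\\,."')


def is_dot_atom_text(token: str) -> bool:
    """Return True if token matches dot-atom-text = 1*atext *("." 1*atext)."""
    parts = token.split(".")
    # Every dot-separated part must be non-empty and consist only of atext.
    return all(part and all(ch in ATEXT for ch in part) for part in parts)


print(is_dot_atom_text("webmaster"))               # True
print(is_dot_atom_text("webmaster\\@xxxxxx.net"))  # False: "\" and "@" are specials
```

The local-part from the reported header fails the check because both \ and @ are specials, which is why a strict parser rejects the address.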

That said, MimeKit already supports @ in the local-part as long as it's not the first character.

jstedfast commented 4 months ago

I'm not sure how these addresses are supposed to be encoded. I'm pretty sure I've typically seen them in the form webmaster%40custom-host.com@mail-host.com.

Need to do more research on this...

mirror222 commented 4 months ago

you are so right, they can't make up syntax out of thin air ☹️ lol

jstedfast commented 4 months ago

Okay, so I've added support for addresses like webmaster\@custom-host.com@mail-host.com

If the address parser encounters a \@ sequence, it will convert that to %40 when the ParserOptions.AddressParserComplianceMode value is Looser.

This solution isn't ideal, but it is probably the simplest option available for invalid local-parts like this.
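
As a rough illustration of the rewrite described here, the following Python sketch replaces \@ with %40 in the local-part of an addr-spec string. It is not the MimeKit implementation (which does this inside its C# address parser); rewrite_escaped_at is a hypothetical helper, and it assumes the last @ in the string separates the local-part from the domain.

```python
def rewrite_escaped_at(addr_spec: str) -> str:
    """Replace each backslash-escaped '@' in the local-part with '%40'.

    Assumption: the last '@' is the local-part/domain separator.
    """
    local, sep, domain = addr_spec.rpartition("@")
    if not sep:
        return addr_spec  # no '@' at all; leave the input untouched
    return local.replace("\\@", "%40") + "@" + domain


print(rewrite_escaped_at("webmaster\\@custom-host.com@mail-host.com"))
# webmaster%40custom-host.com@mail-host.com
```

The result has a local-part made only of atext characters and dots, so it parses as an ordinary dot-atom.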

The other option would be to quote the local-part, but that would be a much more involved fix because obs-local-part allows a mix of qstring and atom tokens separated by . in a local-part. (The modern form only allows a single qstring or multiple atoms separated by dots, which would be much simpler.)

Because of obs-local-part, we can't just wrap the token with DQUOTEs when we finish consuming the local-part, because it could include one or more qstrings that we would have to escape. We also can't just quote individual atom tokens containing the \@ sequence, because then the local-part from the example address above would end up being:

"webmaster@custom-host".com

Even though that would be syntactically valid, it's not likely to be interpreted the same. Ideally, if we were to implement a solution that quoted the relevant parts of the local-part token, it would look like this:

"webmaster@custom-host.com"

This is doable, but not without significant rewriting of the current TryParseLocalPart method logic.
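
The difference between the two quoting strategies can be sketched in a few lines of Python. This is illustrative only and has nothing to do with the actual TryParseLocalPart logic; both helper names are hypothetical, and the sketch assumes the local-part contains no existing quoted strings, which is exactly the case that obs-local-part makes hard.

```python
def quote_per_atom(local: str) -> str:
    """Quote only the dot-separated atoms that contain a backslash-escaped '@'."""
    atoms = local.split(".")
    return ".".join(
        '"' + atom.replace("\\@", "@") + '"' if "\\@" in atom else atom
        for atom in atoms
    )


def quote_whole(local: str) -> str:
    """Unescape '\\@' and wrap the entire local-part in one quoted string."""
    return '"' + local.replace("\\@", "@") + '"'


local_part = "webmaster\\@custom-host.com"
print(quote_per_atom(local_part))  # "webmaster@custom-host".com
print(quote_whole(local_part))     # "webmaster@custom-host.com"
```

Per-atom quoting splits on the dot before quoting, so the trailing .com ends up outside the quotes; quoting the whole local-part keeps the embedded address together.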

That said, even that might not get interpreted as the same mailbox by whatever mail software generated the \@ sequence in the first place.

(Obviously, the same goes for this %40 hack.)

There may not even be a universally correct interpretation of this style of address. In other words, some mail software might accept the %40 encoding and deliver the message to the correct mailbox, others will only accept a quoted local-part or only accept \@, and still others might accept some combination but not all.

I wish I had more information about what software generated that address and which servers would accept what.