jstedfast / MimeKit

A .NET MIME creation and parser library with support for S/MIME, PGP, DKIM, TNEF and Unix mbox spools.
http://www.mimekit.net
MIT License

MailboxAddress encoding with first or last word containing non-ascii char #1093

Open GuillaumeErgole opened 3 weeks ago

GuillaumeErgole commented 3 weeks ago

Describe the bug

When the first or last word of a MailboxAddress.Name contains non-ASCII characters, the double quotes are not in the correct place in the encoded result.

Platform:

To Reproduce

using MimeKit;

var address = new MailboxAddress("Département Formation, Recherche et Support", "formation@company.com");

var actual = address.ToString(true);
// actual: "=?utf-8?q?D=C3=A9partement?= \"Formation, Recherche et Support\"\r\n\t<formation@company.com>"
// expected: "\"=?utf-8?q?D=C3=A9partement?= Formation, Recherche et Support\"\r\n\t<formation@company.com>"

var address2 = new MailboxAddress("Département Formation, Recherche et Développement", "formation@company.com");

var actual2 = address2.ToString(true);
// actual2: "=?utf-8?q?D=C3=A9partement?= \"Formation, Recherche et\"\r\n =?utf-8?q?D=C3=A9veloppement?= <formation@company.com>"
// expected2: "\"=?utf-8?q?D=C3=A9partement?= Formation, Recherche et\r\n =?utf-8?q?D=C3=A9veloppement?=\" <formation@company.com>"

Expected behavior

With the current behavior, Outlook displays: [image]

The double quotes shouldn't be displayed.

jstedfast commented 3 weeks ago

Your expected values are syntactically wrong. According to rfc2047, an encoded-word must not appear inside a quoted-string, so anything in quotes is not supposed to get decoded.

This is actually a bug in Outlook because the qstring token should be getting unquoted according to the rules of rfc0822/2822/5322/etc.

Keep in mind that the quoted/encoded names in the raw headers are not meant for display purposes. Quoting in the raw header values is just a way to prevent special tokens (such as the comma (',') character in your example) from being interpreted as an address separator.
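To make the quoting rule concrete, here is a minimal sketch of how a phrase gets wrapped as an rfc5322 quoted-string and how a receiving client is expected to unquote it before display. `QuotedStringSketch`, `QuoteIfNeeded` and `Unquote` are hypothetical names made up for illustration; this is not MimeKit's implementation and it only handles a single token, not a mix of quoted and encoded words.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of MimeKit.
public static class QuotedStringSketch
{
    // rfc5322 "specials": characters that cannot appear in an unquoted phrase.
    const string Specials = "()<>[]:;@\\,.\"";

    // Wrap the text in double quotes if it contains any special character,
    // so that e.g. a comma is not read as an address separator.
    public static string QuoteIfNeeded (string text)
    {
        if (text.IndexOfAny (Specials.ToCharArray ()) < 0)
            return text;

        var sb = new StringBuilder ("\"");
        foreach (var c in text) {
            if (c == '\\' || c == '"')
                sb.Append ('\\'); // escape as a quoted-pair
            sb.Append (c);
        }
        return sb.Append ('"').ToString ();
    }

    // Reverse of QuoteIfNeeded: strip the surrounding quotes and the
    // backslashes of quoted-pairs. Unquoted input is returned unchanged.
    public static string Unquote (string text)
    {
        if (text.Length < 2 || text[0] != '"' || text[text.Length - 1] != '"')
            return text;

        var sb = new StringBuilder ();
        for (int i = 1; i < text.Length - 1; i++) {
            if (text[i] == '\\' && i + 1 < text.Length - 1)
                i++; // drop the backslash, keep the escaped character
            sb.Append (text[i]);
        }
        return sb.ToString ();
    }
}
```

Under these assumptions, `QuoteIfNeeded ("Formation, Recherche et Support")` adds quotes because of the comma, and `Unquote` on that result returns the original text, which is the step Outlook appears to skip here.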

Arguably MimeKit could (should?) work around this Outlook bug by just encoding the entire name, but that would make the name unreadable by clients that do not support decoding rfc2047 tokens. In fairness, this was more of a concern in the late 90s when the spec was written than it is now, because all clients should support decoding by now.
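For reference, "encoding the entire name" could look roughly like the sketch below, which uses only the .NET base class library. `WholeNameEncodingSketch` and `EncodeEntireName` are hypothetical names, the choice of UTF-8 with base64 ("b") encoding is an assumption, and none of this reflects how MimeKit would actually implement such an option.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not part of MimeKit.
public static class WholeNameEncodingSketch
{
    const string Prefix = "=?utf-8?b?";
    const string Suffix = "?=";
    const int MaxEncodedWordLength = 75; // rfc2047 limit per encoded-word

    // Encode the whole display name as one or more rfc2047 encoded-words,
    // so no part of it needs to be quoted.
    public static string EncodeEntireName (string name)
    {
        // 75 - 10 - 2 = 63 chars of payload, rounded down to 60 base64 chars = 45 bytes.
        int maxBytes = (MaxEncodedWordLength - Prefix.Length - Suffix.Length) / 4 * 3;
        var words = new List<string> ();
        var chunk = new StringBuilder ();
        int chunkBytes = 0;

        // Split on text-element boundaries so a multi-byte character is never
        // cut in half. (A single element larger than maxBytes would still
        // exceed the limit; that edge case is ignored in this sketch.)
        var elements = StringInfo.GetTextElementEnumerator (name);
        while (elements.MoveNext ()) {
            var element = elements.GetTextElement ();
            int size = Encoding.UTF8.GetByteCount (element);
            if (chunkBytes > 0 && chunkBytes + size > maxBytes) {
                words.Add (EncodeWord (chunk.ToString ()));
                chunk.Clear ();
                chunkBytes = 0;
            }
            chunk.Append (element);
            chunkBytes += size;
        }
        if (chunkBytes > 0)
            words.Add (EncodeWord (chunk.ToString ()));

        // Whitespace between adjacent encoded-words is ignored when decoding,
        // so folding here does not change the decoded name.
        return string.Join ("\r\n ", words);
    }

    static string EncodeWord (string text)
    {
        return Prefix + Convert.ToBase64String (Encoding.UTF8.GetBytes (text)) + Suffix;
    }
}
```

The trade-off is the one described above: the raw header no longer contains any readable ASCII, so a client that cannot decode rfc2047 tokens would show only the encoded form.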

https://www.rfc-editor.org/rfc/rfc2047

To prove my point, you can test this assertion using the following code snippet:

var mailbox = new MailboxAddress ("Département Formation, Recherche et Support", "formation@company.com");
var encoded = mailbox.ToString (true);
var decoded = MailboxAddress.Parse (encoded);

// Verify that the decoded name exactly matches the original name
System.Diagnostics.Debug.Assert (decoded.Name == mailbox.Name);

GuillaumeErgole commented 3 weeks ago

Encoding the entire name could be an opt-in setting on FormatOptions, so that the default behavior is not changed for those who must comply with rfc2047.

jstedfast commented 3 weeks ago

Yes, that is one possibility. I am considering the various options. I am pretty busy this week and next week with my day job so I don't expect to have a solution immediately.