Open mlt opened 1 month ago
Hm, my understanding of the newer RFCs 6532 and 6531 is that nowadays mail addresses can contain UTF-8 directly, both in the local part and in the domain part, and such UTF-8 addresses can show up in mail headers directly, without the need for punycode or other encodings.
So msmtp should leave these addresses alone.
The error you linked to is because msmtpd does not want to handle UTF-8 currently. It enforces a very strict ASCII-only subset of characters because mail addresses will become part of a command line that is passed to a shell, so there are security considerations. The clean solution would probably be to not use popen() but fork/execve or something similar, but that quickly becomes a great big mess with a lot of potential for bugs, so I'm not sure what to do about it.
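For illustration only, here is a minimal sketch of the fork/exec alternative mentioned above. It is not msmtpd code: `run_without_shell` is a hypothetical helper, and it assumes the delivery command has already been split into an argument vector. Because no shell is involved, an address passed as an argument is never interpreted, whatever bytes it contains.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of msmtpd: run argv[0] with the given
 * arguments and feed it `input` on stdin, without involving a shell.
 * Returns the child's exit status, or -1 on error.
 * A real implementation would also have to deal with SIGPIPE and EINTR. */
int run_without_shell(char *const argv[], const char *input, size_t len)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the read end of the pipe becomes stdin. */
        close(fds[1]);
        if (dup2(fds[0], STDIN_FILENO) < 0)
            _exit(127);
        close(fds[0]);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: write the mail, close the pipe to signal EOF, then wait. */
    close(fds[0]);
    while (len > 0) {
        ssize_t n = write(fds[1], input, len);
        if (n <= 0)
            break;
        input += n;
        len -= (size_t)n;
    }
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The messy part is everything this sketch leaves out: splitting the configured command into arguments, substituting addresses into it, and handling signals and partial writes.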
I personally do not use msmtpd so let it be. It was just handy to use for testing.
RFC 6531 also says in section 3.4 that the client MUST supply SMTPUTF8 in MAIL FROM if and only if it is necessary. I did a quick try with a couple of servers. chasquid (which I had not seen before) does not seem to care about that out of the box. However, Exim (with smtputf8_advertise_hosts = *) rejected a non-ASCII address sent without the parameter (though this could be overridden with an explicit allow_utf8_domains = true):
<-- 220 DESKTOP-K26J5U0. ESMTP Exim 4.97 Ubuntu Mon, 28 Oct 2024 10:35:49 -0500
--> EHLO localhost
<-- 250-DESKTOP-K26J5U0. Hello ip6-localhost [::1]
<-- 250-SIZE 52428800
<-- 250-8BITMIME
<-- 250-PIPELINING
<-- 250-PIPECONNECT
<-- 250-CHUNKING
<-- 250-STARTTLS
<-- 250-PRDR
<-- 250-SMTPUTF8
<-- 250 HELP
--> MAIL FROM:<mlt@почта.test>
--> RCPT TO:<postmaster@почта.test>
--> DATA
<-- 501 <mlt@почта.test>: domain missing or malformed
but it accepts the message if MAIL FROM includes SMTPUTF8:
<-- 220 DESKTOP-K26J5U0. ESMTP Exim 4.97 Ubuntu Mon, 28 Oct 2024 10:36:30 -0500
--> EHLO localhost
<-- 250-DESKTOP-K26J5U0. Hello ip6-localhost [::1]
<-- 250-SIZE 52428800
<-- 250-8BITMIME
<-- 250-PIPELINING
<-- 250-PIPECONNECT
<-- 250-CHUNKING
<-- 250-STARTTLS
<-- 250-PRDR
<-- 250-SMTPUTF8
<-- 250 HELP
--> MAIL FROM:<mlt@почта.test> SMTPUTF8
--> RCPT TO:<postmaster@почта.test>
--> DATA
<-- 250 OK
<-- 250 Accepted
<-- 354 Enter message, ending with "." on a line by itself
--> Date: Mon, 28 Oct 2024 10:36:27 -0500
--> Message-ID: <09aa28c82db6a0d12e63f5c541391b33@почта.test>
--> From: mlt@почта.test
--> To: postmaster@почта.test
--> Subject: Hello
-->
--> Have a nice day!
-->
--> .
<-- 250 OK id=1t5RnA-000000005rz-41rK
Also, the middle of section 3.2 says that the client should not attempt transmission if the server does not support the extension but the message needs it, and that the client may do something else depending on circumstances. I think that to fully comply, msmtp should ideally 1) recognize the server's SMTPUTF8 capability, 2) indicate the need in MAIL FROM, and 3) do its best if the server is not capable of SMTPUTF8, by using Punycode if the local part happens to be ASCII only.
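The three steps could be condensed into one decision, sketched below. All names (`utf8_plan`, `choose_utf8_plan`) are hypothetical and nothing like this exists in msmtp; the sketch also only looks at envelope addresses and ignores mail headers.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of this exists in msmtp. */
enum utf8_plan {
    PLAN_SEND_PLAIN,       /* everything is ASCII: no SMTPUTF8 parameter */
    PLAN_SEND_SMTPUTF8,    /* add SMTPUTF8 to MAIL FROM */
    PLAN_PUNYCODE_DOMAINS, /* no server support: convert domains to Punycode */
    PLAN_REFUSE            /* non-ASCII local part and no server support */
};

enum utf8_plan choose_utf8_plan(bool server_has_smtputf8,
                                bool local_part_non_ascii,
                                bool domain_non_ascii)
{
    if (!local_part_non_ascii && !domain_non_ascii)
        return PLAN_SEND_PLAIN;
    if (server_has_smtputf8)
        return PLAN_SEND_SMTPUTF8;
    if (!local_part_non_ascii)
        return PLAN_PUNYCODE_DOMAINS;
    return PLAN_REFUSE;
}
```

For the Exim session above, `choose_utf8_plan(true, false, true)` yields `PLAN_SEND_SMTPUTF8`; against a server without the extension, the same address would yield `PLAN_PUNYCODE_DOMAINS`.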
While 1 and 3 are easy, for 2 I feel it would be nice to have some "context" struct to pass around, to reduce the number of arguments in a few functions. It seems convenient to pre-scan the recipients' characters at the point where they are sanitized, so that by the time capabilities are advertised we don't have to go over the list of recipients again.
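A rough sketch of what such a context struct and pre-scan might look like. `struct utf8_ctx` and `utf8_ctx_scan` are made-up names, and the assumption is that the scan is called once per envelope address from wherever the addresses are already being checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context struct; the field names are illustrative only. */
struct utf8_ctx {
    bool local_part_non_ascii;
    bool domain_non_ascii;
};

/* True if any of the first len bytes has the high bit set (non-ASCII). */
static bool has_high_bit(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)s[i] >= 0x80)
            return true;
    return false;
}

/* Record whether one envelope address contains non-ASCII bytes.
 * Splits at the last '@'; an address without '@' counts as local part only.
 * Flags are sticky, so the struct accumulates over all addresses scanned. */
void utf8_ctx_scan(struct utf8_ctx *ctx, const char *addr)
{
    const char *at = strrchr(addr, '@');
    size_t local_len = at ? (size_t)(at - addr) : strlen(addr);

    if (has_high_bit(addr, local_len))
        ctx->local_part_non_ascii = true;
    if (at && has_high_bit(at + 1, strlen(at + 1)))
        ctx->domain_non_ascii = true;
}
```

This only inspects bytes; it does not validate that the address is well-formed UTF-8.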
This SMTPUTF8 thing is a mess.
Msmtp cannot and should not attempt to find out whether the mail requires SMTPUTF8, since that would require parsing all relevant mail headers. That is so error-prone that msmtp was specifically designed never to do something like that.
Instead, what we could do is detect the SMTPUTF8 server capability and if it is present, always send the SMTPUTF8 parameter. Strictly speaking that does not even violate Sec. 3.4 since msmtp is not aware whether SMTPUTF8 is needed or not.
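As a minimal sketch of that approach: check each EHLO reply line for the keyword and, if it was seen, append the parameter unconditionally. The function names are hypothetical and do not correspond to msmtp's actual code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: return true if one EHLO reply line
 * ("250-KEYWORD ..." or "250 KEYWORD ...") advertises SMTPUTF8.
 * The keyword is compared case-insensitively. */
bool ehlo_line_has_smtputf8(const char *line)
{
    static const char kw[] = "SMTPUTF8";
    size_t n = sizeof(kw) - 1;

    if (strncmp(line, "250", 3) != 0 || (line[3] != '-' && line[3] != ' '))
        return false;
    line += 4;
    if (strncasecmp(line, kw, n) != 0)
        return false;
    /* The keyword must end here, not merely be a prefix of another one. */
    return line[n] == '\0' || line[n] == ' ' || line[n] == '\r' || line[n] == '\n';
}

/* Hypothetical helper: build the MAIL FROM command, adding SMTPUTF8
 * whenever the server advertised it, without inspecting the mail.
 * Returns the command length, or -1 if it does not fit into buf. */
int format_mail_from(char *buf, size_t size, const char *from,
                     bool server_has_smtputf8)
{
    int n = snprintf(buf, size, "MAIL FROM:<%s>%s", from,
                     server_has_smtputf8 ? " SMTPUTF8" : "");
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

With the Exim capabilities shown earlier in the thread, this would produce the `MAIL FROM:<...> SMTPUTF8` form for every message, ASCII-only or not.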
I would not care about ancient servers not supporting SMTPUTF8 beyond not sending the SMTPUTF8 parameter. If such a server rejects a client message because there is some address and/or some header with UTF8 in it, then the user will be notified, and I think that's good enough.
That's the minimal future-proof approach, and since msmtp strives for minimality, I think it is sufficient.
As a side note, we do not check whether what we transmit is actually UTF-8 and not some other encoding.
We have to use Punycode at least for the domain part. I saw a bunch of RFCs talking about the local part and downgrading, as well as a proposed use of the SMTPUTF8 extension on both sides. What is your take on that?