haraka / Haraka

A fast, highly extensible, and event driven SMTP server
https://haraka.github.io
MIT License

Use domain name for outbound TLS connections #3275

Closed felixauringer closed 6 months ago

felixauringer commented 7 months ago

Is your feature request related to a problem? Please describe.

When Haraka tries to connect to another host (outbound), it uses the IP for SNI and to validate the certificate.

Describe the solution you'd like

Haraka sends the domain as SNI and also validates the certificate using the domain. I would argue that nowadays, certificates usually contain a domain instead of a static IP.

Describe alternatives you've considered

Having certificates with static IPs is not really an alternative for us. Of course, sending the domain instead of the IP might also be configurable in the config.

Additional context

Maybe, I am just using Haraka wrongly, so here is some information on my setup:

Haraka Haraka.js — Version: 3.0.1
Node v21.6.1
OS Linux example.fauringer.de 6.7.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 01 Feb 2024 10:30:35 +0000 x86_64 GNU/Linux
openssl OpenSSL 1.1.1w 11 Sep 2023

Here are also some logs (note the node deprecation warning):

[INFO] [-] [core] [outbound] Transaction delivery for domain: example.fauringer.de
[INFO] [-] [core] loading tls.ini
[DEBUG] [4DCF459F-A46F-4010-A8C2-37011FC70164.1.1] [outbound] running send_email hooks
[DEBUG] [4DCF459F-A46F-4010-A8C2-37011FC70164.1.1] [outbound] Sending mail: 1707513574559_1707513574559_0_28_4DtbPH_1_example.fauringer.de
[DEBUG] [4DCF459F-A46F-4010-A8C2-37011FC70164.1.1] [outbound] running get_mx hooks
[DEBUG] [4DCF459F-A46F-4010-A8C2-37011FC70164.1.1] [outbound] running get_mx hook in queue/lmtp plugin
[DEBUG] [4DCF459F-A46F-4010-A8C2-37011FC70164.1.1] [outbound]  hook=get_mx plugin=queue/lmtp function=hook_get_mx params=example.fauringer.de retval=CONT msg=""
[DEBUG] [4DCF459F-A46F-4010-A8C2-37011FC70164.1.1] [outbound] delivering from: example.fauringer.de to: 192.168.200.4:25 (0) (0)
[DEBUG] [75D90002-135C-4EBC-9092-9EB13C1842B4] [core] [outbound] created. host: 192.168.200.4 port: 25
[NOTICE] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] connect ip=192.168.200.4 port=39864 local_ip=192.168.200.4 local_port=25
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running connect_init hooks
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running connect_init hook in early_talker plugin
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core]  hook=connect_init plugin=early_talker function=early_talker params="" retval=CONT msg=""
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running connect_init_respond
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running lookup_rdns hooks
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running connect hooks
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running ehlo hooks
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running ehlo hook in helo.checks plugin
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core]  hook=ehlo plugin=helo.checks function=proto_mismatch_esmtp params=example.fauringer.de retval=CONT msg=""
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running ehlo hook in helo.checks plugin
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core]  hook=ehlo plugin=helo.checks function=init params=example.fauringer.de retval=CONT msg=""
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running ehlo hook in helo.checks plugin
[INFO] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [helo.checks] skip:proto_mismatch(private)
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core]  hook=ehlo plugin=helo.checks function=emit_log params=example.fauringer.de retval=CONT msg=""
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running capabilities hooks
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running capabilities hook in tls plugin
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core]  hook=capabilities plugin=tls function=advertise_starttls params="" retval=CONT msg=""
[DEBUG] [4DCF459F-A46F-4010-A8C2-37011FC70164.1.1] [outbound] Trying TLS for domain: example.fauringer.de, host: 192.168.200.4
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running unrecognized_command hooks
[DEBUG] [CDB9268D-5A4C-4033-84DB-31BDFA292614] [core] running unrecognized_command hook in tls plugin
[DEBUG] [-] [core] Upgrading to TLS
[DEBUG] [-] [core] client TLS upgrade in progress, awaiting secured.
(node:28) [DEP0123] DeprecationWarning: Setting the TLS ServerName to an IP address is not permitted by RFC 6066. This will be ignored in a future version.
[DEBUG] [-] [core] SNI servername: 192.168.200.4
[ERROR] [-] [core] client TLS error: Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: 192.168.200.4 is not in the cert's list:
[ERROR] [4DCF459F-A46F-4010-A8C2-37011FC70164.1.1] [outbound] Ongoing connection failed to 192.168.200.4:25 : Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: 192.168.200.4 is not in the cert's list:
[DEBUG] [-] [core] [outbound] release_client: 75D90002-135C-4EBC-9092-9EB13C1842B4 192.168.200.4:25 to undefined
[DEBUG] [-] [core] Temp fail for: Tried all MXs

msimerson commented 7 months ago

Maybe, I am just using Haraka wrongly, so here is some information on my setup:

This makes me think so:

delivering from: example.fauringer.de to: 192.168.200.4:25

Maybe, instead of telling Haraka that the next hop is 192.168.200.4:25, you should tell it the next hop is ${example}.fauringer.de:25. Assuming the DNS resolves (and it will, because it's so critical to SMTP, right?), Haraka will find the IP to connect AND it'll have that correct SNI hostname to present to the remote side.

felixauringer commented 7 months ago

Maybe, instead of telling Haraka that the next hop is 192.168.200.4:25, you should tell it the next hop is ${example}.fauringer.de:25. Assuming the DNS resolves (and it will, because it's so critical to SMTP, right?), Haraka will find the IP to connect AND it'll have that correct SNI hostname to present to the remote side.

But how do I do that? The RCPT TO address of that email was mailer@example.fauringer.de and I never provided the IP 192.168.200.4 anywhere, so Haraka must have resolved that internally.

msimerson commented 7 months ago

What is in the config for queue/lmtp?

[outbound] hook=get_mx plugin=queue/lmtp function=hook_get_mx params=example.fauringer.de retval=CONT msg=""
[outbound] delivering from: example.fauringer.de to: 192.168.200.4:25 (0) (0)

msimerson commented 7 months ago

This patch might fix it for you. It skips setting the tls.servername property if the mx.exchange is an IP address.

diff --git a/outbound/tls.js b/outbound/tls.js
index 93ad3715..0119996e 100644
--- a/outbound/tls.js
+++ b/outbound/tls.js
@@ -69,6 +69,7 @@ class OutboundTLS {
     }

     get_tls_options (mx) {
+        if (net.isIP(mx.exchange)) return this.cfg
         return Object.assign(this.cfg, {servername: mx.exchange});
     }
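
For readers who want to try the idea outside of Haraka, here is a minimal standalone sketch of the same guard. It is not the code in outbound/tls.js: the function name tlsOptionsFor and the shape of the config object are assumptions made for illustration. Unlike the patch, it copies the base config instead of calling Object.assign on it, so a servername set for one connection is not carried over to the next.

```javascript
'use strict';
const net = require('node:net');

// Hypothetical helper (not Haraka's API): build per-connection TLS options.
// baseCfg stands in for the shared outbound TLS config; mx.exchange is either
// a hostname or an IP address, as in the log output above.
function tlsOptionsFor(baseCfg, mx) {
  // Copy the shared config so it is never mutated.
  const options = { ...baseCfg };
  // net.isIP returns 0 for anything that is not an IPv4/IPv6 literal.
  // RFC 6066 does not permit IP literals as the SNI server name.
  if (net.isIP(mx.exchange) === 0) {
    options.servername = mx.exchange;
  }
  return options;
}

console.log(tlsOptionsFor({ rejectUnauthorized: false }, { exchange: '192.168.200.4' }));
console.log(tlsOptionsFor({ rejectUnauthorized: false }, { exchange: 'mail.example.com' }));
```

With an IP address as the exchange, the returned options carry no servername at all; with a hostname, that hostname is used for SNI.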

felixauringer commented 7 months ago

What is in the config for queue/lmtp?

My LMTP config is (dovecot is resolvable in the docker compose setup where all this is running):

[main]
host=dovecot
port=24

But the logs were recorded when I tried to send an email from Haraka (example.fauringer.de) to the same Haraka instance (again example.fauringer.de) and the error was that it could not establish a TLS connection with itself. So it surprises me that LMTP is involved here at all.

Thanks for the patch, I will try it later today!

felixauringer commented 7 months ago

This patch might fix it for you. It skips setting the tls.servername property if the mx.exchange is an IP address.

Sorry for the long wait, I only managed to try it out this weekend. With that fix, Haraka no longer uses the IP, but it also does not use the domain from the MAIL FROM command. Now it uses localhost, so it seems to do some reverse DNS lookup to get that name (EDIT: according to the Node docs, localhost is just the default if options.host and options.servername are not set).

EDIT: I also looked at this.cfg and mx and the hostname from the email-address is not in either of them.

felixauringer commented 7 months ago

I debugged a little bit more and I think I know what the problem is.

I do not have any get_mx plugins defined, so in outbound/hmail.js:get_mx_respond the default action mx_lookup.mx_lookup is executed. mx_lookup tries to find the mail server domain using DNS MX. If that fails, it tries to find the mail server IP using DNS A.
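
The fallback described above can be sketched roughly as follows. This is not Haraka's mx_lookup implementation; the function name lookupExchanges, the injectable resolver parameter, and the choice of which DNS error codes count as "no MX record" are all assumptions for illustration. It only uses resolveMx and resolve4 from Node's dns.promises API.

```javascript
'use strict';
const dns = require('node:dns').promises;

// Hypothetical sketch: look up MX records for a domain and, if there are
// none, fall back to the domain's A records. The resolver argument defaults
// to Node's promise-based DNS API and can be swapped for a fake in tests.
async function lookupExchanges(domain, resolver = dns) {
  try {
    const records = await resolver.resolveMx(domain);
    if (records.length > 0) {
      // Lowest priority value first; exchange is a hostname here.
      return records
        .sort((a, b) => a.priority - b.priority)
        .map((r) => ({ exchange: r.exchange, priority: r.priority }));
    }
  } catch (err) {
    // Assumption: treat these two codes as "no MX record"; rethrow the rest.
    if (err.code !== 'ENODATA' && err.code !== 'ENOTFOUND') throw err;
  }
  // Fallback: the exchange is now an IP address, not a hostname, which is
  // how an IP can end up as the TLS servername later on.
  const addresses = await resolver.resolve4(domain);
  return addresses.map((ip) => ({ exchange: ip, priority: 0 }));
}

module.exports = { lookupExchanges };
```

With a working MX record the result carries hostnames; without one, the only thing left to hand to the TLS layer is an IP address.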

I have all this running in Docker and do not have the MX record configured properly (my bad), so Haraka falls back to resolving the mail server address to an IP. So the underlying problem is my DNS misconfiguration, but I still find Haraka's fallback mechanism a little weird. Wouldn't it make more sense to just return the domain (instead of resolving it to an IP address) when there is no MX record?

EDIT: Never mind, the next step in try_deliver is again DNS resolution, so my small plugin to return a domain name from get_mx does not work.

msimerson commented 7 months ago

Wouldn't it make more sense to just return the domain (instead of resolving it to an IP address) when there is no MX record?

No, because a domain is not what an MTA connects to. Most domains have an MX record with the hostname of a mail server, and that server probably does not match the TLS servername property.

felixauringer commented 7 months ago

I'm still a little bit confused what you would expect as domain in the mail server's TLS certificate. Maybe an example helps:

A company has the domain example.com with an MX record pointing to mail.example.com and an A record pointing to 192.168.10.1. The A record of mail.example.com points to 192.168.10.2.

I would now expect that the TLS certificate of the Haraka server at 192.168.10.2 contains mail.example.com as servername and that clients connecting to it are using this domain to do so.

If I understand the source code and my debugging instance of Haraka correctly, Haraka would use 192.168.10.2 as the servername in SNI and as the servername to validate the certificate against, is that correct?

msimerson commented 6 months ago

I'm still a little bit confused what you would expect as domain in the mail server's TLS certificate.

I wouldn't expect a domain in a mail server's TLS certificate, I'd expect a hostname. In your example, the domain in question is example.com and the host name is mail.example.com. The MTA sending the message will be sending mail to user@example.com via the MX pointer to the mail.example.com host name. (See RFC 822 for further semantics of domain and host names.)

I would now expect that the TLS certificate of the Haraka server at 192.168.10.2 contains mail.example.com as servername and that clients connecting to it are using this domain to do so.

This is a reasonable expectation. It's also reasonable that the MTA at 192.168.10.2 doesn't support SNI and will respond with the host's only TLS identity, which might be mail.example.com or could instead be mail.isp.com or some such. Quite often a single MTA hosts mail for tens or thousands of domains, which is why we don't require TLS identity matching by default.

If I understand the source code and my debugging instance of Haraka correctly, Haraka would use 192.168.10.2 as the servername in SNI and as the servername to validate the certificate against, is that correct?

Yes, which is what the patch I proposed testing above avoids. It doesn't set the servername TLS option when the remote is an IP address.
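
To illustrate why validating against the IP fails, here is a small sketch using Node's tls.checkServerIdentity, which is the check that produced the ERR_TLS_CERT_ALTNAME_INVALID error in the logs. The certificate object is hand-written and hypothetical; it only mimics the fields of a real peer certificate that the check reads (subject and subjectaltname), and the helper name identityError is made up for this example.

```javascript
'use strict';
const tls = require('node:tls');

// Hypothetical certificate that names only a hostname, as in the
// mail.example.com example above. A real one comes from the TLS handshake.
const cert = {
  subject: { CN: 'mail.example.com' },
  subjectaltname: 'DNS:mail.example.com',
};

// Returns the error code if the name does not match the certificate,
// or null if it does. checkServerIdentity returns undefined on success.
function identityError(servername) {
  const err = tls.checkServerIdentity(servername, cert);
  return err ? err.code : null;
}

const byIp = identityError('192.168.10.2');
const byName = identityError('mail.example.com');
console.log({ byIp, byName });
```

An IP address is only compared against IP entries in the certificate's alternative names, so a certificate that lists hostnames alone cannot match it, while the hostname itself passes.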

felixauringer commented 6 months ago

I now have a local environment with properly configured MX records and everything works as expected.

Yes, which is what the patch I proposed testing above avoids. It doesn't set the servername TLS option when the remote is an IP address.

I think the main reason for my confusion here was that IPs are explicitly used as a fallback. In my understanding, opening a TCP or TLS socket can always be done (and, according to Node, also should be done) using a hostname, so that the underlying libraries handle DNS resolution and, more importantly, TLS hostnames / SNI correctly.

Nevertheless, it works now. Thank you very much for your patience with me and all the answers :)