mhale / smtpd

An SMTP server package written in Go, in the style of the built-in HTTP server.
The Unlicense
397 stars 92 forks source link

`net.LookupAddr` can take so long that clients run into a timeout #40

Closed jannschu closed 2 months ago

jannschu commented 7 months ago

The net.LookupAddr function is used at a few places in the code, for example right after accepting a connection. If that function does not terminate quickly big response times could arise that risk timeouts or major slowdowns.

I suggest to remove the lookup because it seems this is only used for diagnostics and the reverse lookup is flaky anyway. Please be aware that even if instead a very short timeout is added this can easily pile up to a major slowdown if called hundreds of times, say in a CI test. Even when the lookup is working as intended this can be a relatively slow process that may involve several DNS lookups.

Background

I am using this trough axllent/mailpit for local development and CI. In my Gitlab CI setup several services are running that try to connect via SMTP. For some reason in my particular environment net.LookupAddr takes 10s (returns an error). Unfortunately that does not only slow down things a lot but also breaks several timeouts.

Arguably there is something off with the reverse lookup in my environment but because the lookup is such an unreliable thing to do I would use that opportunity to get rid of it 😢 In case you are interested, I think it is a bug somewhere in Docker, musl or Go (what a line up…), but it is hard to get a small reproducible setup of this, currently I can reproduce that only through jobs in a Gitlab runner…

It was a quite a journey to spot this issue, but I was able to reproduce the 10s spent between the initial TCP SYN and the first message (220 host Mailpit ESMTP Service ready) by isolating the net.LookupAddr call that was made, and all of that in CI jobs, what a day 😄. Hope that saves someone else's day.

jannschu commented 7 months ago

For the case someone cares, I continued looking into it, the cause is this issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27763

mhale commented 7 months ago

I'd prefer to keep the remote lookup as it's useful in the headers, but perhaps it could be disabled with an environment variable. Do you have the ability to set one?

jannschu commented 7 months ago

Thanks for your reply, yes an environment variable would work for me. But I think this is hard to discover for users, especially if used as an indirect dependency.

Some more, also not ideal, ideas:

The async lookup could be ineffective because a connection might be shorter than what it takes to perform the lookup.

Out of curiosity I checked what postfix does and it seems it is using a 5s timeout by default. It also offers an option to turn this lookup off etc. but I think my particular issue was a sum of several independent problems:

A shorter timeout would have worked for me actually, so maybe the right way to proceed is to set add a 5s timeout and the other issues should be solved by the corresponding parties. I'll see that I open an issue for Docker.

Workaround

I ended up disabling DNS in my case by emptying /etc/resolv.conf when the container starts. Not very clean, but was acceptable for this use case.

axllent commented 7 months ago

I don't think async would work as the Received: from ... email header need to be injected immediately for processing.

I agree that reverse lookups shouldn't be disabled by default as this mimics the behaviour of other SMTPD servers, but having an option to disable reverse lookups sounds like a good work-around as I could add an optional command flag to disable it through Mailpit itself for such cases :+1:

atesca09 commented 6 months ago

An option to disable reverse lookups would be very much appreciated as it would solve a similar problem for me too

mhale commented 5 months ago

@atesca09 This option is now added, thanks to @axllent for the PR.

axllent commented 5 months ago

This is great @mhale , thank you! I have tested and can confirm this option resolves the issue completely. Using the docker composer example @atesca09 provided with a local Mailpit test image with I created (using your master branch and the DisableReverseDNS: true temporarily hardcoded into Mailpit) the test ran in 0.8s instead of 11s - keeping in mind that this includes the docker container startup time too.

Docker composer command:

time docker compose run --rm  curl smtp://mailpit:1025 --mail-from sender@example.com --mail-rcpt recipient@example.com --upload-file /email.txt
[+] Creating 1/0
 ✔ Container really-long-project-name-for-testing-dns-rdata-mailpit-1  Running                                                                                                 0.0s 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   145    0     0  100   145      0  39628 --:--:-- --:--:-- --:--:-- 48333

real    0m0.818s
user    0m0.058s
sys     0m0.043s

Email output:

Message-Id: <0fbf52a8-cf27-4b04-83e7-2292e5ed1e84@mailpit>
Return-Path: <sender@example.com>
Received: from email.txt (unknown [172.22.0.3])
        by be03dac07d53 (Mailpit) with SMTP
        for <recipient@example.com>; Sat, 20 Jan 2024 18:55:38 +0000 (UTC)
From: sender@example.com
To: recipient@example.com
Subject: Email Subject

This is the body of the email.
It can contain multiple lines of text.

Do you have an ETA for a new tag? I'd like to ship this change with a new Mailpit release and I'd really rather stick to tags instead of the master branch.

Thanks again!

mhale commented 5 months ago

I've just pushed a v0.8.2 tag.

jannschu commented 5 months ago

Thanks to everyone involved! :)

atesca09 commented 5 months ago

That solves my issue as well. Thank you all