mjl- / mox

modern full-featured open source secure mail server for low-maintenance self-hosted email
https://www.xmox.nl
MIT License
3.7k stars 111 forks source link

EOF (io error) on normal connection closure #224

Closed wneessen closed 1 month ago

wneessen commented 1 month ago

I noticed a lot of these log entries in my mox logs: mox[3323]: l=info m="connection closed" err="read: EOF (io error)" pkg=smtpserver cid=19246e51e94 delta="241.016µs"

These seems to be caused by my monitoring system, which tries to check if the smtp server is running by connecting, issueing a "EHLO" and a "QUIT. Turns out, I can reproduce this via openssl s_client:

$ openssl s_client -connect my-server.de:587 -starttls smtp

Connecting to 123.123.123.124
CONNECTED(00000003)
depth=2 C=US, O=Internet Security Research Group, CN=ISRG Root X1
verify return:1        
depth=1 C=US, O=Let's Encrypt, CN=E5
verify return:1
depth=0 CN=my-sever.de
verify return:1
---                                                                                 
Certificate chain
[...]
read R BLOCK
EHLO test.de
250-my-server.de
250-PIPELINING
250-SIZE 104857600
250-REQUIRETLS
250-AUTH SCRAM-SHA-256-PLUS SCRAM-SHA-256 SCRAM-SHA-1-PLUS SCRAM-SHA-1 CRAM-MD5 PLAIN LOGIN
250-FUTURERELEASE 5184000 2024-12-02T08:56:10Z
250-ENHANCEDSTATUSCODES
250-8BITMIME
250-LIMITS RCPTMAX=1000
250 SMTPUTF8
QUIT 
DONE

Produces these logs:

Oct 03 10:56:07 mail1 mox[3323]: l=info m="new connection" pkg=smtpserver remote=37.81.72.14:18539 local=10.0.6.1:587 submission=true tls=false listener=public cid=19246e51e99 delta="132.272µs"
Oct 03 10:56:07 mail1 mox[3323]: l=debug m="smtp command result" pkg=smtpserver kind=submission cmd=ehlo code=250 ecode= duration="50.894µs" cid=19246e51e99 delta=24.073353ms
Oct 03 10:56:07 mail1 mox[3323]: l=debug m="smtp command result" pkg=smtpserver kind=submission cmd=starttls code=220 ecode=2.0.0 duration="53.072µs" cid=19246e51e99 delta=22.500801ms
Oct 03 10:56:07 mail1 mox[3323]: l=debug m="starting tls server handshake" pkg=smtpserver cid=19246e51e99 delta="114.181µs"
Oct 03 10:56:07 mail1 mox[3323]: l=debug m="tls server handshake done" pkg=smtpserver tls=TLS1.3 ciphersuite=TLS_AES_128_GCM_SHA256 cid=19246e51e99 delta=59.897387ms
Oct 03 10:56:10 mail1 mox[3323]: l=debug m="smtp command result" pkg=smtpserver kind=submission cmd=ehlo code=250 ecode= duration="59.576µs" cid=19246e51e99 delta=2.975022223s
Oct 03 10:56:12 mail1 mox[3323]: l=info m="connection closed" err="read: EOF (io error)" pkg=smtpserver cid=19246e51e99 delta=1.877245332s

I wonder why this causes an EOF. It seems to be related to TLS, since I can only reproduce it via STARTTLS or explicit TLS (465)? It also looks like the "QUIT" is not logged. This this does not happen on a non-TLS connection, though.

Is this an issue with mox or with openssl?

mjl- commented 1 month ago

Interesting, this had me scratching my head for a while, staring at the code. I could reproduce this with the openssl command. But it turns out that the QUIT we're writing doesn't actually make it to mox: openssl s_client interprets some input as commands for itself. Any line starting with Q closes the connection, client-side! Also see https://docs.openssl.org/3.4/man1/openssl-s_client/#connected-commands-basic. There are a few more.

If you add -nocommands on the command-line, you can write the QUIT-line properly, and you'll get a regular "connection closed" message.

Which monitoring system are you using? Perhaps it isn't writing a QUIT command after all? I'm using a blackboxexporter to monitor SMTP, it does write a QUIT, and causes a clean "connection closed" logging line.

Still, I would consider the "read: EOF (io error)" message as just a part of normal operation. In the past, that error message included something about "fatal", which scared some users. I think I may need to clean up the log levels at some point. But so far I've liked seeing connection start & end in info logging.

For reference, my blackboxexporter smtp "module" definitions are (likely copied from an example config):

  smtp:
    prober: tcp
    tcp:
      query_response:
        - expect: "^220 ([^ ]+)(.+)$"
        - send: "EHLO prober"
        - expect: "^250-"
        - send: "QUIT"
  smtp_starttls:
    prober: tcp
    tcp:
      query_response:
        - expect: "^220 ([^ ]+)(.+)$"
        - send: "EHLO prober"
        - expect: "^250-STARTTLS"
        - send: "STARTTLS"
        - expect: "^220"
        - starttls: true
        - send: "EHLO prober"
        - expect: "^250-AUTH"
        - send: "QUIT"
wneessen commented 1 month ago

I wasn't aware of the -nocommand argument, good to know and a stupid default in my opinion. I also figured that the monitoring system just checks TCP 25 and does not issue any SMTP commands at all, which explains the connection EOFs, so that's an issue on my end. I'll close the issue since all questions are answered. Thanks agian for the quick support!