`UnparseableReply`: Bug in `impl<T: AsyncRead + AsyncWrite + Unpin> SmtpClient<T>`. (Fixed by #13)

LeoniePhiline commented 1 year ago

Hi!

I am experiencing some weird SMTP failures.

I have a mailhog (https://github.com/mailhog/MailHog / https://registry.hub.docker.com/r/mailhog/mailhog/) set up behind a traefik (https://registry.hub.docker.com/_/traefik) reverse proxy. The entire thing is happening within a docker environment on my local machine - no remote network failures.

The setup is simple: MailHog just exposes port 1025 for plain SMTP; Traefik does the TLS and exposes port 587. (TLS is self-signed; using allow_invalid_certs().)

The Traefik config for the TLS & TCP setup is basically the same as the one shown here: https://docker-mailserver.github.io/docker-mailserver/edge/examples/tutorials/mailserver-behind-proxy/#configuration-of-the-used-proxy-software

Most of the time, I can send mail just fine. However, every now and then I receive an UnparseableReply error.

I can reproduce it by just running the test again and again. It does not matter if I run it in quick succession or wait for a second or more before retrying.

In these cases, mailhog logs:

Starting session
[PROTO: INVALID] Started session, switching to ESTABLISH state
Sent 45 bytes: '220 mail.some-host.berlin ESMTP MailHog\r\n'
Received 15 bytes: 'EHLO felicity\r\n'
[PROTO: ESTABLISH] Processing line: EHLO felicity
[PROTO: ESTABLISH] In state 1, got command 'EHLO', args 'felicity'
[PROTO: ESTABLISH] In ESTABLISH state
[PROTO: ESTABLISH] Got EHLO command, switching to MAIL state
Sent 20 bytes: '250-Hello felicity\r\n'
Sent 16 bytes: '250-PIPELINING\r\n'
Sent 16 bytes: '250 AUTH PLAIN\r\n'
Connection closed by remote host
Session ended

Note: Connection closed by remote host (by mail-send.)

The SmtpClientBuilder's timeout is set to 30 seconds. The test sending, however, fails immediately.

Note that this test is done without Credentials.

Where Connection closed by remote host occurs, commonly MailHog should receive the first header: Received 30 bytes: 'MAIL FROM:<from@example.com>\r\n'. See below for a full (elided message body) example of a successful send.

And my test shows:

running 2 tests
test tests::renders_full_newsletter ... ok
test mail::tests::mailer_sends_smtp_tls_mail ... FAILED

failures:

---- mail::tests::mailer_sends_smtp_tls_mail stdout ----
Error: SendMail(UnparseableReply)

failures:
    mail::tests::mailer_sends_smtp_tls_mail

test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

error: test failed, to rerun pass `--lib`

SendMail is my own error enum variant. UnparseableReply is mail_send::Error.

Traefik writes nothing special to the log (log level Debug):

Handling connection from 172.254.1.1:48226

In contrast, here's the MailHog logs from a good mail send:

Starting session
[PROTO: INVALID] Started session, switching to ESTABLISH state
Sent 45 bytes: '220 mail.some-host.berlin ESMTP MailHog\r\n'
Received 15 bytes: 'EHLO felicity\r\n'
[PROTO: ESTABLISH] Processing line: EHLO felicity
[PROTO: ESTABLISH] In state 1, got command 'EHLO', args 'felicity'
[PROTO: ESTABLISH] In ESTABLISH state
[PROTO: ESTABLISH] Got EHLO command, switching to MAIL state
Sent 20 bytes: '250-Hello felicity\r\n'
Sent 16 bytes: '250-PIPELINING\r\n'
Sent 16 bytes: '250 AUTH PLAIN\r\n'
Received 30 bytes: 'MAIL FROM:<from@example.com>\r\n'
[PROTO: MAIL] Processing line: MAIL FROM:<from@example.com>
[PROTO: MAIL] In state 6, got command 'MAIL', args 'FROM:<from@example.com>'
[PROTO: MAIL] In MAIL state
[PROTO: MAIL] Got MAIL command, switching to RCPT state
Sent 32 bytes: '250 Sender from@example.com ok\r\n'
Received 26 bytes: 'RCPT TO:<to@example.com>\r\n'
[PROTO: RCPT] Processing line: RCPT TO:<to@example.com>
[PROTO: RCPT] In state 7, got command 'RCPT', args 'TO:<to@example.com>'
[PROTO: RCPT] In RCPT state
[PROTO: RCPT] Got RCPT command
Sent 33 bytes: '250 Recipient to@example.com ok\r\n'
Received 6 bytes: 'DATA\r\n'
[PROTO: RCPT] Processing line: DATA
[PROTO: RCPT] In state 7, got command 'DATA', args ''
[PROTO: RCPT] In RCPT state
[PROTO: RCPT] Got DATA command, switching to DATA state
Sent 37 bytes: '354 End data with <CR><LF>.<CR><LF>\r\n'
Received 1024 bytes: 'From: <from@example.com>\r\nTo: <to@example.com>\r\nSubject: Test mail\r\nReply-To: <reply-to@example.com>\r\nMessage-ID: <174421ede9d619d9.fbb0693af3efc5c7.146ed3f42bb856b@felicity>\r\nDate: Wed, 15 Feb 2023 23:04:27 +0000\r\nContent-Type: text/html; charset="utf-8"\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n<!doctype html><html xmlns=3D"http://www.w3.org/1999/xhtml" xmlns:v=3D"urn:s=\r\nchemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-microsoft-com:office:office=\r\n"> [... AND SO ON, receiving the entire mail in chunks of up to 1024 B]'
[PROTO: DATA] Got EOF, storing message and switching to MAIL state
Parsing Content from string: 'From: <from@example.com>
To: <to@example.com>
Subject: Test mail
Reply-To: <reply-to@example.com>
Message-ID: <174421ede9d619d9.fbb0693af3efc5c7.146ed3f42bb856b@felicity>
Date: Wed, 15 Feb 2023 23:04:27 +0000
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

<!doctype html><html xmlns=3D"http://www.w3.org/1999/xhtml" xmlns:v=3D"urn:s=
chemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-microsoft-com:office:office=
">[AND SO ON... rest of the mail - not relevant to the bug report]'
Storing message ZjsPQpZCLXvLR44_7SWRBJpCTy9AfPV4JL_CvweyeoU=@mail.some-host.berlin
Sent 90 bytes: '250 Ok: queued as ZjsPQpZCLXvLR44_7SWRBJpCTy9AfPV4JL_CvweyeoU=@mail.some-host.berlin\r\n'
Got message in APIv2 websocket channel
[APIv2] BROADCAST /api/v2/websocket
Got message in APIv1 event stream
Connection closed by remote host
Session ended

PS: "Jim, the Chaos Monkey" (a MailHog feature intended to test application resilience in the face of failure) is disabled.

LeoniePhiline commented 1 year ago

Update: As was indicated by the MailHog logs, the error occurs in connect(). I have now mapped the result error to two variants:

smtp_client
    .connect()
    .await
    .map_err(MailError::ConnectSmtp)?
    .send(message)
    .await
    .map_err(MailError::SendMail)

The test thus fails now with:

Error: ConnectSmtp(UnparseableReply)
test mail::tests::mailer_sends_smtp_tls_mail ... FAILED

Edit:

Digging further, in read_ehlo(), EhloResponse::parse() returns an err SyntaxError { syntax: "unexpected token" }

This is weird:

Here's the error's origin (the same on each test failure) in smtp-proto:

XXX 4 location (see output below):

What I get is:

ch: 0
XXX 4
SyntaxError { syntax: "unexpected token" }
Unparseable SMTP reply

Why is ch being 0 not covered by the ch @ b'0'..=b'9' match arm in `EhloResponse::parse()?

Is one binary and the other ascii?

ch: 0
ch does not match b'0'.
XXX 4
SyntaxError { syntax: "unexpected token" }
Unparseable SMTP reply
Error: ConnectSmtp(UnparseableReply)

Let's print bits:

Good run:

running 1 test
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
test mail::tests::mailer_sends_smtp_tls_mail ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.07s

Failed run:

running 1 test
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
non-matched ch: 00000000
ch does not match b'0'.
XXX 4
SyntaxError { syntax: "unexpected token" }
Unparseable SMTP reply
Error: ConnectSmtp(UnparseableReply)
thread 'mail::tests::mailer_sends_smtp_tls_mail' panicked at 'assertion failed: `(left == right)`

Another failed run - different output:

running 1 test
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
non-matched ch: 00000000
ch does not match b'0'.
XXX 4
SyntaxError { syntax: "unexpected token" }
Unparseable SMTP reply
Error: ConnectSmtp(UnparseableReply)
thread 'mail::tests::mailer_sends_smtp_tls_mail' panicked at 'assertion failed: `(left == right)`

So at minimum, 00110000 (ascii 0, decimal 48) is expected, but 00000000 (binary 0) is received.

Is this a MailHog bug? Or is the iter fed into EhloResponse::parse() not received correctly in read_ehlo()?

Checking iter construction in more detail:

(https://github.com/stalwartlabs/mail-send/blob/main/src/smtp/ehlo.rs#L56-L60)

Observation:

Every successful run reads from the initial buf with buf_concat being empty. It looks like this (identical on each successful run):

running 1 test
buf_concat is empty. Reading start of buf.
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
test mail::tests::mailer_sends_smtp_tls_mail ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.06s

So none of the successful runs extend buf_concat from slice. On the other hand, every single failed run extends buf_concat from slice:

Example 1: Long output

running 1 test
buf_concat is empty. Reading start of buf.
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
buf_concat is not empty, Extending from buf.
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
non-matched ch: 00000000
ch does not match b'0'.
XXX 4
SyntaxError { syntax: "unexpected token" }
Unparseable SMTP reply
Error: ConnectSmtp(UnparseableReply)
thread 'mail::tests::mailer_sends_smtp_tls_mail' panicked at 'assertion failed: `(left == right)`

Example 2: Short output

running 1 test
buf_concat is empty. Reading start of buf.
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
buf_concat is not empty, Extending from buf.
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
non-matched ch: 00000000
ch does not match b'0'.
XXX 4
SyntaxError { syntax: "unexpected token" }
Unparseable SMTP reply
Error: ConnectSmtp(UnparseableReply)
thread 'mail::tests::mailer_sends_smtp_tls_mail' panicked at 'assertion failed: `(left == right)`

Is this some kind of an off by one error, reading into the initial 0u8 which buf is initialized with? I am thinking of this line:

buf_concat.extend_from_slice(&buf[..br]);

(https://github.com/stalwartlabs/mail-send/blob/main/src/smtp/ehlo.rs#L59)

Alright, this is interesting:

Running code:

            let mut iter = if buf_concat.is_empty() {
                eprintln!("buf_concat is empty. Reading start of buf.");
                buf[..br].iter()
            } else if br + buf_concat.len() < MAX_REPONSE_LENGTH {
                eprintln!("buf_concat is not empty, Extending from buf.");
                eprintln!("extending from slice: {:?}", &buf[..br]);
                buf_concat.extend_from_slice(&buf[..br]);
                eprintln!("extended: {:?}", buf_concat);
                buf_concat.iter()

Shows buf_concat's many b0, as it is extended at the end:


running 1 test
buf_concat is empty. Reading start of buf.
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
buf_concat is not empty, Extending from buf.
extending from slice: [50, 53, 48, 45, 80, 73, 80, 69, 76, 73, 78, 73, 78, 71, 13, 10, 50, 53, 48, 32, 65, 85, 84, 72, 32, 80, 76, 65, 73, 78, 13, 10]
extended: [50, 53, 48, 45, 72, 101, 108, 108, 111, 32, 56, 56, 55, 100, 48, 52, 53, 57, 100, 51, 98, 97, 13, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 53, 48, 45, 80, 73, 80, 69, 76, 73, 78, 73, 78, 71, 13, 10, 50, 53, 48, 32, 65, 85, 84, 72, 32, 80, 76, 65, 73, 78, 13, 10]
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
non-matched ch: 00000000
ch does not match b'0'.
XXX 4
SyntaxError { syntax: "unexpected token" }
Unparseable SMTP reply
Error: ConnectSmtp(UnparseableReply)
thread 'mail::tests::mailer_sends_smtp_tls_mail' panicked at 'assertion failed: `(left == right)`

I guess the bug is the initialization let mut buf = vec![0u8; 1024]; instad of Vec::with_capacity(1024)? No, not exactly. But close.

This is the problem - here the full buf (all almost 1024 b0s) are dumped into buf_concat:

                        if buf_concat.is_empty() {
                            buf_concat = buf.to_vec();
                        }

(https://github.com/stalwartlabs/mail-send/blob/main/src/smtp/ehlo.rs#L69-L71)

Confirmed:

Extending from slice no longer dumps the entire buf into the buf_concat vector:

running 1 test
Bytes read: 24
buf_concat is empty. Reading start of buf.
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
Dump into vec 1
Bytes read: 32
buf_concat is not empty, Extending from buf.
extending from slice: [50, 53, 48, 45, 80, 73, 80, 69, 76, 73, 78, 73, 78, 71, 13, 10, 50, 53, 48, 32, 65, 85, 84, 72, 32, 80, 76, 65, 73, 78, 13, 10]
extended: [50, 53, 48, 45, 72, 101, 108, 108, 111, 32, 56, 56, 55, 100, 48, 52, 53, 57, 100, 51, 98, 97, 13, 10, 50, 53, 48, 45, 80, 73, 80, 69, 76, 73, 78, 73, 78, 71, 13, 10, 50, 53, 48, 32, 65, 85, 84, 72, 32, 80, 76, 65, 73, 78, 13, 10]
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
matched ch: 00110010
matched ch: 00110101
matched ch: 00110000
test mail::tests::mailer_sends_smtp_tls_mail ... ok

mdecimus commented 1 year ago

Just out of curiosity, could you share the EHLO response that triggered this bug?

LeoniePhiline commented 1 year ago

As far as I can see, the responses are normal; maybe it's about timing? When does let br = self.stream.read(&mut buf).await?; decide to stop reading and yield with some br?

You can see the bytes received as binary above. The bytes themselves don't appear to be special - only the fact that they are being read in two steps caused the bug to surface.

mdecimus commented 1 year ago

When does let br = self.stream.read(&mut buf).await?; decide to stop reading and yield with some br?

The OS decides when to stop writing the incoming bytes into the buffer. This usually occurs when the entire packet is received or when the buffer gets full. What probably happened here is that for some reason the EHLO response was sent in two separate frames and triggered this bug.

LeoniePhiline commented 1 year ago

Yes, this perfectly matches my observations.

stalwartlabs / mail-send

`UnparseableReply`: Bug in `impl<T: AsyncRead + AsyncWrite + Unpin> SmtpClient<T>`. (Fixed by #13) #12