trusteddomainproject / OpenDKIM

dns timeout didn't result in tempfail #176

Open elmaimbo opened 1 year ago

elmaimbo commented 1 year ago

This is a copy of https://bugs.launchpad.net/ubuntu/+source/opendkim/+bug/1914889, which David Bürgin suggested raising an upstream bug for.

My mail logs show that OpenDKIM experienced a DNS timeout when validating the DKIM signature on an inbound email, but then accepted the message. As a result the DMARC policy was applied and the message was quarantined.

The expected behaviour on a DNS timeout is that OpenDKIM applies the On-DNSError setting, which defaults to tempfail, so that the sending MTA retries at a later time. By the time of the retry, I would expect the result of the earlier DNS query to be immediately available from the DNS server's cache. The signature would then verify successfully, the message would pass the DMARC check, and the email would be accepted.

Here is what my mail logs showed:

Jan 28 11:27:33 mx postfix/postscreen[19584]: CONNECT from [168.100.1.4]:14353 to [192.168.20.197]:25
Jan 28 11:27:33 mx postfix/postscreen[19584]: PASS OLD [168.100.1.4]:14353
Jan 28 11:27:33 mx postfix/smtpd[19585]: connect from russian-caravan.cloud9.net[168.100.1.4]
Jan 28 11:27:34 mx postfix/smtpd[19585]: Anonymous TLS connection established from russian-caravan.cloud9.net[168.100.1.4]: TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)
Jan 28 11:27:35 mx policyd-spf[19610]: prepend Received-SPF: None (mailfrom) identity=mailfrom; client-ip=168.100.1.4; helo=russian-caravan.cloud9.net; <email address hidden>; receiver=<UNKNOWN>
Jan 28 11:27:35 mx policyd-spf[19615]: prepend Received-SPF: None (mailfrom) identity=mailfrom; client-ip=168.100.1.4; helo=russian-caravan.cloud9.net; <email address hidden>; receiver=<UNKNOWN>
Jan 28 11:27:35 mx postfix/smtpd[19585]: 5481E5E10A6: client=russian-caravan.cloud9.net[168.100.1.4]
Jan 28 11:27:35 mx postfix/cleanup[19627]: 5481E5E10A6: message-id=<email address hidden>
Jan 28 11:27:40 mx opendkim[1731]: 5481E5E10A6: key retrieval failed (s=202101-e055eb0c, d=patpro.net): '202101-e055eb0c._domainkey.patpro.net' query timed out
Jan 28 11:27:40 mx opendmarc[1534]: implicit authentication service: mx.tait.net.nz
Jan 28 11:27:40 mx opendmarc[1534]: 5481E5E10A6: SPF(mailfrom): <email address hidden> none
Jan 28 11:27:44 mx opendmarc[1534]: 5481E5E10A6: patpro.net fail
Jan 28 11:27:44 mx postfix/cleanup[19627]: 5481E5E10A6: milter-hold: END-OF-MESSAGE from russian-caravan.cloud9.net[168.100.1.4]: milter triggers HOLD action; from=<email address hidden> to=<email address hidden> proto=ESMTP helo=<russian-caravan.cloud9.net>
Jan 28 11:27:44 mx postfix/smtpd[19585]: disconnect from russian-caravan.cloud9.net[168.100.1.4] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7

Here is the content of opendkim.conf (with comments and blank lines removed):

Syslog yes
UMask 007
Domain tait.net.nz
KeyFile /etc/dkimkeys/20201017.private
Selector 20201017
InternalHosts 127.0.0.0/8,::1,192.168.20.192/28
Canonicalization relaxed/simple
AlwaysAddARHeader yes
Socket local:/var/run/opendkim/opendkim.sock
PidFile /var/run/opendkim/opendkim.pid
OversignHeaders From
TrustAnchorFile /usr/share/dns/root.key
UserID opendkim

You can see there is no DNSTimeout setting, meaning it uses the default value of 5 seconds. Also there are no On-... options, so these are also using default settings. My expectation is that a DNS timeout would use On-DNSError default value, which is documented as tempfail. However it would appear that it is using some other setting which defaults to accept?
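To see which of these settings a configuration file sets explicitly and which fall back to a default, a small shell helper along the following lines could be used. This is only a sketch: `conf_value` is a hypothetical helper and not part of OpenDKIM, it matches setting names exactly as written, and the fallback values passed to it (5 seconds, tempfail) are the defaults described above rather than values read from OpenDKIM itself.

```shell
# conf_value FILE NAME DEFAULT
# Hypothetical helper: print the value of NAME from an opendkim.conf-style
# file, or DEFAULT when the setting is not present in the file.
conf_value() {
    value=$(awk -v name="$2" '
        { sub(/#.*/, "") }          # drop comments before matching
        $1 == name { v = $2 }       # if repeated, keep the last one (assumption)
        END { if (v != "") print v }
    ' "$1")
    printf '%s\n' "${value:-$3}"
}

# Example (the path is an assumption; adjust for your system):
#   conf_value /etc/opendkim.conf DNSTimeout 5
#   conf_value /etc/opendkim.conf On-DNSError tempfail
```

With the configuration shown above, both example calls would print the fallback value, because neither setting appears in the file.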

Version info:

$ lsb_release -rd
Description: Ubuntu 18.04.5 LTS
Release: 18.04

$ apt-cache policy opendkim
opendkim:
  Installed: 2.11.0~alpha-11build1
  Candidate: 2.11.0~alpha-11build1
  Version table:
 *** 2.11.0~alpha-11build1 500
        500 http://nz.archive.ubuntu.com/ubuntu bionic/universe i386 Packages
        100 /var/lib/dpkg/status

At the time the original bug was raised (https://bugs.launchpad.net/ubuntu/+source/opendkim/+bug/1914889) I considered this to be a one-off. However on checking my historical mail logs today I discovered that it had occurred at least one other time (on 23 October 2022).

Please note that this issue isn't something I can easily recreate. But hopefully there is enough information here to determine what happened?

Thanks, Nick.

elmaimbo commented 1 year ago

I've done a fair bit of testing on this, and found a way to reliably replicate this issue using the command below:

sendmail $USER <<EOF
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nz.dimensiondata.com;
 s=20230220; t=1680737773;
 bh=3VWGQGY+cSNYd1MGM+X6hRXU0stl8JCaQtl4mbX/j2I=;
 h=Date:From:To:Subject:From;
 b=LKOjYqqZ+qCPntsULYCEb8FX4v5FzeuLadNc1sPjGXk0dO5ZK3x9ynmNhm1Zu7fvR
  FhBxlDNasepK11u795VtAGmBT+i2qDNm7vX2xVZkMtDd2USHFsGyKxbjdb9pFuUiXt
  Ls7sk7VRYEhzgwYRfRPaGbH6Ul2Jz6pAC9HVUPQlycA50wheDW++BIILz3DgJZsAEX
  vwU2XWDPp6mG+RdeVNxuA6ISvKfuK91aBkVqaPEFr/usbKBoSG9vI2RjQqTS53eKHe
  tkK4NG04txs7hAMQs8KajRMcH3bls1zXQiMPq8zqNAhQlbEuY3g1KLHnKcrCZ3h/b6
  4YGw2AxqM6w5A==
From: Sender <sender@nz.dimensiondata.com>
To: Recipient <recipient@example.com>
Subject: Test message

This is a test!
.
EOF

Explanation: The above command can be run on a mail server (e.g. Postfix) that has OpenDKIM configured. It submits an email whose RFC5322.From address contains the domain "nz.dimensiondata.com". The email also carries a DKIM-Signature for that domain, which causes OpenDKIM to query DNS for a TXT record named "20230220._domainkey.nz.dimensiondata.com". Here is the key to reproducing the issue: the DNS request will not be answered. Consequently, OpenDKIM times out on the DNS request and should tempfail the message. What actually happens is that it accepts the message, which the mail server then delivers (to the current user).

Disclaimer: I can offer no guarantees that "nz.dimensiondata.com" will always behave that way. But I discovered this domain behaved like this a while ago and it still hasn't been fixed.
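Because the reproduction depends on that DNS behaviour, it may be worth checking the lookup before sending the test message. The sketch below only builds the query name described above from a selector and a domain. `dkim_qname` is a hypothetical helper name, and the `dig` options in the comment (`+time`, `+tries`) are just one possible way to run the check with a 5-second timeout.

```shell
# dkim_qname SELECTOR DOMAIN
# Hypothetical helper: print the DNS name queried for a DKIM public key,
# i.e. <selector>._domainkey.<domain>.
dkim_qname() {
    printf '%s._domainkey.%s\n' "$1" "$2"
}

# Example check (requires dig and network access):
#   dig +time=5 +tries=1 TXT "$(dkim_qname 20230220 nz.dimensiondata.com)" | grep status
```

If the query returns a normal answer, the domain has presumably been fixed and the test message will no longer trigger the problem.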

I hope this information will make it easy to replicate & resolve this bug?

Thanks, Nick.

andreasschulze commented 1 year ago

I also expect messages are tempfailed on DNS errors.

here are my steps to reproduce:

$ opendkim -V
opendkim: OpenDKIM Filter v2.11.0
        Compiled with OpenSSL 3.1.1 30 May 2023
        SMFI_VERSION 0x1000001
        libmilter version 1.0.1
        Supported signing algorithms:
                rsa-sha1
                rsa-sha256
                ed25519-sha256
        Supported canonicalization algorithms:
                relaxed
                simple
        Active code options:
                USE_LDAP
                USE_LUA
                _FFR_SENDER_MACRO
        libopendkim 2.11.0:

$ cat <<EOF > /tmp/msg
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nz.dimensiondata.com;
 s=20230220;
 bh=3VWGQGY+cSNYd1MGM+X6hRXU0stl8JCaQtl4mbX/j2I=;
 h=From;
 b=foo
From: "not dimensiondata.com" <root@localhost>

testbody
EOF

$ cat <<EOF > /tmp/conf
On-KeyNotFound tempfail
On-DNSError tempfail # this is default anyway
EOF

# check that opendkim really understands my bare configuration file
$ opendkim -x /tmp/conf -e On-KeyNotFound
tempfail
$ opendkim -x /tmp/conf -e On-DNSError
tempfail

# check that we really have a "dns error" situation
$ dig 20230220._domainkey.nz.dimensiondata.com. txt | grep status
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 61505

# run the test
$ opendkim -v -v -x /tmp/conf -t /tmp/msg
opendkim: mlfi_connect() returned SMFIS_CONTINUE
opendkim: /tmp/msg: mlfi_envfrom() returned SMFIS_CONTINUE
opendkim: /tmp/msg: mlfi_envrcpt() returned SMFIS_CONTINUE
opendkim: /tmp/msg: line 1: mlfi_header() returned SMFIS_CONTINUE
opendkim: /tmp/msg: line 6: mlfi_header() returned SMFIS_CONTINUE
opendkim: /tmp/msg: mlfi_eoh() returned SMFIS_CONTINUE
opendkim: /tmp/msg: mlfi_body() returned SMFIS_CONTINUE
### SETREPLY: rcode='451' xcode='4.7.5' replytxt='DKIM key retrieval failed'
opendkim: /tmp/msg: mlfi_eom() returned SMFIS_ACCEPT
opendkim: /tmp/msg: verification (s=20230220 d=nz.dimensiondata.com, 0-bit key, unknown) failed: key DNS query failed
opendkim: mlfi_close() returned SMFIS_CONTINUE

OpenDKIM should not accept that message. It's a bug.

andreasschulze commented 1 year ago

I thought more about that and changed my mind. There may be a bug. In general a message shouldn't tempfail if OpenDKIM can't retrieve the public key from DNS. So maybe the default On-DNSError tempfail handles a different situation.

andreasschulze commented 1 year ago

The question is: why does OpenDKIM set a 451 reply code but still accept the message?
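For anyone re-running the test above, the mismatch can be detected mechanically. The following is a rough sketch, assuming the test-mode output has the same shape as the transcript above: a `### SETREPLY:` line with a 4xx rcode, followed by `mlfi_eom() returned SMFIS_ACCEPT`. `tempfail_but_accepted` is a hypothetical helper name.

```shell
# tempfail_but_accepted
# Hypothetical helper: read opendkim test-mode output on stdin and succeed
# (exit 0) only if a 4xx SETREPLY is later followed by mlfi_eom() returning
# SMFIS_ACCEPT.
tempfail_but_accepted() {
    awk '
        # "." stands in for the quote character around the reply code
        /^### SETREPLY: rcode=.4[0-9][0-9]/ { saw_4xx = 1 }
        /mlfi_eom\(\) returned SMFIS_ACCEPT/ && saw_4xx { mismatch = 1 }
        END { exit mismatch ? 0 : 1 }
    '
}

# Example (captures both stdout and stderr, since it is an assumption
# which stream each line is written to):
#   opendkim -v -v -x /tmp/conf -t /tmp/msg 2>&1 | tempfail_but_accepted \
#       && echo "451 reply set, but message accepted"
```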

mddvul22 commented 10 months ago

We are having this problem too. It becomes even more problematic when the sending domain's DMARC policy is to reject mail. In our case, we have incoming mail from the US Federal government. The DKIM query timed out, and their DMARC reject policy resulted in the email being permanently rejected.

mddvul22 commented 10 months ago

Am I correct in thinking that the only workaround for this right now, is to start up the opendkim daemon with a longer -T setting?

elmaimbo commented 10 months ago

> Am I correct in thinking that the only workaround for this right now, is to start up the opendkim daemon with a longer -T setting?

You can specify the DNS timeout in the configuration file (/etc/opendkim.conf) using something like: DNSTimeout 15 (see the manual page for opendkim.conf for more information).