zonemaster / zonemaster-engine

The Zonemaster Engine - part of the Zonemaster project

Zonemaster emits warning about inconsistent MX RRset #1374

Closed anandb-ripencc closed 2 months ago

anandb-ripencc commented 2 months ago

Please refer to the test result here: https://zonemaster.ripe.net/en/result/bc66381e5ac1b0c5

I'm using these versions:

Zonemaster-CLI version v7.0.0
Zonemaster-Engine version v6.0.0
Zonemaster-LDNS version 4.0.2
NL NetLabs LDNS version 1.8.3

Look at the Zone section, and the warning in there about inconsistent MX RRset data. Just below that are the actual results of the MX RRset responses from all name servers. I squinted, zoomed and looked hard, but I cannot see the inconsistency. I also did queries on the command line using dig, against all the name servers of the zone. I couldn't find the inconsistency. Are you able to shed any light on this please?

matsduf commented 2 months ago

I cannot see the difference either. I can confirm that zonemaster.net gives the same result. The same issue does not arise with other zones I have tested. What is special about kapper.net?

We will investigate.

anandb-ripencc commented 2 months ago

Actually, I don't know what is special about kapper.net. I got this report from a user, investigated, and then opened this issue. Hopefully, you can figure it out.

anandb-ripencc commented 2 months ago

Just for comparison: https://dnsviz.net/d/kapper.net/dnssec/

DNSViz didn't find any issues with the zone, and the MX record check was fine.

tgreenx commented 2 months ago

Hi all, I briefly started to look into this.

It seems that Zonemaster sees inconsistent TTL values between the MX records returned by different name servers, hence the warning. Here's an excerpt of Zonemaster-CLI output in which I dumped values (name server name/IP pair and MX resource record) from the responses:

$ zonemaster-cli kapper.net --test Zone09 --show-testcase --level=INFO --no-ipv6

Seconds Level    Testcase       Message
======= ======== ============== =======
   0.00 INFO     Unspecified    Using version v6.0.0 of the Zonemaster engine.

ns1.kapper.net/94.136.1.127
        kapper.net.     3600    IN      MX      10 inbound.kapper.net.

ns2.kapper.net/94.16.111.51
        kapper.net.     3565    IN      MX      10 inbound.kapper.net.

ns3.kapper.net/103.241.67.58
        kapper.net.     3565    IN      MX      10 inbound.kapper.net.

ns4.kapper.net/139.99.239.52
        kapper.net.     3564    IN      MX      10 inbound.kapper.net.

ns5.kapper.net/94.136.22.5
        kapper.net.     3564    IN      MX      10 inbound.kapper.net.

ns6.kapper.net/97.74.83.192
        kapper.net.     3564    IN      MX      10 inbound.kapper.net.

ns7.kapper.net/144.217.92.144
        kapper.net.     3565    IN      MX      10 inbound.kapper.net.

ns8.kapper.net/195.200.6.20
        kapper.net.     3564    IN      MX      10 inbound.kapper.net.
  11.20 WARNING  Zone09         The MX RRset data is inconsistent between the name servers.
  11.20 INFO     Zone09         Mail targets in the MX RRset "inbound.kapper.net." returned from name servers "97.74.83.192".
  11.20 INFO     Zone09         Mail targets in the MX RRset "inbound.kapper.net." returned from name servers "94.136.1.127".
  11.20 INFO     Zone09         Mail targets in the MX RRset "inbound.kapper.net." returned from name servers "103.241.67.58".
  11.20 INFO     Zone09         Mail targets in the MX RRset "inbound.kapper.net." returned from name servers "144.217.92.144".
  11.20 INFO     Zone09         Mail targets in the MX RRset "inbound.kapper.net." returned from name servers "195.200.6.20".
  11.20 INFO     Zone09         Mail targets in the MX RRset "inbound.kapper.net." returned from name servers "139.99.239.52".
  11.20 INFO     Zone09         Mail targets in the MX RRset "inbound.kapper.net." returned from name servers "94.136.22.5".
  11.20 INFO     Zone09         Mail targets in the MX RRset "inbound.kapper.net." returned from name servers "94.16.111.51".

It happens almost every time, but not always. Here's a second excerpt, taken just seconds after the previous one:

$ zonemaster-cli kapper.net --test Zone09 --show-testcase --level=INFO --no-ipv6

Seconds Level    Testcase       Message
======= ======== ============== =======
   0.00 INFO     Unspecified    Using version v6.0.0 of the Zonemaster engine.

ns1.kapper.net/94.136.1.127
        kapper.net.     3600    IN      MX      10 inbound.kapper.net.

ns2.kapper.net/94.16.111.51
        kapper.net.     3600    IN      MX      10 inbound.kapper.net.

ns3.kapper.net/103.241.67.58
        kapper.net.     3600    IN      MX      10 inbound.kapper.net.

ns4.kapper.net/139.99.239.52
        kapper.net.     3600    IN      MX      10 inbound.kapper.net.

ns5.kapper.net/94.136.22.5
        kapper.net.     3600    IN      MX      10 inbound.kapper.net.

ns6.kapper.net/97.74.83.192
        kapper.net.     3600    IN      MX      10 inbound.kapper.net.

ns7.kapper.net/144.217.92.144
        kapper.net.     3600    IN      MX      10 inbound.kapper.net.

ns8.kapper.net/195.200.6.20
        kapper.net.     3600    IN      MX      10 inbound.kapper.net.
   6.09 INFO     Zone09         Mail targets in the MX RRset "inbound.kapper.net." returned from name servers "94.136.1.127;103.241.67.58;144.217.92.144;195.200.6.20;97.74.83.192;94.136.22.5;94.16.111.51;139.99.239.52".

I was able to reproduce the results with dig:

$ dig MX @94.16.111.51 kapper.net +nord

; <<>> DiG 9.18.24-1-Debian <<>> MX @94.16.111.51 kapper.net +nord
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52322
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;kapper.net.                    IN      MX

;; ANSWER SECTION:
kapper.net.             3586    IN      MX      10 inbound.kapper.net.

;; ADDITIONAL SECTION:
inbound.kapper.net.     3586    IN      A       94.136.1.122
inbound.kapper.net.     886     IN      AAAA    2a02:ab8:4::107

;; Query time: 29 msec
;; SERVER: 94.16.111.51#53(94.16.111.51) (UDP)
;; WHEN: Tue Jul 23 12:18:22 CEST 2024
;; MSG SIZE  rcvd: 107

$ dig MX @139.99.239.52 kapper.net +nord

; <<>> DiG 9.18.24-1-Debian <<>> MX @139.99.239.52 kapper.net +nord
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44864
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;kapper.net.                    IN      MX

;; ANSWER SECTION:
kapper.net.             3578    IN      MX      10 inbound.kapper.net.

;; ADDITIONAL SECTION:
inbound.kapper.net.     3578    IN      A       94.136.1.122
inbound.kapper.net.     878     IN      AAAA    2a02:ab8:4::107

;; Query time: 269 msec
;; SERVER: 139.99.239.52#53(139.99.239.52) (UDP)
;; WHEN: Tue Jul 23 12:20:03 CEST 2024
;; MSG SIZE  rcvd: 107

It seems that doing consecutive DNS queries to these name servers reduces the TTL in the MX records each time:

$ date && dig MX @139.99.239.52 kapper.net +nord +noall +answer
Tue Jul 23 12:25:05 PM CEST 2024
kapper.net.             3276    IN      MX      10 inbound.kapper.net.

$ date && dig MX @139.99.239.52 kapper.net +nord +noall +answer
Tue Jul 23 12:25:10 PM CEST 2024
kapper.net.             3271    IN      MX      10 inbound.kapper.net.

$ date && dig MX @139.99.239.52 kapper.net +nord +noall +answer
Tue Jul 23 12:25:11 PM CEST 2024
kapper.net.             3270    IN      MX      10 inbound.kapper.net.

@anandb-ripencc I'll respond to this zone's administrator by email, since he contacted us directly too.

anandb-ripencc commented 2 months ago

Thanks for looking into this, Tom, and for figuring out that the TTLs are different. I will also respond to him. It may be that we're getting these responses out of a cache of some kind, and the problem isn't in Zonemaster.

matsduf commented 2 months ago

We should either ignore the TTL or split the check into TTL and RDATA, respectively. The message only mentions RDATA which is misleading.

tgreenx commented 2 months ago

> Thanks for looking into this, Tom, and for figuring out that the TTLs are different. I will also respond to him. It may be that we're getting these responses out of a cache of some kind, and the problem isn't in Zonemaster.

So it appears that there is indeed a bug in the implementation of that zone's name servers regarding the TTL value of resource records. Specifically, the TTL of any resource record appears to be the remaining TTL from the name server's own response cache:

$ date && dig MX @139.99.239.52 kapper.net +nord +noall +answer
Tue Jul 23 01:57:09 PM CEST 2024
kapper.net.             3600    IN      MX      10 inbound.kapper.net.

$ date && dig MX @139.99.239.52 kapper.net +nord +noall +answer
Tue Jul 23 01:57:11 PM CEST 2024
kapper.net.             3598    IN      MX      10 inbound.kapper.net.

$ date && dig MX @139.99.239.52 kapper.net +nord +noall +answer
Tue Jul 23 01:59:09 PM CEST 2024
kapper.net.             3480    IN      MX      10 inbound.kapper.net.

As you can see, for each second that passes (as shown by the date command), the TTL value in the resource record returned by the name server decreases by one second. This seems to hold true for any type of resource record (not just MX), and for most (if not all) name servers of that zone.
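To make the arithmetic explicit, here is a small Python check of the three samples above. The timestamps are rewritten in 24-hour form, and the function name `ttl_tracks_clock` is only an illustrative choice, not anything from Zonemaster.

```python
from datetime import datetime

# (wall-clock time, TTL) pairs copied from the dig output above for 139.99.239.52,
# with the PM times rewritten in 24-hour form.
observations = [
    ("13:57:09", 3600),
    ("13:57:11", 3598),
    ("13:59:09", 3480),
]

def ttl_tracks_clock(samples):
    """Return True if every TTL drop equals the seconds elapsed since the first sample."""
    t0 = datetime.strptime(samples[0][0], "%H:%M:%S")
    ttl0 = samples[0][1]
    for stamp, ttl in samples[1:]:
        elapsed = int((datetime.strptime(stamp, "%H:%M:%S") - t0).total_seconds())
        if ttl0 - ttl != elapsed:
            return False
    return True

print(ttl_tracks_clock(observations))  # prints True: 2 s -> TTL down 2, 120 s -> TTL down 120
```

A server answering from its own zone data would normally return the same TTL on every query, so a TTL that counts down with the clock like this is what one would expect from a cache sitting in front of it.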

tgreenx commented 2 months ago

> We should either ignore the TTL or split the check into TTL and RDATA, respectively. The message only mentions RDATA which is misleading.

To be exact, for that test case implementation (Zone09) all fields in the resource record are used, so also the owner name, type, class, and RDLENGTH are used to make the hash. See:

https://github.com/zonemaster/zonemaster-engine/blob/b889bcbbb5b2ca997f95e5e08e3167b48b975495/lib/Zonemaster/Engine/Test/Zone.pm#L1335-L1349
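As a rough illustration of why the TTL matters here, the Python sketch below hashes each name server's MX RRset once with the TTL included and once without. It is not the Zonemaster code (which is Perl). The record layout, the use of SHA-256, and the names `rrset_digest` and `distinct_digests` are assumptions made for this example only.

```python
import hashlib

# MX records from three of the name servers in the first zonemaster-cli run above,
# as (owner, ttl, class, type, rdata) tuples.
responses = {
    "94.136.1.127": [("kapper.net.", 3600, "IN", "MX", "10 inbound.kapper.net.")],
    "94.16.111.51": [("kapper.net.", 3565, "IN", "MX", "10 inbound.kapper.net.")],
    "139.99.239.52": [("kapper.net.", 3564, "IN", "MX", "10 inbound.kapper.net.")],
}

def rrset_digest(rrset, include_ttl):
    """Hash an RRset in a canonical order, optionally leaving the TTL out."""
    lines = []
    for owner, ttl, rrclass, rrtype, rdata in rrset:
        fields = [owner.lower(), rrclass, rrtype, rdata.lower()]
        if include_ttl:
            fields.insert(1, str(ttl))
        lines.append("\t".join(fields))
    return hashlib.sha256("\n".join(sorted(lines)).encode()).hexdigest()

def distinct_digests(all_responses, include_ttl):
    """Count how many different RRset digests the name servers produced."""
    return len({rrset_digest(rrset, include_ttl) for rrset in all_responses.values()})

print(distinct_digests(responses, include_ttl=True))   # prints 3: every TTL differs
print(distinct_digests(responses, include_ttl=False))  # prints 1: same owner, class, type and RDATA
```

With the TTL in the hash the three servers look inconsistent, and without it they agree, which matches the warning seen for kapper.net.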

matsduf commented 2 months ago

> To be exact, for that test case (Zone09) all fields in the resource record are used, so also the owner name, type, class, and RDLENGTH are used to make the hash. See:

The wording in the Zone09 test case specification is unfortunately not correctly written. I think it was meant to compare the RDATA but as it is written all data including TTL is compared. The specification should be updated by either limiting to RDATA or by splitting TTL and RDATA check.

After that the implementation should be updated.
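One way the "split" option could look is sketched below in Python: the RDATA and the TTL are compared independently, so each can get its own message. This is only a sketch of the idea. The tags `MX_RDATA_INCONSISTENT` and `MX_TTL_INCONSISTENT` and the function `check_mx_consistency` are hypothetical and do not exist in Zonemaster.

```python
from collections import defaultdict

def check_mx_consistency(responses):
    """responses maps a name server IP to its MX RRset as a list of (ttl, rdata) pairs.

    Returns a list of hypothetical message tags, one per kind of inconsistency found.
    """
    by_rdata = defaultdict(list)
    by_ttl = defaultdict(list)
    for ns, rrset in sorted(responses.items()):
        by_rdata[frozenset(rdata.lower() for _, rdata in rrset)].append(ns)
        by_ttl[frozenset(ttl for ttl, _ in rrset)].append(ns)

    findings = []
    if len(by_rdata) > 1:
        findings.append("MX_RDATA_INCONSISTENT")
    if len(by_ttl) > 1:
        findings.append("MX_TTL_INCONSISTENT")
    return findings

# TTLs from the first zonemaster-cli run in this thread; the RDATA is identical everywhere.
mx = "10 inbound.kapper.net."
first_run = {
    "94.136.1.127": [(3600, mx)],
    "94.16.111.51": [(3565, mx)],
    "103.241.67.58": [(3565, mx)],
    "139.99.239.52": [(3564, mx)],
    "94.136.22.5": [(3564, mx)],
    "97.74.83.192": [(3564, mx)],
    "144.217.92.144": [(3565, mx)],
    "195.200.6.20": [(3564, mx)],
}

print(check_mx_consistency(first_run))  # prints ['MX_TTL_INCONSISTENT']
```

With such a split, the kapper.net case would be reported as a TTL difference only, and the message about the MX RRset data would stay reserved for real differences in the mail targets.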

tgreenx commented 2 months ago

> The wording in the Zone09 test case specification is unfortunately not correctly written. I think it was meant to compare the RDATA but as it is written all data including TTL is compared. The specification should be updated by either limiting to RDATA or by splitting TTL and RDATA check.

Yes, we can improve the test case in that regard. However, if we decide not to limit it to just RDATA, we shouldn't stop at the TTL. By the same logic, I think the other fields become just as relevant.

matsduf commented 2 months ago

> Yes, we can improve the test case in that regard. However, if we decide not to limit it to just RDATA, we shouldn't stop at the TTL. By the same logic, I think the other fields become just as relevant.

Class is not explicitly checked, but if the MX record (or records) in the answer section does not have the same owner name as the zone name it is ignored, as the specification is written.

hknet commented 2 months ago

Thx for figuring this out! Indeed there is DDoS protection in front of these auth-servers - I guess this is where the different TTLs are coming from. We will investigate further whether we can manipulate the responses, though I'm not sure we will be able to get this changed.

hknet commented 2 months ago

Fun fact: it was a bug in the DDoS protection. On our side it's fixed - thx again for your effort!

tgreenx commented 2 months ago

No problem, glad it's fixed! I will close this issue then.