pi-hole / FTL

The Pi-hole FTL engine
https://pi-hole.net

CNAME / A / AAAA Race Condition in FTL #1623

Closed NickJLange closed 1 year ago

NickJLange commented 1 year ago

Versions

Platform

Expected behavior

Queries for local-zone entries will correctly resolve regardless of CNAME vs A vs AAAA query via IPv6 or IPv4 or type-queries.

Actual behavior / bug

Some form of race condition inside FTL produces NODATA responses even though the upstream does have data for that record type.

Copy/pasted from:

Aug 12 21:03:27 dnsmasq[240]: query[A] nodered.myhouse.privatedomain.com from 192.168.1.23
Aug 12 21:03:27 dnsmasq[240]: forwarded nodered.myhouse.privatedomain.com to 192.168.1.112#5335
Aug 12 21:03:27 dnsmasq[240]: reply nodered.myhouse.privatedomain.com is <CNAME>
Aug 12 21:03:27 dnsmasq[240]: reply cnamed.myhouse.privatedomain.com is NODATA-IPv4
Aug 12 21:03:31 dnsmasq[240]: query[A] cnamed.myhouse.privatedomain.com from 0001:0000:0000:4a00:8d65:869b:55ca:0001
Aug 12 21:03:31 dnsmasq[240]: cached terraDelta.myhouse.privatedomain.com is NODATA-IPv4
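
To find further occurrences of this pattern in a larger log, a small script along these lines might help. It is only a sketch: it assumes the `reply` / `cached` line format shown in the excerpt above, and the function name `find_cname_nodata` is made up for illustration.

```python
import re

# Matches the dnsmasq log lines shown above, e.g.
# "Aug 12 21:03:27 dnsmasq[240]: reply nodered.example is <CNAME>"
LINE_RE = re.compile(
    r"dnsmasq\[\d+\]: (?P<kind>reply|cached) (?P<name>\S+) is (?P<value>\S+)"
)


def find_cname_nodata(lines):
    """Return (alias, target) pairs where a CNAME reply is followed by a
    NODATA answer for a different name on the next reply/cached line."""
    hits = []
    pending_alias = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # query/forwarded lines do not reset the state
        name, value = m["name"], m["value"]
        if value == "<CNAME>":
            pending_alias = name
        elif value.startswith("NODATA") and pending_alias and name != pending_alias:
            hits.append((pending_alias, name))
            pending_alias = None
        else:
            pending_alias = None
    return hits
```

Fed with the lines of `pihole.log`, it would list every alias whose CNAME target came back as NODATA.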

Upstream Unbound response:
$ dig cnamed.myhouse.privatedomain.com @192.168.1.112 -p 5335
;; QUESTION SECTION:
;cnamed.myhouse.privatedomain.com. IN A

;; ANSWER SECTION:
cnamed.myhouse.privatedomain.com. 3600 IN A 192.168.1.104

$ dig cnamed.myhouse.privatedomain.com @192.168.1.110 -p 5335

;; QUESTION SECTION:
;cnamed.myhouse.privatedomain.com. IN A

;; ANSWER SECTION:
cnamed.myhouse.privatedomain.com. 3600 IN A 192.168.1.104
root@cnamed2:~# cat /etc/unbound/unbound.conf.d/localzone-myhouse.privatedomain.com.conf
server:
    ###########################################################################
    # LOCAL ZONE
    ###########################################################################

    local-zone: "myhouse.privatedomain.com." transparent
    ### <<< snip >>>

    local-data: "cnamed.myhouse.privatedomain.com. IN A 192.168.1104"
    local-data-ptr: "192.168.100.104 cnamed.myhouse.privatedomain.com."
    local-data: "canmed.myhouse.privatedomain.com. TXT 'Cheese' "
    local-data: "nodered.myhouse.privatedomain.com. CNAME cnamed.myhouse.privatedomain.com."
12:32:08.269173 IP6 (flowlabel 0x60600, hlim 64, next-header UDP (17) payload length: 59)  0001:0000:0000:4a00:8d65:869b:55ca:0001.60470 >  0001:0000:0000:4a00:8d65:869b:55ca:0002.53: [udp sum ok] 46943+ A? supera.myhouse.private.com. (51)
12:32:08.269335 IP6 (flowlabel 0x70d00, hlim 64, next-header UDP (17) payload length: 59) 2 0001:0000:0000:4a00:8d65:869b:55ca:0001.54115 >  0001:0000:0000:4a00:8d65:869b:55ca:0002.53: [udp sum ok] 29260+ AAAA? supera.myhouse.private.com. (51)
12:32:08.274608 IP6 (flowlabel 0x8de5d, hlim 64, next-header UDP (17) payload length: 75) 0001:0000:0000:4a00:8d65:869b:55ca:0002.53 > 0001:0000:0000:4a00:8d65:869b:55ca:0001.60470: [udp sum ok] 46943 q: A? supera.myhouse.private.com. 1/0/0 supera.myhouse.private.com. A 192.168.1.110 (67)
12:32:08.278580 IP6 (flowlabel 0xc472b, hlim 64, next-header UDP (17) payload length: 59) 0001:0000:0000:4a00:8d65:869b:55ca:0002.53 > 0001:0000:0000:4a00:8d65:869b:55ca:0001.54115: [udp sum ok] 29260* q: AAAA? supera.myhouse.private.com. 0/0/0 (51)

Steps to reproduce

Steps to reproduce the behavior:

  1. The only corrupted entries seem to be those with CNAME entries; PTR entries for those IPs are unaffected.
  2. The exact sequence needed to reproduce this is unknown.
  3. Waiting for TTL expiration or restarting Pi-hole seems to restore service on clients once they are in a bad state.
  4. Using IPv6 and IPv4 together may be exacerbating things (not proven).
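
Since the triggering sequence is unknown, one way to hunt for it could be a probe that fires an A and an AAAA query back to back, much like the client in the tcpdump capture, and reports how many answers each got. The following is a rough sketch using only the Python standard library. The helper names are hypothetical, and unlike the capture it sends both queries from a single source port.

```python
import socket
import struct

QTYPE_A, QTYPE_AAAA = 1, 28


def build_query(name, qtype, query_id):
    """Encode a minimal DNS query (RFC 1035 wire format, recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def answer_count(response):
    """Return the ANCOUNT field of a DNS response header."""
    return struct.unpack("!H", response[6:8])[0]


def probe(name, server, port=53, timeout=2.0):
    """Send A and AAAA queries back to back; return {qtype: answer count}.

    Raises socket.timeout if a reply does not arrive in time.
    """
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    results = {}
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for query_id, qtype in ((1, QTYPE_A), (2, QTYPE_AAAA)):
            sock.sendto(build_query(name, qtype, query_id), (server, port))
        for _ in range(2):
            data, _addr = sock.recvfrom(4096)
            query_id = struct.unpack("!H", data[:2])[0]
            results[QTYPE_A if query_id == 1 else QTYPE_AAAA] = answer_count(data)
    return results
```

Calling `probe()` in a loop against the Pi-hole address for the CNAME target, and logging every run where the A query comes back with zero answers, might narrow down when the bad state begins.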

Debug Token

pralor-bot commented 1 year ago

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/pihole-unbound-localzone-incorrectly-nodata-ipv4-until-restarted-and-then-repoisoned/64383/5

DL6ER commented 1 year ago

Sorry that our first reply took so long. I had a quick look about two weeks ago but missed an important detail: this is a dnsmasq issue, not a pihole-FTL one (at least not a "native" one). Unfortunately, we won't be able to do much about it, and as much as I hate to say "not my business", I still feel you should speak up at https://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss, especially since you already have lots of details available (esp. the unbound logs showing what is going on).

One thing you could do beforehand would be to enable the embedded pcap packet dumping. I'll reply to you about this in the linked Discourse discussion, as things can be formatted more nicely over there. I will furthermore mark this ticket as an external issue, but we can leave it open until we have incorporated the fix from dnsmasq upstream once it is ready.

Bucking-Horn commented 1 year ago

The Discourse topic discussion revealed that NickJLange runs two instances of unbound.

The unbound configuration as shared above looks faulty, which may have contributed to NickJLange's observation, particularly if only one of the unbound instances has been using that configuration:

root@cnamed2:~# cat /etc/unbound/unbound.conf.d/localzone-myhouse.privatedomain.com.conf
(...)
    local-data: "cnamed.myhouse.privatedomain.com. IN A 192.168.1104"
    local-data: "canmed.myhouse.privatedomain.com. TXT 'Cheese' "
    local-data: "nodered.myhouse.privatedomain.com. CNAME cnamed.myhouse.privatedomain.com."

192.168.1104 is not a valid IP address, and canmed seems like a twisted cnamed.
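
A malformed address like this can be caught mechanically. As a minimal sketch, assuming the `local-data: "name [IN] A|AAAA address"` layout used in the configuration above, Python's standard `ipaddress` module can flag entries that do not parse. The function name `invalid_addresses` is invented for this example, and the check does not verify that an A record actually holds an IPv4 address.

```python
import ipaddress
import re

# Matches unbound local-data lines that define an address record, e.g.
# local-data: "host.example. IN A 192.0.2.1"
LOCAL_DATA_RE = re.compile(
    r'local-data:\s*"(?P<name>\S+)\s+(?:IN\s+)?(?P<rtype>A|AAAA)\s+(?P<addr>[^"\s]+)'
)


def invalid_addresses(config_text):
    """Return (name, address) pairs whose address does not parse as an IP."""
    bad = []
    for m in LOCAL_DATA_RE.finditer(config_text):
        try:
            ipaddress.ip_address(m["addr"])
        except ValueError:
            bad.append((m["name"], m["addr"]))
    return bad
```

TXT, CNAME and PTR lines are ignored because they carry no address to validate.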

Bucking-Horn commented 1 year ago

It seems that unbound, being a recursive resolver rather than an authoritative one, does not expand CNAMEs from local definitions, which would prompt the behaviour you observe.

See e.g. https://lists.nlnetlabs.nl/pipermail/unbound-users/2009-March/000509.html and https://github.com/NLnetLabs/unbound/issues/132
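
To illustrate what "does not expand CNAMEs" means for the answer a client receives, here is a toy model. It is not unbound's code, only a sketch: the zone dictionary mirrors the two relevant local-data lines from this thread (with the address taken from the dig output), and `answer` and `expand_cnames` are hypothetical names.

```python
# Toy zone mirroring the local-data lines quoted in this thread.
LOCAL_DATA = {
    ("cnamed.myhouse.privatedomain.com.", "A"): "192.168.1.104",
    ("nodered.myhouse.privatedomain.com.", "CNAME"): "cnamed.myhouse.privatedomain.com.",
}


def answer(name, qtype, expand_cnames, zone=LOCAL_DATA, max_depth=8):
    """Return the list of (owner, type, rdata) records for a query."""
    records = []
    for _ in range(max_depth):  # bound the chain length to avoid loops
        if (name, qtype) in zone:
            records.append((name, qtype, zone[(name, qtype)]))
            return records
        target = zone.get((name, "CNAME"))
        if target is None:
            return records  # no data for this name and type
        records.append((name, "CNAME", target))
        if not expand_cnames:
            return records  # stop at the alias, as described above
        name = target
    return records
```

With `expand_cnames=False`, an A query for the alias returns only the CNAME record, so the client (or a forwarder such as Pi-hole) is left without an address for the target unless it asks again.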

NickJLange commented 1 year ago

Thank you for your time and effort on pihole.

So that's sort of interesting - I'm using the container version of pihole. Just to make sure I'm not missing something - should FTL be logging under a different name? At any rate, I can raise an issue upstream.

I'll get a pcap over the weekend and update.

root@terraOmega-pihole:/var/log/pihole# grep -ri dnsmasq * | head -n 5
pihole.log:Sep 14 00:00:04 dnsmasq[240]: query[A] compute.googleapis.com from 192.168.100.110
pihole.log:Sep 14 00:00:04 dnsmasq[240]: forwarded compute.googleapis.com to 192.168.100.112#5335
pihole.log:Sep 14 00:00:04 dnsmasq[240]: query[AAAA] compute.googleapis.com from 192.168.100.110
pihole.log:Sep 14 00:00:04 dnsmasq[240]: forwarded compute.googleapis.com to 192.168.100.112#5335
pihole.log:Sep 14 00:00:04 dnsmasq[240]: reply compute.googleapis.com is 142.250.81.234

> The Discourse topic discussion revealed that NickJLange runs two instances of unbound.
>
> The unbound configuration as shared above looks faulty, which may have contributed to NickJLange's observation, particularly if only one of the unbound instances has been using that configuration:
>
> root@cnamed2:~# cat /etc/unbound/unbound.conf.d/localzone-myhouse.privatedomain.com.conf
> (...)
> local-data: "cnamed.myhouse.privatedomain.com. IN A 192.168.1104"
> local-data: "canmed.myhouse.privatedomain.com. TXT 'Cheese' "
> local-data: "nodered.myhouse.privatedomain.com. CNAME cnamed.myhouse.privatedomain.com."
>
> 192.168.1104 is not a valid IP address, and canmed seems like a twisted cnamed.

Alas, this was just a sanitization-typo (and a failed one at that!)

> It seems that unbound, being a recursive resolver rather than an authoritative one, does not expand CNAMEs from local definitions, which would prompt the behaviour you observe. See e.g. https://lists.nlnetlabs.nl/pipermail/unbound-users/2009-March/000509.html and https://github.com/NLnetLabs/unbound/issues/132

This looks more promising, although it doesn't explain why unbound resolves correctly when asked directly.

Bucking-Horn commented 1 year ago

I have reproduced your issue in my environment, which spurred me on to researching unbound's behaviour (see also my more detailed comment in the original Discourse issue from last week).

Your dig above requests the A record domain (cnamed), not the CNAME domain (nodered). Both unbound and Pi-hole will answer that request with an IP.

However, unbound will answer the request for the CNAME domain (nodered) with the CNAME record - it does not expand CNAMEs:

~ $ dig nodered.myhouse.lan @127.0.0.1 -p5335

; <<>> DiG 9.11.5-P4-5.1+deb10u8-Raspbian <<>> nodered.myhouse.lan @127.0.0.1 -p5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58722
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nodered.myhouse.lan.           IN      A

;; ANSWER SECTION:
nodered.myhouse.lan.    3600    IN      CNAME   cnamed.myhouse.lan.

;; Query time: 0 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1)
;; WHEN: Thu Sep 14 17:34:27 CEST 2023
;; MSG SIZE  rcvd: 69

Consequently, this is the same answer that Pi-hole is giving you (of course):

~ $ dig nodered.myhouse.lan

; <<>> DiG 9.11.5-P4-5.1+deb10u8-Raspbian <<>> nodered.myhouse.lan
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57924
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nodered.myhouse.lan.           IN      A

;; ANSWER SECTION:
nodered.myhouse.lan.    30      IN      CNAME   cnamed.myhouse.lan.

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Sep 14 17:34:03 CEST 2023
;; MSG SIZE  rcvd: 69

This is also in line with your shared log excerpts from the Discourse discussion, which prompted me to point out that:

> Your log output suggests that Pi-hole has received that reply from your unbound.

It is a deliberate decision of unbound's maintainers not to expand those CNAMEs (as a guard against potential malicious intent like DNS cache poisoning, if I have understood the discussions linked above correctly).

So overall, I don't think this can be considered a bug, either in pihole-FTL or in dnsmasq.

NickJLange commented 1 year ago

Hi @Bucking-Horn - Thanks for spending some time here to replicate the behaviour. I agree with the premise that this is the behaviour unbound chose as intended. Let me close the GitHub issue and we can move back to Discourse.

Thank you!