pi-hole / FTL

The Pi-hole FTL engine
https://pi-hole.net

Non-FQDN resolving/conditional forwarding doesn't work properly with two search domains configured #2085

Open kaechele opened 1 month ago

kaechele commented 1 month ago

Versions

$ pihole -v
Core
    Version is v5.18.3-457-ga8d305d5 (Latest: null)
    Branch is development-v6
    Hash is a8d305d5 (Latest: a8d305d5)
Web
    Version is v5.21-929-g085c2880 (Latest: null)
    Branch is development-v6
    Hash is 085c2880 (Latest: 085c2880)
FTL
    Version is vDev-e5a24bd (Latest: null)
    Branch is development-v6
    Hash is e5a24bdd (Latest: e5a24bdd)

Platform

Expected behavior

When two search domains are configured on a client and more than one conditional forwarder is configured in Pi-Hole, Pi-Hole should pass on the NXDOMAIN it receives from a conditional forwarding target instead of classifying the query as Blocked (external, NXRA) and responding 0.0.0.0 / ::. Without an NXDOMAIN response, the client will not attempt to resolve the non-FQDN hostname using the second configured search domain.

Actual behavior / bug

Pihole responds A 0.0.0.0 / AAAA :: to the non-FQDN query for host1:

$ host host1
host1.domain1.lan has address 0.0.0.0
host1.domain1.lan has IPv6 address ::

Therefore the client never tries to resolve host1.domain2.lan, which would trigger conditional forwarding to the other internal DNS server, the one that has a valid entry for host1.domain2.lan and could satisfy this request.

Steps to reproduce

Scenario:

The client has the following DNS settings:

$ resolvectl
Link 1 (wlp2s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.0.0.40
       DNS Servers: 10.0.0.40
        DNS Domain: domain1.lan domain2.lan

Configuration for Pi-Hole under Settings -> DNS -> Conditional Forwarding

true,10.0.0.0/24,10.0.0.10,domain1.lan
true,192.168.0.0/24,192.168.0.10,domain2.lan
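
A minimal sketch of how these two entries could be read, assuming each line holds the fields enabled, local network, target DNS server and local domain, in that order. The names `RevServer`, `parse_rev_server` and `pick_target` are hypothetical, and the suffix match is only an illustration of the intent, not FTL's actual forwarding logic.

```python
import ipaddress
from typing import List, NamedTuple, Optional, Union


class RevServer(NamedTuple):
    """One conditional forwarding entry (field order is an assumption)."""
    enabled: bool
    network: Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
    target: str
    domain: str


def parse_rev_server(line: str) -> RevServer:
    """Split one 'enabled,network,target,domain' line into its fields."""
    enabled, network, target, domain = (part.strip() for part in line.split(","))
    return RevServer(enabled == "true", ipaddress.ip_network(network), target, domain.lower())


def pick_target(name: str, servers: List[RevServer]) -> Optional[str]:
    """Return the target of the first enabled entry whose domain covers name."""
    name = name.rstrip(".").lower()
    for server in servers:
        if server.enabled and (name == server.domain or name.endswith("." + server.domain)):
            return server.target
    return None


servers = [
    parse_rev_server("true,10.0.0.0/24,10.0.0.10,domain1.lan"),
    parse_rev_server("true,192.168.0.0/24,192.168.0.10,domain2.lan"),
]

print(pick_target("host1.domain1.lan", servers))
print(pick_target("host1.domain2.lan", servers))
```

With this reading, a query for host1.domain1.lan goes to 10.0.0.10 and a query for host1.domain2.lan goes to 192.168.0.10, which is the routing the scenario relies on.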

Steps to reproduce the behavior:

  1. User tries to query the non-FQDN host host1
  2. The client expands this to host1.domain1.lan due to the search domain setting of domain1.lan domain2.lan
  3. Pi-Hole receives a query for host1.domain1.lan
  4. The first DNS server (10.0.0.10) that receives this request due to conditional forwarding does not have a valid RRSet for this domain
  5. Pi-Hole receives the NXDOMAIN from 10.0.0.10 and decides to block the request as it doesn't allow this request to be forwarded to the internet
  6. The client receives an A 0.0.0.0 / AAAA :: response from Pi-Hole and is satisfied. Had it received an NXDOMAIN response, it would have tried querying host1.domain2.lan, which would have yielded the desired response.
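
The steps above can be sketched as a simplified model of a client walking its search list. This is not the systemd-resolved implementation: `resolve_short_name` is a hypothetical helper, `None` stands in for NXDOMAIN, and the address 192.168.0.50 for host1.domain2.lan is made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

Answer = Optional[str]  # None models an NXDOMAIN response


def resolve_short_name(
    host: str, search: List[str], lookup: Callable[[str], Answer]
) -> Tuple[Optional[str], Answer]:
    """Try host under each search domain in order; stop at the first answer."""
    for domain in search:
        fqdn = f"{host}.{domain}"
        answer = lookup(fqdn)
        if answer is not None:  # any address, even 0.0.0.0, ends the search
            return fqdn, answer
    return None, None


search = ["domain1.lan", "domain2.lan"]

# Reported behaviour: the domain1 lookup is answered with 0.0.0.0.
null_blocking = {"host1.domain1.lan": "0.0.0.0", "host1.domain2.lan": "192.168.0.50"}
# Expected behaviour: the domain1 lookup is answered with NXDOMAIN.
nxdomain = {"host1.domain1.lan": None, "host1.domain2.lan": "192.168.0.50"}

print(resolve_short_name("host1", search, null_blocking.get))
print(resolve_short_name("host1", search, nxdomain.get))
```

In the first call the walk stops at host1.domain1.lan with 0.0.0.0; in the second it continues to host1.domain2.lan, which is the outcome the report asks for.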

Debug Token

github-actions[bot] commented 6 days ago

This issue is stale because it has been open 30 days with no activity. Please comment or update this issue or it will be closed in 5 days.

PromoFaux commented 4 days ago

Do you see the same behaviour if you set the blocking mode to NXDOMAIN rather than NULL?

kaechele commented 3 days ago

When I change the blocking mode to NXDOMAIN the behaviour changes to working as intended:

PromoFaux commented 3 days ago

Thanks for the update.

@DL6ER any thoughts here?

DL6ER commented 3 days ago

@kaechele This is not necessarily a setup I can easily reproduce here, but let me start by asking whether this is still an issue with the most recent development-v6. I recall us having fixed something concerning the detection of the external blocked status a few weeks ago; this may have coincided with your issue ticket, which I unfortunately missed myself. I will move this to the right repository.

If it still exists with your previous configuration (which may be the case), please run

sudo pihole-FTL --config debug.queries true

and run host host1 on your client again. The related content in /var/log/pihole/FTL.log should give us a better picture of what is going on here (and hopefully why FTL seems to have detected an upstream blocking attempt with NXRA).

kaechele commented 2 days ago

I'm pretty sure the culprit is this: https://github.com/pi-hole/FTL/blob/61a211f1c187206f5ff901afae657968114fde15/src/dnsmasq_interface.c#L2617-L2626

Context

I reverted dns.blocking.mode back to NULL (the default) and set debug.queries to true to capture the following log:

Query Log for host1 (non-FQDN)

```
2024-10-17 02:33:13.758 UTC [1023M] DEBUG_QUERIES: **** new UDP IPv4 query[A] query "host1.domain1.lan" from eth0/10.0.0.151#58470 (ID 9977176, FTL 84021, src/dnsmasq/forward.c:1815)
2024-10-17 02:33:13.758 UTC [1023M] DEBUG_QUERIES: host1.domain1.lan is not known
2024-10-17 02:33:13.766 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in antigravity (exact): no
2024-10-17 02:33:13.766 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in gravity (exact): no
2024-10-17 02:33:13.767 UTC [1023M] DEBUG_QUERIES: DNS cache: A/10.0.0.151/host1.domain1.lan is not blocked (domainlist ID: -1)
2024-10-17 02:33:13.767 UTC [1023M] DEBUG_QUERIES: **** forwarded host1.domain1.lan to 10.0.0.10#53 (ID 9977176, src/dnsmasq/forward.c:559)
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: **** host1.domain1.lan externally blocked (ID 9977176, FTL 84021, /app/src/dnsmasq/rfc1035.c:797)
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: DNS cache: A/10.0.0.151/host1.domain1.lan is blocked upstream with NXDOMAIN and unset RA bit, expires in 86017s
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: Set reply to NXDOMAIN (2) in src/dnsmasq_interface.c:2731
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: **** got upstream reply from 10.0.0.10#53: host1.domain1.lan is blocked due to upstream response (header) (ID 9977176, src/dnsmasq/rfc1035.c:802)
2024-10-17 02:33:13.770 UTC [1023M] DEBUG_QUERIES: Preparing reply for "host1.domain1.lan", EDE: N/A (-1)
2024-10-17 02:33:13.770 UTC [1023M] DEBUG_QUERIES: Adding RR: "host1.domain1.lan A 0.0.0.0"
2024-10-17 02:33:13.770 UTC [1023M] DEBUG_QUERIES: **** got cache reply: host1.domain1.lan is 0.0.0.0 (ID 9977176, src/dnsmasq_interface.c:404)
2024-10-17 02:33:13.778 UTC [1023M] DEBUG_QUERIES: **** new UDP IPv4 query[AAAA] query "host1.domain1.lan" from eth0/10.0.0.151#45799 (ID 9977177, FTL 84022, src/dnsmasq/forward.c:1815)
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: host1.domain1.lan is not known
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in antigravity (exact): no
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in gravity (exact): no
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: DNS cache: AAAA/10.0.0.151/host1.domain1.lan is not blocked (domainlist ID: -1)
2024-10-17 02:33:13.780 UTC [1023M] DEBUG_QUERIES: **** forwarded host1.domain1.lan to 10.0.0.10#53 (ID 9977177, src/dnsmasq/forward.c:559)
2024-10-17 02:33:13.781 UTC [1023M] DEBUG_QUERIES: **** host1.domain1.lan externally blocked (ID 9977177, FTL 84022, /app/src/dnsmasq/rfc1035.c:797)
2024-10-17 02:33:13.781 UTC [1023M] DEBUG_QUERIES: DNS cache: AAAA/10.0.0.151/host1.domain1.lan is blocked upstream with NXDOMAIN and unset RA bit, expires in 86017s
2024-10-17 02:33:13.781 UTC [1023M] DEBUG_QUERIES: Set reply to NXDOMAIN (2) in src/dnsmasq_interface.c:2731
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: **** got upstream reply from 10.0.0.10#53: host1.domain1.lan is blocked due to upstream response (header) (ID 9977177, src/dnsmasq/rfc1035.c:802)
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: Preparing reply for "host1.domain1.lan", EDE: N/A (-1)
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: Adding RR: "host1.domain1.lan AAAA ::"
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: **** got cache reply: host1.domain1.lan is :: (ID 9977177, src/dnsmasq_interface.c:439)
2024-10-17 02:33:13.787 UTC [1023M] DEBUG_QUERIES: **** new UDP IPv4 query[MX] query "host1.domain1.lan" from eth0/10.0.0.151#44066 (ID 9977178, FTL 84023, src/dnsmasq/forward.c:1815)
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: host1.domain1.lan is not known
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in antigravity (exact): no
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in gravity (exact): no
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: DNS cache: MX/10.0.0.151/host1.domain1.lan is not blocked (domainlist ID: -1)
2024-10-17 02:33:13.789 UTC [1023M] DEBUG_QUERIES: **** forwarded host1.domain1.lan to 10.0.0.10#53 (ID 9977178, src/dnsmasq/forward.c:559)
2024-10-17 02:33:13.790 UTC [1023M] DEBUG_QUERIES: **** host1.domain1.lan externally blocked (ID 9977178, FTL 84023, /app/src/dnsmasq/rfc1035.c:797)
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: DNS cache: MX/10.0.0.151/host1.domain1.lan is blocked upstream with NXDOMAIN and unset RA bit, expires in 86017s
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: Set reply to NXDOMAIN (2) in src/dnsmasq_interface.c:2731
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: **** got upstream reply from 10.0.0.10#53: host1.domain1.lan is blocked due to upstream response (header) (ID 9977178, src/dnsmasq/rfc1035.c:802)
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: Preparing reply for "host1.domain1.lan", EDE: N/A (-1)
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: **** got cache reply: host1.domain1.lan is (NODATA) (ID 9977178, src/dnsmasq_interface.c:457)
```

My read of what's happening here:

In this case the upstream server behaves correctly: it has no entry for this host and it does not offer recursion. It does not need to recurse either, because it is authoritative for domain1.lan.

This is what the query looks like towards 10.0.0.10 using dig:

; <<>> DiG 9.18.28 <<>> host1.domain1.lan @10.0.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50978
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;host1.domain1.lan. IN  A

;; AUTHORITY SECTION:
domain1.lan.    3600    IN  SOA dns.domain1.lan. hostmaster.domain1.lan. 2024101608 10800 3600 604800 3600

;; Query time: 22 msec
;; SERVER: 10.0.0.10#53(10.0.0.10) (UDP)
;; WHEN: Wed Oct 16 22:53:51 EDT 2024
;; MSG SIZE  rcvd: 111

I believe the root cause here is that PiHole needs to only consider a domain blocked upstream if both the RA and the AA bit are not set. If the AA bit is set PiHole should treat any NXDOMAIN response as authoritatively non-existent rather than blocked.
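
A minimal sketch of that distinction, written against the 16-bit flags word of the DNS header (RFC 1035, section 4.1.1). This is not the FTL code: `ra_only_rule` mirrors the condition named in the log message ("NXDOMAIN and unset RA bit"), `proposed_rule` adds the AA check suggested here, and both function names are hypothetical.

```python
# Bit masks for the 16-bit flags word of a DNS header (RFC 1035, section 4.1.1).
QR = 0x8000          # message is a response
AA = 0x0400          # authoritative answer
RD = 0x0100          # recursion desired
RA = 0x0080          # recursion available
AD = 0x0020          # authentic data (RFC 4035)
RCODE_MASK = 0x000F  # response code lives in the low four bits
NXDOMAIN = 3


def ra_only_rule(flags: int) -> bool:
    """Blocked upstream if the reply is NXDOMAIN and RA is unset."""
    return (flags & RCODE_MASK) == NXDOMAIN and not flags & RA


def proposed_rule(flags: int) -> bool:
    """Blocked upstream only if NXDOMAIN and both RA and AA are unset."""
    return ra_only_rule(flags) and not flags & AA


# Flags as shown by dig for the two NXDOMAIN replies in this comment.
authoritative_reply = QR | AA | RD | NXDOMAIN  # "qr aa rd" from 10.0.0.10
filtering_reply = QR | RD | AD | NXDOMAIN      # "qr rd ad" from 9.9.9.9

for label, flags in (("10.0.0.10", authoritative_reply), ("9.9.9.9", filtering_reply)):
    print(label, hex(flags), ra_only_rule(flags), proposed_rule(flags))
```

Under the RA-only condition both replies count as blocked; with the extra AA check only the non-authoritative one does.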

For comparison, here is a response from 9.9.9.9 for a known malware domain that this server blocks:

; <<>> DiG 9.18.28 <<>> 1312services.ru @9.9.9.9
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 29040
;; flags: qr rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;1312services.ru.       IN  A

;; Query time: 29 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (UDP)
;; WHEN: Wed Oct 16 23:00:36 EDT 2024
;; MSG SIZE  rcvd: 44

No RA bit but also no AA bit. It's probably fine to continue considering this type of response as "blocked externally".
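
To compare the two responses side by side, the header lines of the dig output can be parsed directly. This is a small sketch for reading such output, not part of Pi-hole; `parse_dig_header` is a hypothetical helper and the two strings are copied from the header lines of the dig runs in this comment.

```python
import re
from typing import Set, Tuple

INTERNAL = """\
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50978
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
"""

QUAD9 = """\
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 29040
;; flags: qr rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
"""


def parse_dig_header(text: str) -> Tuple[str, Set[str]]:
    """Extract the status and the set of header flags from dig output."""
    status = re.search(r"status: (\w+),", text).group(1)
    flags = re.search(r"flags: ([a-z ]*);", text).group(1).split()
    return status, set(flags)


for label, text in (("10.0.0.10", INTERNAL), ("9.9.9.9", QUAD9)):
    status, flags = parse_dig_header(text)
    # Columns: server, status, AA set?, RA set?
    print(label, status, "aa" in flags, "ra" in flags)
```

Both replies are NXDOMAIN without RA; only the one from 10.0.0.10 carries AA, which is the bit that separates an authoritative "does not exist" from the filtering resolver's answer.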

DL6ER commented 2 days ago

Thank you, this is about what I was assuming. Also thank you very much for the proposed fix already :-)

I will review/verify this after returning from work today (it's still earlyish morning on this side of the planet)