My analysis powers are severely limited right now, but I just checked via remote SSH that this domain does indeed work from my Pi-hole with a local unbound instance as upstream resolver. I don't have cloudflared available myself.
Can you quote the corresponding lines from your log file /var/log/pihole.log, please?
Screenshots I just generated from mobile:
Thank you for that quick response.
I'm not sure if the upstream is responsible for this.
My /var/log/pihole.log:
Dec 27 18:23:40 dnsmasq[519]: query[A] careers.intuitive.com from 10.0.2.27
Dec 27 18:23:40 dnsmasq[519]: forwarded careers.intuitive.com to 172.17.0.1
Dec 27 18:23:40 dnsmasq[519]: dnssec-query[DNSKEY] phenompeople.com to 172.17.0.1
Dec 27 18:23:40 dnsmasq[519]: reply careers.intuitive.com is <CNAME>
Dec 27 18:23:40 dnsmasq[519]: reply intuitive.phenompeople.com is <CNAME>
Dec 27 18:23:40 dnsmasq[519]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 34.205.21.19
Dec 27 18:23:40 dnsmasq[519]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 35.173.207.80
Dec 27 18:23:40 dnsmasq[1888]: query[A] careers.intuitive.com from 10.0.2.27
If I change my upstream DNS to Quad9, everything works, but as soon as I change it back to my cloudflared resolver I trigger the bug. So it seems like a combination of dnsmasq/FTL and cloudflared.
Here is my swarm service config:
version: "3"
# More info at https://github.com/pi-hole/docker-pi-hole/ and https://docs.pi-hole.net/
services:
pihole:
image: pihole/pihole:latest
networks:
- "traefik"
environment:
TZ: 'Europe/Berlin'
volumes:
- 'etc-pihole:/etc/pihole'
- 'etc-dnsmasq:/etc/dnsmasq.d'
restart: unless-stopped
deploy:
labels:
- "traefik.enable=true"
- "traefik.tcp.routers.dnstcp.entrypoints=dnstcp"
- "traefik.tcp.routers.dnstcp.rule=HostSNI(`*`)"
- "traefik.tcp.services.pihole.loadbalancer.server.port=53"
- "traefik.udp.routers.dnsudp.entrypoints=dnsudp"
- "traefik.udp.services.pihole.loadbalancer.server.port=53"
- "traefik.http.services.pihole.loadbalancer.server.port=80"
- "traefik.http.services.pihole.loadbalancer.sticky.cookie=true"
- "traefik.http.routers.pihole.rule=Host(`my-pihole-hostnamei`)"
- "traefik.http.routers.pihole.service=pihole"
- "traefik.http.routers.pihole.entrypoints=https"
- "traefik.http.routers.pihole.tls=true"
cloudflared:
image: raspbernetes/cloudflared:latest
command: "proxy-dns --address 0.0.0.0 --port 5053 --upstream https://dns11.quad9.net/dns-query"
networks:
- "traefik"
deploy:
labels:
- "traefik.enable=true"
- "traefik.udp.routers.cloudflared.entrypoints=cloudflared"
- "traefik.udp.services.cloudflared.loadbalancer.server.port=5053"
networks:
traefik:
external: true
volumes:
etc-pihole:
etc-dnsmasq:
Could you record a similar pcap for the case where it works fine? That should make it easier to precisely compare the answer returned by cloudflared and the other resolver, and what the corresponding replies from FTL to dig are.
Also the pihole log snippets in both cases, please.
Was there anything more with dnsmasq[1888] in the log above? This looks like a TCP retry. Just for completeness' sake.
I'll have a look at your files/logs as soon as possible.
I tested dnsmasq directly (to see if this might be a dnsmasq bug) on my Raspberry Pi (without swarm etc.) and couldn't reproduce. But a lot of variables changed here as well (networking etc.), so I'm not sure how useful this is.
version: '3.3'
services:
  dnsmasq:
    image: 4km3/dnsmasq:2.86-r0-alpine-edge
    privileged: true
    network_mode: host
    command: "-d --log-queries --no-resolv --no-hosts -S 127.0.0.1#5059 -a 127.0.0.1 -p 5058 -y"
  cloudflared:
    image: raspbernetes/cloudflared:latest
    network_mode: host
    command: "proxy-dns --address 127.0.0.1 --port 5059 --upstream https://dns11.quad9.net/dns-query"
networks:
  pihole-test:
    ipam:
      driver: default
      config:
        - subnet: 172.68.0.0/16
Was there anything more with dnsmasq[1888] in the log above? This looks like a TCP retry. Just for completeness' sake.
No, this appears to be all. It's a bit tricky, obviously, because of noise from other devices, but this appears to be everything for that query. I pressed return a couple of times in the tail to make sure I have a visual indicator of what's new, then executed the dig command and copied the appearing block. I repeated this just to be sure and got the same block. Could it be that yours has more lines because it includes DNSSEC stuff that my instance might have cached? This is the result of my retry:
Dec 27 20:23:17 dnsmasq[418]: query[A] careers.intuitive.com from 10.0.2.26
Dec 27 20:23:17 dnsmasq[418]: forwarded careers.intuitive.com to 172.17.0.1
Dec 27 20:23:17 dnsmasq[418]: dnssec-query[DNSKEY] phenompeople.com to 172.17.0.1
Dec 27 20:23:17 dnsmasq[418]: reply careers.intuitive.com is <CNAME>
Dec 27 20:23:17 dnsmasq[418]: reply intuitive.phenompeople.com is <CNAME>
Dec 27 20:23:17 dnsmasq[418]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 34.205.21.19
Dec 27 20:23:17 dnsmasq[418]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 35.173.207.80
Dec 27 20:23:17 dnsmasq[2594]: query[A] careers.intuitive.com from 10.0.2.26
Dec 27 20:23:17 dnsmasq[2595]: query[A] careers.intuitive.com from 10.0.2.28
Could you record a similar pcap for the case where it works fine?
~bug-quad9-upstream.pcap.zip~ (removed)
Also the pihole log snippets in both cases, please.
Error-case see above, success case (direct resolving to quad9) below:
Dec 27 20:32:59 dnsmasq[2854]: query[A] careers.intuitive.com from 10.0.2.27
Dec 27 20:32:59 dnsmasq[2854]: forwarded careers.intuitive.com to 9.9.9.11
Dec 27 20:32:59 dnsmasq[2854]: validation result is INSECURE
Dec 27 20:32:59 dnsmasq[2854]: reply careers.intuitive.com is <CNAME>
Dec 27 20:32:59 dnsmasq[2854]: reply intuitive.phenompeople.com is <CNAME>
Dec 27 20:32:59 dnsmasq[2854]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 34.205.21.19
Dec 27 20:32:59 dnsmasq[2854]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 35.173.207.80
I'll have a look at your files/logs as soon as possible.
Thank you very much. Let me know how I can help.
I tested dnsmasq directly
Did you apply the same config lines used in Pi-hole? We're currently at the bleeding edge of dnsmasq development, as this has some important fixes that aren't officially released as we speak. Otherwise, FTL does not influence DNS handling in the slightest.
Could you try some older versions of FTL (it would still have to be a v5.x version) in your container, too? This would make the comparison to the other dnsmasq release fairer.
v5.8.1
dig result:
pi@rpi:~ $ dig @127.0.0.1 careers.intuitive.com
;; Truncated, retrying in TCP mode.
; <<>> DiG 9.16.22-Debian <<>> @127.0.0.1 careers.intuitive.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 38687
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;careers.intuitive.com. IN A
;; Query time: 3 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Dec 27 20:45:27 CET 2021
;; MSG SIZE rcvd: 50
Dec 27 20:49:11 dnsmasq[857]: query[A] careers.intuitive.com from 10.0.2.26
Dec 27 20:49:11 dnsmasq[857]: forwarded careers.intuitive.com to 172.17.0.1
Dec 27 20:49:11 dnsmasq[857]: dnssec-query[DNSKEY] phenompeople.com to 172.17.0.1
Dec 27 20:49:11 dnsmasq[857]: reply careers.intuitive.com is <CNAME>
Dec 27 20:49:11 dnsmasq[857]: reply intuitive.phenompeople.com is <CNAME>
Dec 27 20:49:11 dnsmasq[857]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 34.205.21.19
Dec 27 20:49:11 dnsmasq[857]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 35.173.207.80
Dec 27 20:49:11 dnsmasq[1000]: query[A] careers.intuitive.com from 10.0.2.26
Dec 27 20:49:11 dnsmasq[1000]: config error is REFUSED
~bug-quad9-upstream-5.8.1.pcap.zip~ (removed)
image 2021.09 (FTL 5.9)
dig
pi@rpi:~ $ dig @127.0.0.1 careers.intuitive.com
;; Truncated, retrying in TCP mode.
;; communications error to 127.0.0.1#53: end of file
;; communications error to 127.0.0.1#53: end of file
Dec 27 20:53:21 dnsmasq[499]: query[A] careers.intuitive.com from 10.0.2.27
Dec 27 20:53:21 dnsmasq[499]: forwarded careers.intuitive.com to 172.17.0.1
Dec 27 20:53:21 dnsmasq[499]: dnssec-query[DNSKEY] phenompeople.com to 172.17.0.1
Dec 27 20:53:21 dnsmasq[499]: reply careers.intuitive.com is <CNAME>
Dec 27 20:53:21 dnsmasq[499]: reply intuitive.phenompeople.com is <CNAME>
Dec 27 20:53:21 dnsmasq[499]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 35.173.207.80
Dec 27 20:53:21 dnsmasq[499]: reply hubsite-prod13-62952224.us-east-1.elb.amazonaws.com is 34.205.21.19
Dec 27 20:53:21 dnsmasq[880]: query[A] careers.intuitive.com from 10.0.2.27
Dec 27 20:53:21 dnsmasq[881]: query[A] careers.intuitive.com from 10.0.2.26
~bug-quad9-upstream-5.9.pcap.zip~ (removed)
Sorry for the spam, but I hoped to increase overall clarity by splitting it into separate comments.
So, interestingly, even version 5.8.1 cannot resolve it according to dig, whilst in the pcap you can see that actual addresses and CNAME records are returned.
Ninja edit:
Did you apply the same config lines used in Pi-hole?
No - all the configuration is passed via the command directly to the daemons, as stated in the docker-compose.yml.
Okay, I checked your files very quickly. Unfortunately, they don't contain the reply from the upstream resolver, only the traffic from dig to Pi-hole and back. It'd be helpful to get this in addition, but I see that this can get tricky when there is a lot of traffic.
In both cases the DNS reply signals truncation, requesting a retry over TCP. The only difference between them is that A is broken with the cloudflared upstream while B is okay with the Quad9 upstream. Nothing special here, it seems.
dig retries over TCP as advised, and both pcaps contain the TCP retry from dig. Now the interesting part: the TCP query is replied to in case B but never in case A. What happens here is that a retry over TCP also triggers a retry to upstream over TCP, as information might have been lost before (remember, the query already arrived truncated from upstream).
In the Quad9 case, the reply from upstream arrives and Pi-hole passes it on to your client. With cloudflared, this reply seemingly never arrives back at your Pi-hole and, hence, isn't forwarded to your dig client.
This is now the point where it is basically impossible to continue the investigation as long as we don't have the recorded traffic to and from the upstream server. Could you maybe set up a separate container where the bug is still present but where you can record the entire traffic because nobody else is using it? It could listen on a non-default port and dig could be told to use this port instead.
Also worth noting before I forget about it: your first comment shows that a direct query to cloudflared does not want to retry over TCP due to truncation. This likely happens because Pi-hole is requesting additional content, as you said DNSSEC is enabled. It'll be very interesting to analyze the traffic from/to upstream.
Could you also try a DNSSEC-enabled query directly to cloudflared to see if we run into the same truncation issue?
dig @127.0.0.1 -p 5053 +dnssec careers.intuitive.com
Okay - I'm sorry. This whole bug report is a total case of user error. 😅
I set up port-forwarding via traefik to cloudflared but only configured a UDP entrypoint and router, because if you want to listen for both protocols in traefik you have to register them twice - once for each protocol.
This is why pihole could reach the cloudflared container depending on the protocol used, and I guess some payload sizes in UDP triggered the TCP retry, which then failed because traefik never listened for and routed that.
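For reference, a rough sketch of the missing piece (the entrypoint names cloudflaredtcp/cloudflaredudp below are only illustrative, not taken from my real config) would be a second router/service pair so traefik forwards both protocols to cloudflared, mirroring what the pihole service above already does for port 53:

    deploy:
      labels:
        - "traefik.enable=true"
        # UDP routing (this part existed already)
        - "traefik.udp.routers.cloudflared.entrypoints=cloudflaredudp"
        - "traefik.udp.services.cloudflared.loadbalancer.server.port=5053"
        # TCP routing (this is what was missing, so truncated UDP answers could never be retried over TCP)
        - "traefik.tcp.routers.cloudflaredtcp.entrypoints=cloudflaredtcp"
        - "traefik.tcp.routers.cloudflaredtcp.rule=HostSNI(`*`)"
        - "traefik.tcp.services.cloudflaredtcp.loadbalancer.server.port=5053"

Both entrypoints would also have to exist in traefik's static configuration; in traefik v2 an entrypoint is either TCP or UDP, so you need one of each on port 5053.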
So sorry for this, and thank you very much for your extensive replies, which made it much clearer what the actual problem here is.
I'm going to delete the pcap files from the issue report.
Sorry and thanks again.
Missing TCP traffic forwarding was my first thought but then I thought "well, he surely has thought about that!" ;-)
Hey, sorry to bother you again, but I'm going crazy on this one. I'm not sure what changed. I accidentally rebooted my Pi today and made some other changes unrelated to my pihole setup, but it doesn't work anymore.
So, the setup is:
traefik exposes 53 (UDP+TCP) for pihole and 5053 (UDP+TCP) for cloudflared.
So the idea of the data flow is:
client -> raspberrypi (traefik) :53 ---forwards to---> pihole container ---request-upstream--> 172.17.0.1 (traefik/docker interface) :5053 --> cloudflared --> quad9 upstream
So... when I set 172.17.0.1#5053 as upstream in pihole, it doesn't work.
root@rpi:~# dig @127.0.0.1 -p 53 +notcp +noignore bit.ly
;; Truncated, retrying in TCP mode.
; <<>> DiG 9.16.22-Debian <<>> @127.0.0.1 -p 53 +notcp +noignore bit.ly
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 23961
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 37148cae44fd561b (echoed)
; EDE: 9 (DNSKEY Missing)
;; QUESTION SECTION:
;bit.ly. IN A
;; Query time: 39 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Dec 30 01:09:16 CET 2021
;; MSG SIZE rcvd: 53
root@rpi:~# dig @127.0.0.1 -p 53 +tcp +noignore bit.ly
; <<>> DiG 9.16.22-Debian <<>> @127.0.0.1 -p 53 +tcp +noignore bit.ly
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 32104
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 0383e948672f44f6 (echoed)
; EDE: 9 (DNSKEY Missing)
;; QUESTION SECTION:
;bit.ly. IN A
;; Query time: 131 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Dec 30 01:10:27 CET 2021
;; MSG SIZE rcvd: 53
However, if I set dig to ignore truncation, it works for UDP:
root@rpi:~# dig @127.0.0.1 -p 53 +notcp +ignore bit.ly
; <<>> DiG 9.16.22-Debian <<>> @127.0.0.1 -p 53 +notcp +ignore bit.ly
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8769
;; flags: qr aa tc rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: f108bc12bc8bcc3e (echoed)
;; QUESTION SECTION:
;bit.ly. IN A
;; ANSWER SECTION:
bit.ly. 101 IN A 67.199.248.11
bit.ly. 101 IN A 67.199.248.10
;; Query time: 31 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Dec 30 01:10:57 CET 2021
;; MSG SIZE rcvd: 91
but not for TCP:
root@rpi:~# dig @127.0.0.1 -p 53 +tcp +ignore bit.ly
; <<>> DiG 9.16.22-Debian <<>> @127.0.0.1 -p 53 +tcp +ignore bit.ly
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8007
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 241a4d785ad4254f (echoed)
; EDE: 9 (DNSKEY Missing)
;; QUESTION SECTION:
;bit.ly. IN A
;; Query time: 75 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Dec 30 01:11:26 CET 2021
;; MSG SIZE rcvd: 53
So I thought, okay, maybe the pihole <-> cloudflared link is not working, but that's not the case (this is from inside the pihole container):
root@1c076d91aca4:/# dig @172.17.0.1 -p 5053 +tcp +ignore bit.ly
; <<>> DiG 9.11.5-P4-5.1+deb10u6-Debian <<>> @172.17.0.1 -p 5053 +tcp +ignore bit.ly
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22631
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 00fe863cadaad34f (echoed)
;; QUESTION SECTION:
;bit.ly. IN A
;; ANSWER SECTION:
bit.ly. 281 IN A 67.199.248.11
bit.ly. 281 IN A 67.199.248.10
;; Query time: 1 msec
;; SERVER: 172.17.0.1#5053(172.17.0.1)
;; WHEN: Thu Dec 30 01:14:19 CET 2021
;; MSG SIZE rcvd: 91
(I left out the digs for +tcp +noignore and +udp +[no]ignore for brevity, but they work.)
So that somehow means the problem is between the host and pihole? It's again only affecting some domains (I guess it's related to packet truncation).
I'm actually not sure if it ever worked after my last comment, though the setup was definitely wrong before.
Also, when I change the upstream in pihole to Quad9 directly instead of my cloudflared container, it works again.
Can you help me understand what's going on here? As this works depending on the [no]ignore flag, I assume my network setup is correct (as shown by the successful dig commands from host to pihole and from pihole to cloudflared).
Any help here is appreciated.
Additionally, I have this warning in the web interface:
Here is again a capture from the docker_gwbridge on the Pi for dig @127.0.0.1 +notcp bit.ly and dig @127.0.0.1 -p 5053 +notcp bit.ly. (You can filter by port to show the specific sets.) You can see that the query to cloudflared works, while the query to pihole has the server response error (2) set.
bit-ly-capture-on-docker-gwbridge.pcapng.gz
I also have a dump here from one of the many interfaces docker swarm creates - I don't know which one that represents, but I think it's host (docker0) <> pihole based on the IPs, not pihole <> cloudflared. bit-ly-capture-1.pcapng.gz
And here is another one, but I'm also not familiar with what communication this represents. 172.18.0.1 is docker_gwbridge, so to the outside (host). I guess this is between traefik (which does the port mapping) and some container. bit-ly-capture-veth561d07c.pcapng.gz
@DL6ER I created this repo to reproduce it so you can test it better on your machine: https://github.com/BreiteSeite/pihole-traefik-udp-bug
This behaves exactly like my setup:
pi@rpi:~/pihole-test $ dig @127.0.0.1 -p 5859 +notcp +noignore +short bit.ly
67.199.248.11
67.199.248.10
pi@rpi:~/pihole-test $ dig @127.0.0.1 -p 5858 +notcp +noignore +short bit.ly
pi@rpi:~/pihole-test $ dig @127.0.0.1 -p 5859 +notcp +noignore +short duck.com
52.142.124.215
pi@rpi:~/pihole-test $ dig @127.0.0.1 -p 5858 +notcp +noignore +short duck.com
52.142.124.215
(Note that I had to set the upstream to 172.19.0.1, otherwise I got hit with
root@8872025e5ae2:/# dig @172.17.0.1 -p 5859 +notcp +noignore bit.ly
;; reply from unexpected source: 172.19.0.1#5859, expected 172.17.0.1#5859
which appears to be a known traefik "bug": https://github.com/traefik/traefik/issues/7430)
Could you give me some precise commands on how to set up a test system inside, say, an Ubuntu VM, given your repo? Sorry to ask this, but I am more a friend of lxc-based virtualization and never actually used docker compose myself, so I don't even know where to start and how to set up a system identical to yours. As this will (potentially) be a lot of work, it will also take some time as I have to sneak it in between other things.
What would be interesting, until I can reproduce your system, is the log (/var/log/pihole.log) excerpt corresponding to your tests above. They are all replied to with SERVFAIL. This can have many causes, e.g., the SERVFAIL already comes from upstream or the DNSSEC validation failed. If the upstream couldn't be reached, the reply might have been REFUSED instead (but, then again, not in every case).
Truncation seems unlikely because in the one case where you used +notcp +ignore, the MSG SIZE rcvd: 91 is fairly low; however, the packet might have been much larger before when DNSSEC information was attached (actually, it shouldn't be that much larger).
Concerning the warning: This is a limitation on the upstream resolver you configured. Either your local or the upstream one. Check out this discussion: https://discourse.pi-hole.net/t/dnsmasq-warn-reducing-dns-packet-size/51803
Especially these posts:
Could you give me some precise commands on how to set up a test system inside, say, an Ubuntu VM, given your repo?
Sure thing. It should be fairly easy, so I would assume apt install docker docker-compose should do the trick for the installation part, and then cd <repo> and docker-compose up -d to boot it. That should be it. If you're on some messaging platform (IRC/Telegram/Slack/Signal), I am also happy to help you directly in case you run into issues.
What would be interesting, until I can reproduce your system, is the log (/var/log/pihole.log) excerpt corresponding to your tests above.
Running the same commands in the same order as above - output spaced by newlines (port 5859 commands are directly to cloudflared so they don't appear in the pihole.log):
Dec 30 17:24:25 dnsmasq[446]: query[A] pi.hole from 127.0.0.1
Dec 30 17:24:25 dnsmasq[446]: Pi-hole hostname pi.hole is 0.0.0.0
Dec 30 17:24:32 dnsmasq[446]: query[A] bit.ly from 172.19.0.3
Dec 30 17:24:32 dnsmasq[446]: forwarded bit.ly to 172.19.0.1
Dec 30 17:24:33 dnsmasq[446]: dnssec-query[DS] ly to 172.19.0.1
Dec 30 17:24:33 dnsmasq[446]: reply ly is DS keytag 62311, algo 8, digest 2
Dec 30 17:24:33 dnsmasq[446]: dnssec-query[DS] bit.ly to 172.19.0.1
Dec 30 17:24:33 dnsmasq[446]: dnssec-query[DNSKEY] ly to 172.19.0.1
Dec 30 17:24:33 dnsmasq[446]: reply bit.ly is 67.199.248.10
Dec 30 17:24:33 dnsmasq[446]: reply bit.ly is 67.199.248.11
Dec 30 17:24:33 dnsmasq[511]: query[A] bit.ly from 172.19.0.3
Dec 30 17:24:33 dnsmasq[511]: forwarded bit.ly to 172.19.0.1
Dec 30 17:24:33 dnsmasq[511]: dnssec-query[DS] bit.ly to 172.19.0.1
Dec 30 17:24:33 dnsmasq[511]: dnssec-query[DNSKEY] ly to 172.19.0.1
Dec 30 17:24:33 dnsmasq[511]: validation bit.ly is BOGUS
Dec 30 17:24:33 dnsmasq[511]: reply bit.ly is 67.199.248.10
Dec 30 17:24:33 dnsmasq[511]: reply bit.ly is 67.199.248.11
Dec 30 17:24:56 dnsmasq[446]: query[A] pi.hole from 127.0.0.1
Dec 30 17:24:56 dnsmasq[446]: Pi-hole hostname pi.hole is 0.0.0.0
Dec 30 17:25:02 dnsmasq[446]: query[A] duck.com from 172.19.0.3
Dec 30 17:25:02 dnsmasq[446]: forwarded duck.com to 172.19.0.1
Dec 30 17:25:02 dnsmasq[446]: validation result is INSECURE
Dec 30 17:25:02 dnsmasq[446]: reply duck.com is 40.89.244.232
Starting from your repo works on a fresh Ubuntu 20.04 with latest docker installed:
Creating pihole-traefik-udp-bug_traefik_1 ... done
Creating pihole-traefik-udp-bug_cloudflared_1 ... done
Creating pihole-traefik-udp-bug_pihole_1 ... done
However, it does not really do what we expect it to (at least not out-of-the-box):
dominik@pihole-traefil-udp-bug:~$ dig @127.0.0.1 -p 5859 +notcp +noignore +short bit.ly
67.199.248.10
67.199.248.11
dominik@pihole-traefil-udp-bug:~$ dig @127.0.0.1 -p 5858 +notcp +noignore +short bit.ly
;; connection timed out; no servers could be reached
dominik@pihole-traefil-udp-bug:~$ dig @127.0.0.1 -p 5859 +notcp +noignore +short duck.com
52.142.124.215
dominik@pihole-traefil-udp-bug:~$ dig @127.0.0.1 -p 5858 +notcp +noignore +short duck.com
;; connection timed out; no servers could be reached
dominik@pihole-traefil-udp-bug:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c118b4e8e834 pihole/pihole:latest "/s6-init" 3 minutes ago Up 3 minutes (healthy) 53/udp, 53/tcp, 67/udp, 0.0.0.0:8300->80/tcp, :::8300->80/tcp pihole-traefik-udp-bug_pihole_1
18a86bf28e30 raspbernetes/cloudflared:latest "cloudflared --no-au…" 3 minutes ago Up 3 minutes pihole-traefik-udp-bug_cloudflared_1
f7b0301a20cd traefik:v2.5 "/entrypoint.sh --ap…" 3 minutes ago Up 3 minutes 80/tcp, 0.0.0.0:5858-5859->5858-5859/tcp, 0.0.0.0:5858-5859->5858-5859/udp, :::5858-5859->5858-5859/tcp, :::5858-5859->5858-5859/udp pihole-traefik-udp-bug_traefik_1
I'm running out of time for today. Any suggestions?
Pi-hole does seem to run:
dominik@pihole-traefil-udp-bug:~$ sudo docker exec c118b4e8e834 pihole status
[✓] DNS service is listening
[✓] UDP (IPv4)
[✓] TCP (IPv4)
[✓] UDP (IPv6)
[✓] TCP (IPv6)
[✓] Pi-hole blocking is enabled
dominik@pihole-traefil-udp-bug:~$ sudo docker exec c118b4e8e834 dig +short google.de
74.125.133.94
I'm running out of time for today. Any suggestions?
Yes - you need to configure the IP of the cloudflared container as upstream in pihole. I guess on your setup it got assigned a different IP than on mine. I'm also not sure how docker behaves in a VM. I think the easiest would be to just run the docker containers on your host - I mean, that's the idea of containers, that they're kinda isolated from your host already anyway. :)
But if you're not comfortable with that, you can find the IP of the cloudflared container by running sudo docker inspect $(sudo docker-compose ps -q cloudflared). If you have jq installed, this reduces the output to the relevant section: sudo docker inspect $(sudo docker-compose ps -q cloudflared) | jq '.[0].NetworkSettings.Networks' (needs to be run in the directory of the repository).
The VM I'm using is a true virtualization; unlike hybrid solutions such as docker, the virtual operating system does not even know it is virtual (obviously, at the extra cost of disk space, memory, etc.). I have several virtualized machines running on a server with enough RAM and simply ssh into them remotely, as they appear the same as bare-metal servers from the outside but with very simple backup/restore/discard/archive/resume capabilities. I don't expect it to cause any issues.
Your command returns:
{
  "pihole-traefik-udp-bug_default": {
    "IPAMConfig": null,
    "Links": null,
    "Aliases": [
      "cloudflared",
      "18a86bf28e30"
    ],
    "NetworkID": "e5a998204b9327b2ec271d160d41c5c6150644e0c435113b2158fd7e8eebe689",
    "EndpointID": "773ee2178de15695c3ec5e58be99c05211f0df716fcb62fb3328f680208c7116",
    "Gateway": "172.18.0.1",
    "IPAddress": "172.18.0.4",
    "IPPrefixLen": 16,
    "IPv6Gateway": "",
    "GlobalIPv6Address": "",
    "GlobalIPv6PrefixLen": 0,
    "MacAddress": "02:42:ac:12:00:04",
    "DriverOpts": null
  }
}
This doesn't seem to work, though:
dominik@pihole-traefil-udp-bug:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
128ffdbec214 pihole/pihole:latest "/s6-init" 37 minutes ago Up 37 minutes (healthy) 53/udp, 67/udp, 0.0.0.0:5333->53/tcp, :::5333->53/tcp, 0.0.0.0:8300->80/tcp, :::8300->80/tcp pihole-traefik-udp-bug_pihole_1
18a86bf28e30 raspbernetes/cloudflared:latest "cloudflared --no-au…" 57 minutes ago Up 57 minutes pihole-traefik-udp-bug_cloudflared_1
f7b0301a20cd traefik:v2.5 "/entrypoint.sh --ap…" 57 minutes ago Up 57 minutes 80/tcp, 0.0.0.0:5858-5859->5858-5859/tcp, 0.0.0.0:5858-5859->5858-5859/udp, :::5858-5859->5858-5859/tcp, :::5858-5859->5858-5859/udp pihole-traefik-udp-bug_traefik_1
dominik@pihole-traefil-udp-bug:~$ sudo docker exec 128ffdbec214 dig @172.18.0.4 -p 5859 google.de
; <<>> DiG 9.11.5-P4-5.1+deb10u6-Debian <<>> @172.18.0.4 -p 5859 google.de
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
The VM I'm using is a true virtualisation, unlike the hybrid solutions such as docker, the virtual operating system does not even know it is virtual.
Okay - to be fair, I'm not very familiar with VMs since it's been like 7 years since I used them. :) If you don't mind, what do you mean by hybrid solution? Because docker is basically just a frontend for process virtualisation - kinda similar to lxc, just more user-friendly (IMHO ;)).
This doesn't seem to work
Sorry - I was actually wrong. The upstream IP shouldn't be that of the cloudflared container directly, as the traffic should go to traefik, which then routes it to (one of the) cloudflared containers based on the entrypoint port (53 TCP+UDP). So the IP you need to use is the one of the gateway - 172.18.0.1 in your example.
sudo docker-compose exec pihole dig @172.18.0.1 -p 5859 google.de
Should do the trick.
Thank you for your effort on this investigation.
Okay, so this works. TL;DR: Nothing unexpected happens. The problem is upstream and not with Pi-hole.
Let's start with what you have done, too:
dominik@pihole-traefil-udp-bug:~/$ dig @127.0.0.1 -p 5858 +notcp +noignore bit.ly
;; Truncated, retrying in TCP mode.
; <<>> DiG 9.16.1-Ubuntu <<>> @127.0.0.1 -p 5858 +notcp +noignore bit.ly
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55227
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 7870ad0f6f280043 (echoed)
; OPT=15: 00 09 ("..")
;; QUESTION SECTION:
;bit.ly. IN A
;; Query time: 79 msec
;; SERVER: 127.0.0.1#5858(127.0.0.1)
;; WHEN: Fri Dec 31 11:59:03 UTC 2021
;; MSG SIZE rcvd: 53
Same result as you got. (Interesting to note that +notcp is set but dig is still "retrying in TCP mode".)
dominik@pihole-traefil-udp-bug:~/$ sudo docker exec -it 128ffdbec214 tail /var/log/pihole.log
Dec 31 12:59:02 dnsmasq[6601]: query[A] bit.ly from 172.18.0.2
Dec 31 12:59:02 dnsmasq[6601]: forwarded bit.ly to 172.18.0.1
Dec 31 12:59:02 dnsmasq[6601]: dnssec-query[DS] bit.ly to 172.18.0.1
Dec 31 12:59:03 dnsmasq[6601]: dnssec-query[DNSKEY] ly to 172.18.0.1
Dec 31 12:59:03 dnsmasq[6601]: validation bit.ly is BOGUS
Dec 31 12:59:03 dnsmasq[6601]: reply bit.ly is 67.199.248.11
Dec 31 12:59:03 dnsmasq[6601]: reply bit.ly is 67.199.248.10
The DNSSEC validation returned that bit.ly is BOGUS. Returning SERVFAIL is expected behavior in this case. Interestingly enough, this is wrong: I checked the DNSSEC path and bit.ly is not actually using DNSSEC at all, so it should be INSECURE instead. As everything works with a different upstream in Pi-hole, I suspect that cloudflared is somehow giving a corrupt response. Enabling log-queries=extra in 01-pihole.conf of the Pi-hole container gives further details:
Dec 31 13:09:21 dnsmasq[6974]: 10 172.18.0.2/49814 query[A] bit.ly from 172.18.0.2
Dec 31 13:09:21 dnsmasq[6974]: 10 172.18.0.2/49814 forwarded bit.ly to 172.18.0.1
Dec 31 13:09:21 dnsmasq[6974]: 11 dnssec-query[DS] bit.ly to 172.18.0.1
Dec 31 13:09:21 dnsmasq[6974]: 12 dnssec-query[DNSKEY] ly to 172.18.0.1
Dec 31 13:09:21 dnsmasq[6974]: 10 172.18.0.2/49814 validation bit.ly is BOGUS (EDE: DNSKEY missing)
Dec 31 13:09:21 dnsmasq[6974]: 10 172.18.0.2/49814 reply bit.ly is 67.199.248.10
Dec 31 13:09:21 dnsmasq[6974]: 10 172.18.0.2/49814 reply bit.ly is 67.199.248.11
(see the EDE message).
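For reference, roughly how this can be switched on in a running Pi-hole container - the container name is a placeholder, and this assumes the stock 01-pihole.conf, which already ships a log-queries line:

    # change the existing log-queries directive to the verbose variant
    sudo docker exec <pihole-container> sed -i 's/^log-queries.*/log-queries=extra/' /etc/dnsmasq.d/01-pihole.conf
    # restart the embedded dnsmasq/FTL so the change takes effect
    sudo docker exec <pihole-container> pihole restartdns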
This is now a situation that is very difficult to handle, as doing DNSSEC validation by hand is ... time-consuming (to put it mildly). Not something that is really feasible. So I recorded a pcap with the filter port (53 or 5859) in the Pi-hole container: dns.zip
Check where my mouse is in the following screenshots:
This is the DNSKEY ly response incoming from traefik: it is truncated.
Hence, we cannot do DNSSEC validation; we return NOERROR but tell dig that the reply is truncated.
Now dig indeed retries over TCP. Pi-hole forwards again to traefik and receives a reply:
However, even over TCP, where this is not even possible, traefik tells us this is a truncated reply. Setting the TC (truncated) bit in TCP replies is meaningless: all it says according to the standard is "retry over TCP so there is no limit".
Whether this comes from traefik or actually from cloudflared remains unclear.
At this point, Pi-hole is correctly labeling the reply as BOGUS + SERVFAIL as the upstream is not behaving as it should. This makes me very certain that we are not actually looking at a Pi-hole bug here.
What I'd need next is a way to reliably record a tcpdump of all the containers. Is there a simple way, maybe even integrated into docker, for doing this? While it was straightforward to install tcpdump in the Pi-hole container, I don't know how to install tcpdump in the other two.
What I'd need next is a way to reliably record a tcpdump of all the containers. Is there a simple way, maybe even integrated into docker for doing this?
Interesting question. I found this article, which has an interesting solution. If you are on x64, you can skip building the (local) docker container and instead just run the command:
sudo docker run --tty --net=container:CONTAINER_NAME havmand/tcpdump
with CONTAINER_NAME being replaced by the container name you are interested in (easily found by running sudo docker-compose ps).
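If you want the capture as a file on the host rather than printed to the terminal, something along these lines should also work (assuming the image's entrypoint is tcpdump and simply passes extra arguments through; the filter and the filename are just examples):

    # attach to the cloudflared container's network namespace and stream a pcap
    # to stdout, redirecting it into a file on the host
    sudo docker run --rm --net=container:pihole-traefik-udp-bug_cloudflared_1 \
      havmand/tcpdump -U -w - 'port 53 or port 5859' > cloudflared.pcap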
Thanks, for reference, here are the files: pihole-traefik-udp-bug_pcaps.zip
This is a cloudflared bug:
Just for completeness, the request itself did not have the bit set as can be seen here:
The relevant Internet standard is RFC 2181: Clarifications to the DNS Specification, more specifically the last sentence of section 9:
9. The TC (truncated) header bit
The TC bit should be set in responses only when an RRSet is required as a part of the response, but could not be included in its entirety. The TC bit should not be set merely because some extra information could have been included, but there was insufficient room. This includes the results of additional section processing. In such cases the entire RRSet that will not fit in the response should be omitted, and the reply sent as is, with the TC bit clear. If the recipient of the reply needs the omitted data, it can construct a query for that data and send that separately. Where TC is set, the partial RRSet that would not completely fit may be left in the response. When a DNS client receives a reply with TC set, it should ignore that response, and query again, using a mechanism, such as a TCP connection, that will permit larger replies.
This is further clarified by RFC 5966: DNS Transport over TCP - Implementation Requirements, section 3:
In the absence of EDNS0 (Extension Mechanisms for DNS 0) (see below), the normal behaviour of any DNS server needing to send a UDP response that would exceed the 512-byte limit is for the server to truncate the response so that it fits within that limit and then set the TC flag in the response header. When the client receives such a response, it takes the TC flag as an indication that it should retry over TCP instead.
I suggest you contact the cloudflared maintainers and ask how a TCP query can have the truncated bit set and how they expect the client to react. It can be reproduced easily using
dig @127.0.0.1 -p 5859 +tcp DNSKEY ly +dnssec
which shows flags: qr tc rd ra ad, but tc should not be there.
I prepared a tcpdump with reduced noise for their support:
pihole-traefik-udp-bug_cloudflared_1_2.zip
With all this, it should be straightforward for them to fix this.
However, even over TCP, where this is not even possible, traefik tells us this is a truncated reply. Setting the TC (truncated) bit in TCP replies is meaningless.
I wonder if this is because at some point the dns.flags are copied from the original attempt and the truncated bit is not reset.
(interesting to note +notcp is set but dig is still "retrying in TCP mode")
I think this is because of +noignore, which controls the retry behavior:
+[no]ignore This option ignores [or does not ignore] truncation in UDP responses instead of retrying with TCP. By default, TCP retries are performed.
Okay, so from what I understand, there are two issues here:
1) cloudflared inappropriately sets the TC flag for TCP responses
2) cloudflared incorrectly causes the DNSSEC validation for bit.ly to come out as BOGUS, when it should be INSECURE because the domain doesn't actually use DNSSEC?
Two follow-up questions from that:
3) If the TC flag for TCP responses is meaningless, why would this cause pihole to refuse to answer correctly? Shouldn't dnsmasq just ignore the TC flag then?
4) Also, why can cloudflared itself resolve the query correctly while pihole cannot?
pi@rpi:~ $ dig @127.0.0.1 -p 5859 +tcp +dnssec +short bit.ly
67.199.248.10
67.199.248.11
pi@rpi:~ $ dig @127.0.0.1 -p 5858 +tcp +dnssec +short bit.ly
pi@rpi:~ $
I suggest you contact cloudflared maintainers and ask how a TCP query can have the truncated bit set and how they expect the client to react.
I would love to help out here, but I feel you are way more knowledgeable about the DNS protocol and the client stack, and I think communication would be more efficient if you could file this bug with them (the affected repository)? I would just be a proxy of communication, which I think is not desirable for either side.
However, even over TCP, where this is not even possible, traefik tells us this is a truncated reply. Setting the TC (truncated) bit in TCP replies is meaningless.
I wonder if this is because at some point the dns.flags are copied from the original attempt and the truncated bit is not reset.
We can get the fail right away on the first query (see the last part of my post before). There is no original attempt. We performed everything over TCP in the simple test case.
If the TC flag for TCP responses is meaningless, why would this cause pihole to refuse to answer correctly? Shouldn't dnsmasq just ignore the TC flag then?
Because RFC 2181 explicitly says (see the bold text above):
When a DNS client receives a reply with TC set, it should ignore that response [...]
It doesn't say we should do this only for UDP queries. As a consequence, the TC bit set in a TCP reply means a hard fail, as there are no other means we could try here. DNSSEC validation in Pi-hole then fails because critical parts of the chain of trust failed.
Also, why can cloudflared itself resolve the query correctly while pihole cannot?
Because cloudflared doesn't seem to be doing any DNSSEC validation. It will give you whatever you ask for without any DNSSEC check.
Issue ticket submitted. Please subscribe to it @BreiteSeite in case they have questions you can answer, too, like the version of cloudflared or any other details of the docker setup. I don't really have the time to debug cloudflared; this investigation here already took a lot of time.
Thanks for filing that bug upstream, and for your patience in troubleshooting and explaining this to me. The issue got a lot clearer to me with your last post.
I just subscribed and will help out there as best as I can.
One question I still have, though:
pi@rpi:~ $ dig @127.0.0.1 -p 5859 +tcp DNSKEY +dnssec ly
; <<>> DiG 9.16.22-Debian <<>> @127.0.0.1 -p 5859 +tcp DNSKEY +dnssec ly
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31066
;; flags: qr tc rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: a73e701c7cc2c08cab3ce8c161cf1c8ceb3bc077b6761f8d (good)
;; QUESTION SECTION:
;ly. IN DNSKEY
;; Query time: 351 msec
;; SERVER: 127.0.0.1#5859(127.0.0.1)
;; WHEN: Fri Dec 31 16:06:52 CET 2021
;; MSG SIZE rcvd: 59
Why would dig report udp: 4096 in the OPT PSEUDOSECTION?
We're always learning together.
Why would dig report udp: 4096 in the OPT PSEUDOSECTION?
This is fine. It just tells your resolver that up to 4096-byte payloads are transmittable over UDP. It's always there. More info, e.g., here (to not always cite Internet standards alone).
One other thing that just came to my mind: I've seen you selected --upstream https://dns11.quad9.net/dns-query, which is not Cloudflare. Why did you choose this? And, even more interesting perhaps: does Cloudflare's DoH server show the same abnormal behavior (TC bit + empty response) for the dig above?
This is fine. It just tells your resolver that up to 4096-byte payloads are transmittable over UDP. It's always there. More info, e.g., here (to not always cite Internet standards alone).
Thanks. I wasn't aware this was client-side information. I need to get more familiar with dig and DNS at some point, I guess.
One other things that just came to my mind, I've seen you selected --upstream https://dns11.quad9.net/dns-query which is not cloudflare. Why did you chose this?
Well, I chose cloudflared because it seemed like a small DoH client that does everything I would need it to do, receives very frequent updates, had an arm64 docker image ready, and does not require a lot of configuration.
Quad9 I chose for privacy reasons.
I was under the assumption that this is a valid combination, as it is even mentioned in the pi-hole docs:
And, even more interesting perhaps: Does Cloudflare's DoH server show the same abnormal behavior (TC bit + empty response) for the dig above?
Actually very interesting, because it doesn't.
pi@rpi:~/pihole-test $ sudo docker-compose logs cloudflared
pihole-test-cloudflared-1 | 2021-12-31T16:38:33Z INF Adding DNS upstream url=https://1.1.1.1/dns-query
pihole-test-cloudflared-1 | 2021-12-31T16:38:33Z INF Starting DNS over HTTPS proxy server address=dns://0.0.0.0:5811
pihole-test-cloudflared-1 | 2021-12-31T16:38:33Z INF Starting metrics server on 127.0.0.1:37025/metrics
pi@rpi:~/pihole-test $ dig @127.0.0.1 -p 5859 +tcp DNSKEY +dnssec ly
; <<>> DiG 9.16.22-Debian <<>> @127.0.0.1 -p 5859 +tcp DNSKEY +dnssec ly
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12381
;; flags: qr aa rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: 1ecc3775cfdafd2c (echoed)
;; QUESTION SECTION:
;ly. IN DNSKEY
;; ANSWER SECTION:
ly. 3575 IN DNSKEY 256 3 8 AwEAAYdSpuTFbv0JmMYpI1cWcR/jVIOmPvo1eJnYS+VUStiGfTXvz26R UtU0LPEECV+06X1OXYiLUbt2x2XgqKJQTFvJd6Jo6Yhwr+VCuEPadNe4 4Omhs1Sp8btMWnR57o8VDkV1c+q82QPeD0krwnU6UnYdcztDAUfk75Dq +QuP4fAp0Fvi7ggivxI/nIhxON2GheHBMbU7VysnSVtx7RGt1hTgmtny fwEtZWsbVJUCfXjRANglgtIogul4hmgHwGGXfPlK2u7l/y661rHlzg4B UyT1+iVSKv18ecpJ0RpVnNpPypvGlP/oDaumgoMTB3MZL2BmDIUMh2aE 6VSnd7WcyDU=
ly. 3575 IN DNSKEY 257 3 8 AwEAAcbhsGvQrDj3AA1GU31NNkDhJe+ghg7C+sHdX/gnBpFWMohDkZIX LVPTZ7i+6uHktDguiIImS+X1F8l6vBguMUeLn82zwtiY05SPDGezNavk Dxp9QkdANVpOhvLruoCOAtLHKNhUwPvdk21ZkTauZYctCBNRXs5r7/8m mbg0O9O1qouP3KovP6Oo7N2VhpzfaqR0zRRYpvTt0oqabTplbVYihU7C xxgXMY5WLSftHV7faedrbPRjgesFELLykxQWC8TPZ9XnqzD9mUkpDjjz 2bfgQKaVdOtu6Q0MH7OF0J3g58NL6tATmj3+gN0vf1nbR515CVsapOOE Vt7Rl0rrPkvb2pSZXfR65BGDVJMA1rRPCW2ZL1x2sU+trVFv1eRonA67 LYipR2I9+wy71OWuiTpxjG74EvlDRMYVMPmJo4AoDo1v5ZpLN9yDH5Fy LV2Cg+IXNXO9jOstxWIjK3xkhbDCr8LcKU7p9JL44WhM6D+dSBqesyGG W97s2pUUxIEL0FnwO5R56vhvRmDy5iVIw/iYx7+RpeJypXjIIBS2lniZ 3SNdFDeuSQOSn+owOvccbFNJcDBBFpfBBerukV+LgN3Z/Q2zFWG9SNwV oQapYM80MB/85RMgN0CO/wvqUitdlnsCOnfYgGw+4tS3LWh5kw4SIMUB 0YdlhJ3BUNhFf+F1
ly. 3575 IN RRSIG DNSKEY 8 1 172800 20220115180000 20211229180000 62311 ly. fJlYyZ6kkzOpgI6+zJhjl9SOhTAECZnOpcV1wXmAVhMQHTiyCIDb9vLd YHl+tZMsWoDH0Bya+DhKVvxPVqSfc6wo2w5dFgsSmDOfI+qZrHb10o+9 7t7VaR37VJsL3f9hhueYhEZH4DgRXqxAABp/FG6mx+VSjrIJGUvI8wCv hzR57OIjujQqsiej4zIjYDdwrgCwrRz4Ued85d038d9E+lkvBKK3hYcF 4ILZOcfJ6jXKHDttlk+6NMU18uCnnnRuUz9rbgr1YRgAP/Z+Kvi29qSh Y+Xvml+wOPzJDzu6H2uxD9+tcIFoqrl92lqfHoIh90wq9a6i7IUxFzKa oHr5pSVIX5S50/MFQo4UxXX2IoJzcOCBJRTzyWWOmxlRxk4P7H8BxD8Y 65fbqBNzrFCNuliLxh68uXq/S508lBEuQyEs4dFTC0x2dyBmOV7cAYeZ RlD645MOjxwyiGhYnCgTgZZAoIPor1qUyOg4osJDc9357wN5ehSHHSI9 qoFLxqFmmflitNxIKCkPtUemAnnCTN7NtJjA551/HMAWa2mX0RCoM7RD cJohzJL5RpZKoWDqvEzj1SdzbDomhZZFFyUvfoNXLJjWvrJ/tzAc6QDH A6oNzsLwW29Ft8w02ev4lHriaZXPW5dZswbIC8u8Tm2oDJByY6D28PAn IUo6bqiYOI0=
;; Query time: 3 msec
;; SERVER: 127.0.0.1#5859(127.0.0.1)
;; WHEN: Fri Dec 31 17:39:10 CET 2021
;; MSG SIZE rcvd: 1403
I will add this to the linked issue.
So it's likely even a Quad9 bug, not cloudflared, which may just stupidly pass along whatever it gets served from upstream.
I agree. As you flagged this upstream (thank you), I will close this issue.
I use Cloudflare's DNS upstream as a workaround.
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:
https://discourse.pi-hole.net/t/debian-org-does-not-seem-to-resolve/55158/2
Versions
Platform
Expected behavior
Can resolve careers.intuitive.com.
Actual behavior / bug
Browser loads forever. dig reports error:
Steps to reproduce
Steps to reproduce the behavior:
dig careers.intuitive.com against your pihole
Debug Token
Additional context
I run pihole on a single-node docker swarm cluster. Bug happens when scaling to 1 replica as well.
Upstream DNS is a cloudflared container in version 2021.12.3. DNSSEC on pi-hole is enabled.
Resolving the record directly via upstream (cloudflared) works fine
Resolving other queries via pihole works fine
Webinterface shows N/A Reply
Long-Term query data for affected domain
Pi-Hole Remote output
The domain is not on any blacklist or blocklist
tcpdump
Attached a tcpdump generated via
sudo tcpdump 'port 53 and (dst host localhost) or (src host 172.18.0.1)' -i docker_gwbridge -A -w bug.pcap
~bug.pcap.zip~ (removed)