pi-hole / FTL

The Pi-hole FTL engine
https://pi-hole.net

DNS RRset round-robin #737

Closed corbolais closed 4 years ago

corbolais commented 4 years ago



We use DNS RRset round-robin for redundancy. Querying our own server with dig @ourdns.com cdn returns the IPs in a cycling order, so round-robin is in effect. Querying dig @8.8.8.8 cdn also yields rotating results.

Expected behaviour: Regardless of which DNS server or cache is consulted, round-robin should be in effect, returning the IP addresses of an RRset in a cycling order.

Actual behaviour: When using the Pi-hole FTL resolver, the order of the returned IPs is fixed (it even appears to be sorted). Using unbound as the recursive resolver with rrset-roundrobin: yes behind Pi-hole FTL gives the same ordered result.

There is one hint about round-robin to be found regarding Pi-hole FTL, but it links to a doc page that does not actually mention round-robin (any more?).

What's the actual take on this?

As things stand, the FTL behaviour defeats the purpose of RRsets, round-robin and thus redundancy.

DL6ER commented 4 years ago

Using dig @8.8.8.8 cdn also yields rotating results.

It doesn't appear to do this:

$ dig @8.8.8.8 cdn

; <<>> DiG 9.11.5-P4-5.1ubuntu2.1-Ubuntu <<>> @8.8.8.8 cdn
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 63447
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;cdn.               IN  A

;; AUTHORITY SECTION:
.           86398   IN  SOA a.root-servers.net. nstld.verisign-grs.com. 2020042300 1800 900 604800 86400

;; Query time: 96 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Do Apr 23 11:49:33 CEST 2020
;; MSG SIZE  rcvd: 107

I'm not aware of any IP sorting done by pihole-FTL. However, we can certainly look further into this if you can provide a publicly resolvable name that shows the behaviour (see my dig result above: cdn returns NXDOMAIN).

corbolais commented 4 years ago

Hi, thanks for commenting.

I'm not sure what you intend to prove by looking up cdn. I cannot derive anything from your example regarding round-robin or RRsets.

Admittedly, my dig cdn example was purely illustrative. I cannot share customer data, but just look at how the IPs cycle here: dig yui.yahooapis.com @8.8.8.8

dig yui.yahooapis.com @8.8.8.8 +short
edge.gycpi.b.yahoodns.net. 16   IN      A       87.248.118.23
edge.gycpi.b.yahoodns.net. 16   IN      A       87.248.118.22

Wait a bit, or query a couple of times:

dig yui.yahooapis.com @8.8.8.8 +short
edge.gycpi.b.yahoodns.net. 16   IN      A       87.248.118.22
edge.gycpi.b.yahoodns.net. 16   IN      A       87.248.118.23

Now compare that to the IP order when querying Pi-hole FTL:

dig yui.yahooapis.com +short
edge.gycpi.b.yahoodns.net. 16   IN      A       87.248.118.22
edge.gycpi.b.yahoodns.net. 16   IN      A       87.248.118.23

Wait a bit, or query a couple of times:

dig yui.yahooapis.com +short
edge.gycpi.b.yahoodns.net. 16   IN      A       87.248.118.22
edge.gycpi.b.yahoodns.net. 16   IN      A       87.248.118.23

Funnily enough, with Pi-hole FTL I sometimes seem to get ..22 ..23 and at other times ..23 ..22. So there seems to be no strict ordering; my guess is "cycling with lesser frequency"? Does Pi-hole FTL keep the order stable for a very long period of time before eventually returning a different order?

Thank you.

DL6ER commented 4 years ago

Funnily enough, with Pi-hole FTL I sometimes seem to get ..22 ..23 and at other times ..23 ..22. So it seems not to be any strict ordering but guessing "cycling with lesser frequency"? Pi-hole FTL keeps the order stable for a very long period of time before eventually returning any other order? [Pi-hole typos fixed]

Yes, I now see exactly what you're aiming at. Something like, let me call it, "reply order shuffling" is not implemented. When you query a record for the first time, you receive it in whatever order your upstream DNS server returns (be it Google, a recursive resolver walking down from the root DNS servers, or anything else). This order is kept for the specified TTL (time-to-live): during the TTL, the same answer is served from cache. Once the TTL has expired, we request fresh records from the upstream DNS server, which may come back in a different order this time. This matches your observation exactly.
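To make the cache behaviour described above concrete, here is a minimal sketch, assuming a hypothetical upstream callable that returns (records, ttl) and a hypothetical TtlCache class. It is not FTL or dnsmasq code; it only shows that a cached answer keeps the upstream order until the TTL runs out.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical upstream resolver: returns (records, ttl_seconds) for a name.
Upstream = Callable[[str], Tuple[List[str], int]]


class TtlCache:
    """Sketch: serve records in the order first received until the TTL expires."""

    def __init__(self, upstream: Upstream, clock: Callable[[], float] = time.monotonic):
        self.upstream = upstream
        self.clock = clock
        self.entries: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, name: str) -> List[str]:
        now = self.clock()
        entry = self.entries.get(name)
        if entry is not None and now < entry[0]:
            # Cache hit: same records, same order as first received.
            return list(entry[1])
        # Cache miss or expired entry: ask upstream again; its order may differ now.
        records, ttl = self.upstream(name)
        self.entries[name] = (now + ttl, list(records))
        return list(records)


# Usage with a fake clock and a fake upstream that reverses its order on every other call.
fake_now = [0.0]
calls = [0]


def fake_upstream(name: str) -> Tuple[List[str], int]:
    calls[0] += 1
    ips = ["87.248.118.22", "87.248.118.23"]
    return (ips if calls[0] % 2 else ips[::-1]), 16


cache = TtlCache(fake_upstream, clock=lambda: fake_now[0])
first = cache.resolve("yui.yahooapis.com")
fake_now[0] = 10.0  # still within the 16 s TTL: served from cache
second = cache.resolve("yui.yahooapis.com")
fake_now[0] = 20.0  # TTL expired: upstream is asked again
third = cache.resolve("yui.yahooapis.com")
print(first, second, third)
```

The first two answers are identical because the second one comes from cache; only after expiry does the (fake) upstream get a chance to return a different order.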

We adopted this behavior from dnsmasq, which is widely used, e.g., on Debian systems. Is this really a change we want to make?

corbolais commented 4 years ago

Well, I was expecting round-robin as bind9 implements it: ask, get record one first; ask again, get record two first; ask again, get record one first, and so on (with a round-robin over 2 RRs, resource records, in this example). This bind9 behaviour is independent of the TTL.
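The cyclic order described above can be sketched as a per-name rotation. This is a minimal illustration, assuming each RRset is a plain list of strings; the CyclicRotator class is hypothetical and is not bind9's code (bind9 exposes answer ordering through its rrset-order option).

```python
from collections import defaultdict
from typing import Dict, List


class CyclicRotator:
    """Hypothetical per-name rotation: each query starts one record later than the last."""

    def __init__(self) -> None:
        # One counter per queried name, independent of any TTL.
        self.offsets: Dict[str, int] = defaultdict(int)

    def answer(self, name: str, rrset: List[str]) -> List[str]:
        if not rrset:
            return []
        k = self.offsets[name] % len(rrset)
        self.offsets[name] += 1
        # Rotate left by k: record k comes first, relative order otherwise preserved.
        return rrset[k:] + rrset[:k]


rot = CyclicRotator()
rrset = ["87.248.118.22", "87.248.118.23"]
replies = [rot.answer("yui.yahooapis.com", rrset) for _ in range(3)]
for r in replies:
    print(r)
```

With two records this yields ..22 first, then ..23 first, then ..22 first again. Note the design cost: the offsets dictionary is state that has to be kept for every name that might be queried again.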

DL6ER commented 4 years ago

The feature would need to be requested from dnsmasq; we would absorb it afterwards. Can you submit a request to dnsmasq-discuss@lists.thekelleys.org.uk (http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss)?

What is your take on this, @dschaper?

dschaper commented 4 years ago

Showing me dig output with just two records doesn't give me a good view of the issue. I can't tell what kind of randomness is involved; you may just be flipping heads every time even though the outcome is random.

A deterministic response that takes the previous responses into account means we have to keep a record of those previous responses. Every query would then have to remain in memory, as we don't know which records may be queried again in the future, or how long the gap between queries may be.
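For contrast with the state argument above, a "reply order shuffling" that ignores previous responses needs no per-name memory at all. The sketch below is hypothetical (the shuffled_reply function is not part of FTL or dnsmasq) and simply draws a fresh random permutation for every reply.

```python
import random
from typing import List, Optional


def shuffled_reply(rrset: List[str], rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical stateless shuffling: a fresh random permutation per reply.

    Nothing about earlier replies is stored, so no per-name history is needed,
    but consecutive replies can repeat the same order.
    """
    rng = rng or random.Random()
    reply = list(rrset)  # copy so the cached RRset itself is left untouched
    rng.shuffle(reply)
    return reply


rng = random.Random(737)  # seeded only to make the run reproducible
rrset = ["52.213.155.117", "52.210.221.246", "34.251.239.113"]
replies = [shuffled_reply(rrset, rng) for _ in range(5)]
for r in replies:
    print(r)
```

The trade-off is the one raised in the coin analogy: with only two records, each independent shuffle repeats the previous order half of the time, so a short run of identical answers says little about whether shuffling is happening.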

dschaper commented 4 years ago
dschaper@nanopi-r2s:/etc/pihole$ dig NS pi-hole.net +short @127.0.0.1
ns2.pi-hole.net.
ns4.pi-hole.net.
ns1.pi-hole.net.
ns3.pi-hole.net.
dschaper@nanopi-r2s:/etc/pihole$ dig NS pi-hole.net +short @127.0.0.1
ns3.pi-hole.net.
ns4.pi-hole.net.
ns1.pi-hole.net.
ns2.pi-hole.net.
dschaper@nanopi-r2s:/etc/pihole$ dig NS pi-hole.net +short @127.0.0.1
ns2.pi-hole.net.
ns4.pi-hole.net.
ns3.pi-hole.net.
ns1.pi-hole.net.
dschaper@nanopi-r2s:/etc/pihole$ dig NS pi-hole.net +short @127.0.0.1
ns3.pi-hole.net.
ns4.pi-hole.net.
ns1.pi-hole.net.
ns2.pi-hole.net.
dschaper@nanopi-r2s:/etc/pihole$ dig NS pi-hole.net +short @127.0.0.1
ns3.pi-hole.net.
ns4.pi-hole.net.
ns1.pi-hole.net.
ns2.pi-hole.net.
corbolais commented 4 years ago

Showing me dig output with just two records

That was only an example with two RRs. In my case, it's three RRs.

Querying NS pi-hole.net, I sometimes get the results you posted; querying the NS records of another set of domains, I rarely get rotating (or even changing) results.

cheers

dschaper commented 4 years ago

If you give me the same domain you are using I can see if I can duplicate.

Your example digs with 2 records don't show me much of anything. Back to the coin analogy: it's as if you flipped a coin 5 times, it came up heads 4 times, and you concluded the coin is unbalanced.
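To put a number on the coin analogy: for a fair coin, the chance of at least 4 heads in 5 flips follows from the binomial distribution. This is a plain worked calculation (the helper name p_at_least is just for illustration); with two records, "heads" corresponds to one particular answer order under a uniform random choice.

```python
from fractions import Fraction
from math import comb


def p_at_least(k: int, n: int) -> Fraction:
    """Probability of at least k heads in n flips of a fair coin."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)


p = p_at_least(4, 5)
print(p, float(p))  # 3/16 0.1875
```

So 4 heads out of 5 happens roughly one time in five with a perfectly fair coin, which is why five dig runs are not enough to call an ordering biased.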

But none of this matters if your request isn't for "random" order.

DL6ER commented 4 years ago

I just tested this again (in my case, on a user's request for netflix.com) and saw:

8 Answer Records                                              
    netflix.com          IN    A     52.213.155.117 {TTL = 54}
    netflix.com          IN    A     52.210.221.246 {TTL = 54}
    netflix.com          IN    A     34.251.239.113 {TTL = 54}
    netflix.com          IN    A     52.19.113.209 {TTL = 54}
    netflix.com          IN    A     52.214.145.233 {TTL = 54}
    netflix.com          IN    A     54.194.87.208 {TTL = 54}
    netflix.com          IN    A     52.213.214.244 {TTL = 54}
    netflix.com          IN    A     52.49.81.143 {TTL = 54}
8 Answer Records                                              
    netflix.com          IN    A     52.49.81.143 {TTL = 53}  
    netflix.com          IN    A     52.213.155.117 {TTL = 53}
    netflix.com          IN    A     52.210.221.246 {TTL = 53}
    netflix.com          IN    A     34.251.239.113 {TTL = 53}
    netflix.com          IN    A     52.19.113.209 {TTL = 53}
    netflix.com          IN    A     52.214.145.233 {TTL = 53}
    netflix.com          IN    A     54.194.87.208 {TTL = 53}
    netflix.com          IN    A     52.213.214.244 {TTL = 53}
8 Answer Records
    netflix.com          IN    A     52.213.214.244 {TTL = 52}
    netflix.com          IN    A     52.49.81.143 {TTL = 52}
    netflix.com          IN    A     52.213.155.117 {TTL = 52}
    netflix.com          IN    A     52.210.221.246 {TTL = 52}
    netflix.com          IN    A     34.251.239.113 {TTL = 52}
    netflix.com          IN    A     52.19.113.209 {TTL = 52}
    netflix.com          IN    A     52.214.145.233 {TTL = 52}
    netflix.com          IN    A     54.194.87.208 {TTL = 52}
8 Answer Records
    netflix.com          IN    A     54.194.87.208 {TTL = 51}
    netflix.com          IN    A     52.213.214.244 {TTL = 51}
    netflix.com          IN    A     52.49.81.143 {TTL = 51}
    netflix.com          IN    A     52.213.155.117 {TTL = 51}
    netflix.com          IN    A     52.210.221.246 {TTL = 51}
    netflix.com          IN    A     34.251.239.113 {TTL = 51}
    netflix.com          IN    A     52.19.113.209 {TTL = 51}
    netflix.com          IN    A     52.214.145.233 {TTL = 51}

This indeed looks quite random, even though the answer has been cached in Pi-hole. I will close this issue, as I think there is in fact no problem. If you disagree, feel free to re-open; however, it would be great if you could give more precise examples that we can test. If this only appears on some internally used domains and nowhere on the public Internet, the problem may lie elsewhere.
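The four cached netflix.com replies above can be checked mechanically. The sketch below uses a hypothetical helper, rotation_offset, that tests whether one reply is a cyclic rotation of the previous one. In these four samples, each reply turns out to be the previous one rotated by a single position, i.e. the order cycles like the bind9 round-robin described earlier rather than being shuffled arbitrarily.

```python
from typing import List, Optional


def rotation_offset(prev: List[str], cur: List[str]) -> Optional[int]:
    """Return k if cur equals prev rotated right by k positions, else None."""
    n = len(prev)
    for k in range(n):
        # Right rotation by k: the last k records move to the front.
        if prev[n - k:] + prev[:n - k] == cur:
            return k
    return None


# The four replies shown above, in query order (TTL 54, 53, 52, 51).
replies = [
    ["52.213.155.117", "52.210.221.246", "34.251.239.113", "52.19.113.209",
     "52.214.145.233", "54.194.87.208", "52.213.214.244", "52.49.81.143"],
    ["52.49.81.143", "52.213.155.117", "52.210.221.246", "34.251.239.113",
     "52.19.113.209", "52.214.145.233", "54.194.87.208", "52.213.214.244"],
    ["52.213.214.244", "52.49.81.143", "52.213.155.117", "52.210.221.246",
     "34.251.239.113", "52.19.113.209", "52.214.145.233", "54.194.87.208"],
    ["54.194.87.208", "52.213.214.244", "52.49.81.143", "52.213.155.117",
     "52.210.221.246", "34.251.239.113", "52.19.113.209", "52.214.145.233"],
]

offsets = [rotation_offset(a, b) for a, b in zip(replies, replies[1:])]
print(offsets)  # [1, 1, 1]
```

Running the same check on replies captured for an internal domain would be one way to provide the "more precise examples" asked for above.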