home-assistant / plugin-dns

CoreDNS implementation for Home Assistant
Apache License 2.0

Cloudflare as fallback only and no healthcheck #82

Closed mdegat01 closed 2 years ago

mdegat01 commented 2 years ago

Cloudflare's DNS server in the corefile config was intended to be a fallback only, used when the provided servers fail. However, it is currently listed twice in this config: as a fallback and as the last forwarding DNS server. This creates two issues:

  1. We healthcheck Cloudflare's DNS server every minute. This is excessive and unnecessary. Cloudflare's DNS server isn't likely to go down; what's more likely is that the user has it blocked, which then leads to runaway healthchecking (#64). Plus we have no other fallback anyway, so whether it's down or not won't change our DNS decisions here.
  2. CoreDNS seems to get "stuck" on the fallback. I don't have a great explanation for this, but I do have a lot of issues to point to (#51, #54, #20). Perhaps it's because it starts skipping servers after fails > maxfails, and Cloudflare is always the last one standing as a result? It is way less likely to fail than a local DNS server.
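To make the hypothesis in point 2 concrete, here is a toy model in Python. It is only a sketch of the idea described above, not CoreDNS's actual code: the `Upstream` class, the failure counters, and the server names are all hypothetical, and the behaviour when every server looks down is an arbitrary choice for illustration.

```python
from dataclasses import dataclass


@dataclass
class Upstream:
    name: str
    fails: int = 0  # toy counter of consecutive failed health checks


def is_down(upstream, max_fails):
    # Assumption taken from this thread: a server is skipped once
    # fails > max_fails, and max_fails == 0 means "never mark as down".
    return max_fails != 0 and upstream.fails > max_fails


def pick_sequential(upstreams, max_fails):
    # Sequential policy: walk the list in order, take the first usable server.
    for upstream in upstreams:
        if not is_down(upstream, max_fails):
            return upstream
    # Toy choice when everything looks down; the real plugin may differ.
    return upstreams[0]


# Hypothetical state: both local servers have failed a few health checks,
# Cloudflare (listed last) has not.
servers = [
    Upstream("router", fails=3),
    Upstream("dhcp-dns", fails=5),
    Upstream("cloudflare", fails=0),
]

stuck = pick_sequential(servers, max_fails=2)      # picks "cloudflare"
preferred = pick_sequential(servers, max_fails=0)  # picks "router"
```

In this model, the most reliable server at the end of the list wins every query for as long as the local counters stay above the threshold, which matches the "last one standing" guess.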

We made the fallback plugin so we could use Cloudflare as a fallback if the others fail, to minimize the DNS issues faced by users who pay no attention to DNS and expect it to just work. We should limit our usage of it to just that, so the local DNS servers are always preferred.

mdegat01 commented 2 years ago

For the record, I am linking the .local issues as evidence of the forward plugin getting stuck on Cloudflare. In reality, the recent changes to the mdns plugin (#73) should have resolved those; they seem to have helped users currently on 2022.3 (https://github.com/home-assistant/plugin-dns/issues/54#issuecomment-1066164588 and https://github.com/home-assistant/plugin-dns/issues/54#issuecomment-1066208858).

mdegat01 commented 2 years ago

Reverting the removal of policy: sequential after talking to @pvizeli. We should still go sequential and prefer the user-provided DNS overrides over the local ones provided by DHCP. Also, removing healthcheck doesn't disable it; it defaults to 0.5s 😱 Setting max_fails to 0 to disable it.
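As a rough illustration of the resulting forward block, the Python sketch below renders one with the user overrides listed ahead of the DHCP-provided servers, `policy sequential`, and `max_fails 0`. The `render_forward` helper and the 192.0.2.x addresses are hypothetical and exist only for this example; the actual corefile template in this repo may look different.

```python
def render_forward(user_overrides, dhcp_servers):
    """Render a forward stanza (hypothetical helper, not from the repo)."""
    # User overrides go first so that "policy sequential" tries them
    # before the servers learned from DHCP.
    upstreams = [f"dns://{addr}" for addr in [*user_overrides, *dhcp_servers]]
    lines = [
        f"forward . {' '.join(upstreams)} {{",
        "    policy sequential",
        # max_fails 0: never mark an upstream as down, so no health checks.
        "    max_fails 0",
        "}",
    ]
    return "\n".join(lines)


# Example addresses are from the documentation range, not real servers.
stanza = render_forward(["192.0.2.53"], ["192.0.2.1"])
print(stanza)
```

Cloudflare is deliberately absent from this list, so it is only reached through the fallback plugin.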