samhh / dotfiles

Configuring the universe with Nix.

Don't inherit DNS from DHCP on Tentacool #364

Closed (samhh closed 1 year ago)

samhh commented 1 year ago

Tentacool hosts Onix, my LAN DNS server. When Onix's container is updated it goes down, and Tentacool then can't resolve the registry to fetch the new image. I can work around this by supplying a second DNS server like 8.8.8.8 via DHCP, but this can let ads through on other devices. Ideally Tentacool would have hardcoded DNS that isn't inherited from DHCP.

samhh commented 1 year ago

To check the DNS configuration, including any nameservers set by DHCP:

$ resolvconf -l

samhh commented 1 year ago

The above output updates immediately, but no luck with https://github.com/samhh/dotfiles/commit/c0878e6d4956645cc60a6b6b4d29a0e323b4cc0e or https://github.com/samhh/dotfiles/commit/199597eef0f0b5602bb30a417dd70dd8b95a161b for https://github.com/samhh/dotfiles/commit/3cbfa1d6e10603f99e88be30797bd81f52b9b22d.

Nov 07 23:31:00 tentacool systemd[1]: Starting podman-pihole.service...
Nov 07 23:31:00 tentacool podman-pihole-start[3386227]: Resolving "pihole/pihole" using unqualified-search registries (/etc/containers/registries.conf)
Nov 07 23:31:00 tentacool podman-pihole-start[3386227]: Trying to pull docker.io/pihole/pihole:2022.10...
Nov 07 23:31:00 tentacool podman-pihole-start[3386227]: Trying to pull quay.io/pihole/pihole:2022.10...
Nov 07 23:31:00 tentacool podman-pihole-start[3386227]: Error: 2 errors occurred while pulling:
Nov 07 23:31:00 tentacool podman-pihole-start[3386227]:  * initializing source docker://pihole/pihole:2022.10: pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io: no such host
Nov 07 23:31:00 tentacool podman-pihole-start[3386227]:  * initializing source docker://quay.io/pihole/pihole:2022.10: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io: no such host
Nov 07 23:31:00 tentacool systemd[1]: podman-pihole.service: Main process exited, code=exited, status=125/n/a
Nov 07 23:31:00 tentacool podman-pihole-post-stop[3386242]: Error: reading CIDFile: open /run/podman-pihole.ctr-id: no such file or directory
Nov 07 23:31:00 tentacool systemd[1]: podman-pihole.service: Control process exited, code=exited, status=125/n/a
Nov 07 23:31:00 tentacool systemd[1]: podman-pihole.service: Failed with result 'exit-code'.
Nov 07 23:31:00 tentacool systemd[1]: Failed to start podman-pihole.service.
Nov 07 23:31:00 tentacool systemd[1]: podman-pihole.service: Scheduled restart job, restart counter is at 5.
Nov 07 23:31:00 tentacool systemd[1]: Stopped podman-pihole.service.

Maybe let's try reloading resolvconf first somehow, or rebooting? Though I'd expect Nix to take care of that stuff.

samhh commented 1 year ago

$ systemctl status resolvconf

Suggests nothing has happened with the service for four days. Perhaps it indeed needs reloading.

samhh commented 1 year ago

No luck with a unit restart after updating the nameservers but before upgrading Pi-hole, regardless of the order of the nameservers.

samhh commented 1 year ago

Simpler repro:

$ resolvconf -l
nameserver 127.0.0.1
nameserver 8.8.8.8

<dhcp nameservers>

$ nix-shell -p dogdns

$ dog samhh.com
A samhh.com. <time> <ip>

# systemctl stop podman-pihole

$ dog samhh.com
Error [network]: Connection refused (os error 111)

# systemctl start podman-pihole

$ dog samhh.com
A samhh.com. <time> <ip>

samhh commented 1 year ago

https://serverfault.com/a/513273:

The resolver will query the second name server only if the attempt to reach the first name server times out. In your case, it is not a time out issue, it is a resolution failure, so there is no need to query the remaining name servers.

I'm actually not sure now how I ever got it working before via DHCP. Presumably on startup something is checking for a working nameserver among those listed and with a Tentacool restart that's how it got beyond Onix.
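To separate "is this nameserver up?" from "which nameserver does the resolver pick?", each server can be queried directly. A minimal sketch: `probe_nameservers` and `LOOKUP_CMD` are made-up names for illustration, and it assumes the lookup command (here `dog`, with the `-n` flag used elsewhere in this thread) exits non-zero when the query fails.

```shell
# Hypothetical helper: query each nameserver directly and report the result.
# LOOKUP_CMD is an illustrative knob; it defaults to dog, whose -n flag picks
# the nameserver to query. Assumes a non-zero exit status on failure.
LOOKUP_CMD=${LOOKUP_CMD:-dog}

probe_nameservers() {
  domain=$1
  shift
  for ns in "$@"; do
    if "$LOOKUP_CMD" "$domain" -n "$ns" >/dev/null 2>&1; then
      echo "$ns ok"
    else
      echo "$ns failed"
    fi
  done
}
```

For example, `probe_nameservers samhh.com 127.0.0.1 8.8.8.8` while podman-pihole is stopped should show whether 8.8.8.8 is reachable at all, independent of any failover behaviour.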

samhh commented 1 year ago

Perhaps using Unbound could work around this. It wouldn't have the same issue as the new version would be fetched from nixpkgs before it's restarted. (Relevant issue for a nixpkgs derivation of Pi-hole: https://github.com/NixOS/nixpkgs/issues/61617)

Then again, it's another moving part which I don't really need as Onix maintains its own cache:

$ dog samhh.com --time -n tentacool
A samhh.com. <etc>
Ran in 1ms
$ dog samhh.com --time -n 8.8.8.8
A samhh.com. <etc>
Ran in 7ms

Blocky also has a nixpkgs derivation. Come to think of it, the best solution is really anything that gets this out of its container.

samhh commented 1 year ago

Downside of leaving Pi-hole... figuring out how to plug anything new into UniFi.

samhh commented 1 year ago

AdGuard Home is another option.

samhh commented 1 year ago

More broadly, it's really irritating that NixOS brings down containers before it fetches new images. It causes needless downtime on upgrades of services like Starmie. In other words, I want zero-downtime deployments, which'd solve this issue by proxy. Then again I guess that's what you get if you just don't use containers...

samhh commented 1 year ago

Don't particularly want to lose niceties like Pi-hole integration on my phone.

samhh commented 1 year ago

Back to the resolver. Putting an external nameserver first and restarting the resolvconf unit makes no difference, seemingly contradicting the idea that it runs them in order. Something else stateful at play?

samhh commented 1 year ago

Unbound will only help if the relevant container entries have been cached before Onix goes down. Or if it will do failover unlike resolvconf.

samhh commented 1 year ago

I suppose a hacky workaround would be to use /etc/hosts to bypass resolvconf for docker.io or wherever else containers might come from.

samhh commented 1 year ago

dnsmasq lets you specify nameservers for specific hosts, which'd be a little less brittle, but that's bundled into the Pi-hole image.

samhh commented 1 year ago

Another test of resolvconf changes:

$ # After a restart
$ dog samhh.com --time
<etc>
Ran in 20ms
$ dog samhh.com --time
<etc>
Ran in 0ms
# # rebuild with 8.8.8.8 placed at the top, validated with `resolvconf -l`
$ dog samhh.com --time
<etc>
Ran in 0ms
# systemctl restart resolvconf
$ dog samhh.com --time
<etc>
Ran in 0ms

It caches in Onix and then clearly keeps using Onix regardless of what else changes.

samhh commented 1 year ago

No luck with /etc/hosts:

https://github.com/samhh/dotfiles/commit/44e17042133a3239d6f93020e15422923d4aa16f

$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost
127.0.0.2 tentacool
::1 tentacool
3.228.146.75 docker.io

Exact same service error. Maybe Podman doesn't pick up on host network changes properly?
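One thing worth checking here: the journal earlier in this thread shows the failing lookup was for registry-1.docker.io, whereas the entry above only covers docker.io. That may or may not be the whole story. A rough sketch for checking which names a hosts file actually covers; `hosts_lookup` is a made-up helper.

```shell
# Hypothetical helper: print the address a hosts-format file maps a name to.
# Prints nothing if the name has no entry.
hosts_lookup() {
  file=$1
  name=$2
  awk -v name="$name" '
    { sub(/#.*/, "") }   # drop comments, whole-line or trailing
    { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
  ' "$file"
}
```

With the file above, `hosts_lookup /etc/hosts docker.io` prints the pinned address, while `hosts_lookup /etc/hosts registry-1.docker.io` prints nothing.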

samhh commented 1 year ago

No luck with:

commit 26218d34028095c6cfd9fc5961d807874ce14f3b
Author: Sam A. Horvath-Hunt <hello@samhh.com>
Date:   Tue Nov 8 19:28:20 2022 +0000

    Live resolvconf updates

diff --git a/hosts/tentacool/network.nix b/hosts/tentacool/network.nix
index d2bdff0..11828c3 100644
--- a/hosts/tentacool/network.nix
+++ b/hosts/tentacool/network.nix
@@ -12,4 +12,8 @@
     # nameservers in order to configure this external nameserver.
     nameservers = [ "8.8.8.8" "127.0.0.1" ];
   };
+
+  # This needs to be set so that a full system restart isn't needed:
+  #   https://unix.stackexchange.com/a/487615
+  system.nssDatabases.hosts = [ "resolve" ];
 }

samhh commented 1 year ago

Another idea as a workaround for now - pull the image before rebuilding whenever there's an Onix upgrade:

$ podman pull pihole/pihole:2022.10

Something's different about images which have already been successfully brought up as containers though:

$ podman pull pihole/pihole:2022.10
Trying to pull pihole/pihole:2022.10... etc
$ podman pull docker.io/pihole/pihole:2022.09.4
Trying to pull docker.io/pihole/pihole:2022.09.4... etc
$ podman pull pihole/pihole:2022.09.4
Resolved "pihole/pihole" as an alias (/home/sam/.cache/containers/short-name-aliases.conf)
Trying to pull pihole/pihole:2022.09.4... etc

I don't understand why this only applies to 2022.09.4 given the contents of the referenced file:

$ cat ~/.cache/containers/short-name-aliases.conf
[aliases]
  "pihole/pihole" = "docker.io/pihole/pihole"
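The pre-pull idea could be scripted roughly as follows. This is a sketch, not something from the repo: `qualify_image` and `prepull` are hypothetical names, and it assumes any image reference without a registry prefix lives on docker.io. Fully qualifying the name sidesteps the unqualified-search and alias resolution seen above.

```shell
# Hypothetical helper: prefix docker.io when a reference names no registry.
# Assumes a first path component containing "." or ":" (or "localhost") is a
# registry host, and that bare names follow Docker Hub's library/ convention.
qualify_image() {
  case "$1" in
    */*) first=${1%%/*} ;;
    *)   printf 'docker.io/library/%s\n' "$1"; return ;;
  esac
  case "$first" in
    *.*|*:*|localhost) printf '%s\n' "$1" ;;
    *)                 printf 'docker.io/%s\n' "$1" ;;
  esac
}

# Pull while the DNS server is still up, i.e. before rebuilding.
prepull() {
  podman pull "$(qualify_image "$1")"
}
```

Usage would be `prepull pihole/pihole:2022.10` followed by the rebuild. Rootless and root Podman keep separate image stores, so the pull probably needs to run as whichever user the podman-pihole unit runs as for the service to see the image.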

samhh commented 1 year ago

wut

$ resolvconf -l
nameserver 127.0.0.1
nameserver 8.8.8.8
<etc>
$ cat /etc/resolv.conf
nameserver 127.0.0.1
options edns0
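A possible explanation, not verified on this machine: openresolv's resolvconf.conf has a `resolv_conf_local_only` setting which, by default, writes only a local nameserver such as 127.0.0.1 to /etc/resolv.conf and drops the rest. Either way, the mismatch is easy to check for. A small sketch, where `nameservers` is a made-up helper:

```shell
# Hypothetical helper: print the nameserver addresses found in
# resolv.conf-style text on stdin, one per line, in order.
nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}
```

Comparing `resolvconf -l | nameservers` against `nameservers < /etc/resolv.conf` shows which of the configured servers actually made it into the file the resolver reads.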

samhh commented 1 year ago

No luck with:

commit e8e0124be0aeb3c4fc9595bd2c9d96e48788141a
Author: Sam A. Horvath-Hunt <hello@samhh.com>
Date:   Tue Nov 8 20:24:07 2022 +0000

    Rotate Tentacool nameservers

diff --git a/hosts/tentacool/network.nix b/hosts/tentacool/network.nix
index ee636db..7f2bc32 100644
--- a/hosts/tentacool/network.nix
+++ b/hosts/tentacool/network.nix
@@ -11,5 +11,10 @@
     # points only at Onix and this machine specifically will override DHCP's
     # nameservers in order to configure this additional, external nameserver.
     nameservers = [ "127.0.0.1" "8.8.8.8" ];
+
+    # resolvconf doesn't failover to other nameservers if the first it tries
+    # completely fails. This makes it rotate between all available options,
+    # which changes that unwanted behaviour as a side effect.
+    resolvconf.extraOptions = [ "rotate" ];
   };
 }

samhh commented 1 year ago

Temporary workaround:

  1. networking.nameservers = [ "8.8.8.8" ];
  2. Rebuild
  3. Upgrade Onix
  4. Rebuild
  5. Reset/remove networking.nameservers
  6. Rebuild

samhh commented 1 year ago

"I can work around this by supplying a second DNS server like 8.8.8.8 via DHCP, but this can let ads through on other devices."

Let's validate this. They could always use their own DNS anyway.

Edit: Yeah, this seems to let ads into the network. Block % went down too.

Edit 2: Makes sense, see also: https://www.reddit.com/r/pihole/comments/12dzrfq/small_question_while_updating_raspberrypis_os/jfaq80u/?context=3

samhh commented 1 year ago

It'd be nice to try UniFi's built-in adblocker, as it'd mean one fewer moving part, particularly for DNS. If it's trash then probably Blocky, as it's declarative and would solve this issue (+ #386).

samhh commented 1 year ago

UniFi's adblocker was noticeably less effective. Went with Blocky: https://github.com/samhh/dotfiles/commit/e22e0e3fa1e7a773d6d90e6c63992ecbe86802d1