Potential problem with no-daemon option

DL6ER commented 5 years ago

Internal discussion - I'm aware of a violation of the issue template, however, this ticket mostly serves to ensure reproducibility rather than directly ask anyone to fix it (I will most likely fix it myself during the discussion).

This discussion starts with https://github.com/pi-hole/docker-pi-hole/pull/370#pullrequestreview-183866786 where @diginc mentioned that he found it necessary to run pihole-FTL debug in order to keep pihole-FTL in foreground.

As result of this mentioning, @Mcat12 proposed a solution for FTL and mentioned that he found it necessary to run the dnsmasq code in debug mode. This should be avoided for daily use cases as it has several drawbacks, most relevant for Netflix users as we already know from previous investigations that TCP requests are important for these clients. However, in debug mode, dnsmasq does not fork at all for new TCP workers and deadlocks (or at least considerable delays of up to one minute or so) can easily arise.

I tried to reproduce the issue locally on the latest build in the master branch, i.e., the code we released as v4.1, but I wasn't able to reproduce it:

Terminal 1:

$ ./pihole-FTL -v
v4.1

$ ./pihole-FTL -b
master

$ ./pihole-FTL no-daemon

(pihole-FTL stays in foreground and blocks the terminal from here on)

Terminal 2:

$ dig google.de

; <<>> DiG 9.10.3-P4-Raspbian <<>> google.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35155
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;google.de.                     IN      A

;; ANSWER SECTION:
google.de.              2669    IN      A       172.217.21.3

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Dec 12 17:55:51 CET 2018
;; MSG SIZE  rcvd: 54

$ dig version.bind txt chaos +short
"dnsmasq-pi-hole-2.79"

DL6ER commented 5 years ago

This user reports to be affected by an FTL bug we fixed over half a year ago which is caused by using debug mode in the docker containers. Debug mode is (solely) built to ease debugging and, as such, disables forking of the DNS resolver. This is, however, necessary for TCP sessions, initiated, e.g., by Netflix apps on devices like Playstation as these apps do never close the open session, i.e., conquer the DNS server until the timeout is eventually hit.

This issues is not present when running in normal (read this as anything but not debug mode) as, here, we allow the process to fork in order to spawn independent children who can deal with such unterminated TCP sessions as mandated by the corresponding RFCs.

doodlemania2 commented 5 years ago

Chiming in so I can follow here from pi-hole discourse in RE debug mode causing the log to stream to stdout. Also, I can anecdotally note that Netflix and at least a few other apps completely crater the pi-hole causing all kinds of issues. I thought at first there might be some blocking on the log (writing that much data to the SD card can bring the whole raspberrypi to the ground, but after reading the above comment, may be simply DNS being overwhelmed to the point of timeouts.

One other observation - it appears in this version that some DNS packets are simply getting dropped or never replied to. I thought this was related to my Cloudflared but this might also be a simple coincidence due to the above issue. Anyways, just a few additional data points!

DL6ER commented 5 years ago

may be simply DNS being overwhelmed to the point of timeouts

No, it is not an overwhelming, but it is the problem that an app can really conquer the DNS server for itself. This occupation does automatically drop after a given timeout, but this is a fallback mechanism for the case that the client app died without releasing the connection.

it appears in this version that some DNS packets are simply getting dropped or never replied to

This will inevitably happen when an app occupies the DNS server over the TCP connection in debug mode as FTL cannot respond to new (and old!) requests that are outside of this single connection - regardless if through UDP or another TCP connection.

ericbisme commented 5 years ago

I will add that I was chasing down the exact same issue as in the Medium article last week. I am running in Kubernetes with pihole and cloudflared running together in a pod. Prior to seeing the problem I had migrated from the diginc/pi-hole docker image to pihole/pihole using latest (4.1 at the time).

I also noticed a correlation with api-global.netflix.com and eventually saw that FTL was running in debug mode which led me to this GitHub issue.

I reverted to tag 4.0.0-1_amd64 of pihole/pihole and have been stable since the switch. I confirmed that FTL is not running in debug mode in this version.

DL6ER commented 5 years ago

Will be fixed by https://github.com/pi-hole/docker-pi-hole/pull/380

diginc commented 5 years ago

Just to add the root cause I mentioned in the other ticket, FTL seemed to crash out when ran in non-debug mode due to not having the right linux permissions / capabilities.

Having docker pass setcap power of NET_ADMIN seemed to fix the containerized tests: --cap-add=NET_ADMIN, all the docker tests were bombing out simply by changing debug to no-daemon until I added NET_ADMIN.

I realized this because I went to run an strace inside the container to see why it would suddenly just stop in no-daemon mode and that required passing more privileges, I used all privs with docker run --privileged and as soon as I did that it stopped reproducing the problem.

DL6ER commented 5 years ago

I wasn't aware of that pihole-FTL crashes when permissions are only partially applied. I'll mark this as an FTL bug in this case and will provide a fix when I can reproduce the crash.

diginc commented 5 years ago

Here is a way to reproduce, first the working version:

$ docker run -it --rm --entrypoint='pihole-FTL' --cap-add=NET_ADMIN pihole/pihole:4.1.1 no-daemon
[2019-01-02 14:40:58.116] Using log file /var/log/pihole-FTL.log
[2019-01-02 14:40:58.116] ########## FTL started! ##########
[2019-01-02 14:40:58.117] FTL branch: master
[2019-01-02 14:40:58.117] FTL version: v4.1.2
[2019-01-02 14:40:58.117] FTL commit: b06eedf
[2019-01-02 14:40:58.117] FTL date: 2018-12-21 14:43:34 -0600
[2019-01-02 14:40:58.117] FTL user: root
[2019-01-02 14:40:58.117] WARNING: Starting pihole-FTL as user root is not recommended
[2019-01-02 14:40:58.117] Starting config file parsing (/etc/pihole/pihole-FTL.conf)
[2019-01-02 14:40:58.117]    SOCKET_LISTENING: only local
[2019-01-02 14:40:58.117]    AAAA_QUERY_ANALYSIS: Show AAAA queries
[2019-01-02 14:40:58.117]    MAXDBDAYS: max age for stored queries is 365 days
[2019-01-02 14:40:58.118]    RESOLVE_IPV6: Resolve IPv6 addresses
[2019-01-02 14:40:58.118]    RESOLVE_IPV4: Resolve IPv4 addresses
[2019-01-02 14:40:58.118]    DBINTERVAL: saving to DB file every minute
[2019-01-02 14:40:58.118]    DBFILE: Using /etc/pihole/pihole-FTL.db
[2019-01-02 14:40:58.118]    MAXLOGAGE: Importing up to 24.0 hours of log data
[2019-01-02 14:40:58.118]    PRIVACYLEVEL: Set to 0
[2019-01-02 14:40:58.118]    IGNORE_LOCALHOST: Show queries from localhost
[2019-01-02 14:40:58.118]    BLOCKINGMODE: Null IPs for blocked domains
[2019-01-02 14:40:58.118]    REGEX_DEBUGMODE: Inactive
[2019-01-02 14:40:58.118]    ANALYZE_ONLY_A_AND_AAAA: Disabled. Analyzing all queries
[2019-01-02 14:40:58.118]    DBIMPORT: Importing history from database
[2019-01-02 14:40:58.118]    PIDFILE: Using /var/run/pihole-FTL.pid
[2019-01-02 14:40:58.118]    PORTFILE: Using /var/run/pihole-FTL.port
[2019-01-02 14:40:58.118]    SOCKETFILE: Using /var/run/pihole/FTL.sock
[2019-01-02 14:40:58.118]    WHITELISTFILE: Using /etc/pihole/whitelist.txt
[2019-01-02 14:40:58.118]    BLACKLISTFILE: Using /etc/pihole/black.list
[2019-01-02 14:40:58.118]    GRAVITYFILE: Using /etc/pihole/gravity.list
[2019-01-02 14:40:58.118]    REGEXLISTFILE: Using /etc/pihole/regex.list
[2019-01-02 14:40:58.118]    SETUPVARSFILE: Using /etc/pihole/setupVars.conf
[2019-01-02 14:40:58.118]    AUDITLISTFILE: Using /etc/pihole/auditlog.list
[2019-01-02 14:40:58.118] Finished config file parsing
[2019-01-02 14:40:58.118] INFO: No whitelist file found
[2019-01-02 14:40:58.118] Compiled 0 Regex filters and 0 whitelisted domains in 0.0 msec (0 errors)
[2019-01-02 14:40:58.118] db_init() - Cannot open database (14): unable to open database file
[2019-01-02 14:40:58.118] Creating new (empty) database
[2019-01-02 14:40:58.154] Database successfully initialized
[2019-01-02 14:40:58.154] Imported 0 queries from the long-term database
[2019-01-02 14:40:58.154]  -> Total DNS queries: 0
[2019-01-02 14:40:58.154]  -> Cached DNS queries: 0
[2019-01-02 14:40:58.154]  -> Forwarded DNS queries: 0
[2019-01-02 14:40:58.154]  -> Exactly blocked DNS queries: 0
[2019-01-02 14:40:58.154]  -> Unknown DNS queries: 0
[2019-01-02 14:40:58.154]  -> Unique domains: 0
[2019-01-02 14:40:58.154]  -> Unique clients: 0
[2019-01-02 14:40:58.155]  -> Known forward destinations: 0
[2019-01-02 14:40:58.155] Successfully accessed setupVars.conf
[2019-01-02 14:40:58.269] PID of FTL process: 1
[2019-01-02 14:40:58.269] Listening on port 4711 for incoming IPv4 telnet connections
[2019-01-02 14:40:58.269] Listening on Unix socket

Due to the ctrl+c not getting passed correctly you may need to kill the container from another terminal when it works correctly.

The same docker run without --cap-add=NET_ADMIN has the exact same output but abruptly ends the process after Listening on Unix socket

The same can be tested interactively with: docker run -it --rm --entrypoint='bash' pihole/pihole:4.1.1 but as I mentioned when I tried running with strace I had to add privs and that fixed the issue, so maybe we need to just pass strace privileges (if they don't overlap with the fix) or run strace from outside docker on the docker run and have it follow the process forks (possible I think?)

diginc commented 5 years ago

The tail of sudo strace docker run -it --rm --entrypoint='pihole-FTL' pihole/pihole:4.1.1 no-daemon :

[2019-01-02 15:02:54.438] Creating new (empty) database
)   = 0
       futex(0x25380c8, FUTEX_WAIT, 0, NULL)   = 0
                                                  futex(0x25380c8, FUTEX_WAIT, 0, NULL[2019-01-02 15:02:54.468] Database successfully initialized
[2019-01-02 15:02:54.469] Imported 0 queries from the long-term database
[2019-01-02 15:02:54.469]  -> Total DNS queries: 0
[2019-01-02 15:02:54.469]  -> Cached DNS queries: 0
[2019-01-02 15:02:54.469]  -> Forwarded DNS queries: 0
[2019-01-02 15:02:54.469]  -> Exactly blocked DNS queries: 0
[2019-01-02 15:02:54.469]  -> Unknown DNS queries: 0
[2019-01-02 15:02:54.469]  -> Unique domains: 0
[2019-01-02 15:02:54.469]  -> Unique clients: 0
[2019-01-02 15:02:54.469]  -> Known forward destinations: 0
[2019-01-02 15:02:54.470] Successfully accessed setupVars.conf
[2019-01-02 15:02:54.589] PID of FTL process: 1
[2019-01-02 15:02:54.589] Listening on port 4711 for incoming IPv4 telnet connections
[2019-01-02 15:02:54.589] Listening on Unix socket
)   = 0
futex(0x25380c8, FUTEX_WAIT, 0, NULL <unfinished ...>
+++ exited with 5 +++

DL6ER commented 5 years ago

Pi-hole FTL v4.3 (currently still in development) will ship the capabilities check. We don't fail hard if we detect missing capabilities as we've seen systems (operating systems as well as file systems) with lack of support for capabilities in the past. Here, the fallback solution of running pihole-FTL as root until we have bound to the required ports and only then falling back to an unprivileged user works, so I don't want to destroy support for those platforms.

mhofman commented 5 years ago

I don't believe CAP_NET_ADMIN should be required, especially if the DHCP server is not in use. Even with the DHCP server enabled, injection in the arp table is not required if using the dhcp-broadcast option. Pings and CAP_NET_RAW can be skipped with the no-ping option. The only really required capability is CAP_NET_BIND_SERVICE to bind to ports lower than 1024 (if binding after root has been dropped).

I personally have been running a bare dnsmasq docker instance by doing setuid and changing capabilities manually before launching. But the best would be to enhance the checks in FTL or upstream dnsmasq.

The capabilities code in dnsmasq is here: https://github.com/pi-hole/FTL/blob/d42bcb7ab914a1d64fb8121d862c40043cc25197/dnsmasq/dnsmasq.c#L616-L630 I think the check should be gated on the features and options in use, the same way CAP_NET_BIND_SERVICE already is. No DHCP server (or dhcp-broadcast / no-ping), don't try to keep CAP_NET_ADMIN and/or CAP_NET_RAW.

Here is the Dockerfile and related files I use for my dnsmasq container: https://gist.github.com/mhofman/cdd85a6baa4b9206830b254d0ab9bb89 By changing the user before launching dnsmasq, all the checks are skipped the same way they are when running in debug mode. If dnsmasq itself was to selectively retain capabilities, CAP_NET_BIND_SERVICE wouldn't be needed to bind to the DNS port, since that's done before dropping root.

DL6ER commented 5 years ago

Thanks you very much for your detailed post. We do not modify the embedded dnsmasq so your request should be directed towards Simon on the dnsmasq-discuss mailing list if you are willing to write such a mail.

I perfectly agree with you that the capabilities should be reduced as much as possible, but, e.g., no-ping might now always be a safe option - think about an already running network where you just enabled dnsmasq as your DHCP server. It might hand out duplicated IP addresses with the no-ping option enabled until all devices have renewed their leases at least once on this server.

mhofman commented 5 years ago

Here is the email I wrote to the mailinglist: http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2019q1/012840.html

I agree that the no-ping option is unsafe. However this would be something users can manually configure if needed. But the NET_RAW capability is not really an issue for most people as it's included in the default docker capabilities. The problematic one is NET_ADMIN which isn't there by default, and is in my opinion too powerful of a capability to grant to most containers.

Since changes would have to come from the upstream dnsmasq, in the mean time maybe @diginc can include a workaround similar to mine in the pihole docker image? The capabilities check in the current development branch could be used to warn the user before enabling the DHCP server? And optionally set the dhcp-broadcast option if the user starts the image with the DHCP server enabled and without the NET_ADMIN capability?

DL6ER commented 5 years ago

The problem has been fixed. When running in debug mode (as opposed to no-daemon mode), we prevent dropping from root to a less privileged user. FTL v4.2.1 will set the default user to drop to to root to prevent this from happening on systems that lack proper Linux capabilities support.

mhofman commented 5 years ago

@dl6er I'm not sure I followed what is changing. Are you using dnsmasq built-in user change or are you dropping root in FTL before hitting dnsmasq's code?

If the former, dnsmasq will try and keep the net_admin and net_raw capability even if not actually required.

If the later, you could drop root while keeping only the required & available capabilities, and not have dnsmasq fail in setpcap calls.

DL6ER commented 5 years ago

@mhofman The problem is a different one - you could even understand it as being entirely separated from dnsmasq (although it isn't). I will explain this:

When we start pihole-FTL, it will start as root on those systems that don't fully support Linux capabilities (as in we cannot use the setcap command). On startup, FTL will create shared memory objects to store its data in. When dnsmasq starts, it sees UID==0 and decides to drop to an unprivileged user while keeping the capabilities as you mentioned. However, this unprivileged user - the entire pihole-FTL process now runs as - will now not be able to access the shared memory objects we created earlier and hence the process will die the next time it want to access its own memory (only root can read/write it).

The change we made is to ensure that dnsmasq will "drop" to user root to ensure that the shared memory objects stay read- and writable, see https://github.com/pi-hole/FTL/pull/497/commits/1404ea167945fcbffa80198505352a799b0d4557.

mhofman commented 5 years ago

Right, sorry I originally failed to see how this was related to running the docker image.

From what I gather, docker image is still running FTL as root, correct?

Is there any reason why the docker image doesn't call setcap with CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_NET_ADMIN+ei as a "filesystem fix" on first launch? That should allow it to change users.

Please note that I recommend not setting the p flag since that would cause an launch failure for containers not having the extra capabilities available. Any capabilities still in the bounding set will be inherited when switching users, and will then become permitted thanks to the file capabilities.

As I wrote before a "safer" approach would be to check the config files and make sure no dnsmasq feature is in use that relies on those capabilities, but that can be a future improvement. In the mean-time you can put a warning that not all capabilities are available and this is "unsupported".

mhofman commented 5 years ago

Here is a summary of what I'm proposing:

Now:

Change docker image to do setcap CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_NET_ADMIN+ei usr/bin/pihole-FTL on first container launch.
Change docker image to drop root before launching the pihole-FTL process. The permitted capabilities of the process will be whatever the intersection is between the container's capabilities and the 3 capabilities above. By default this will be CAP_NET_BIND_SERVICE,CAP_NET_RAW unless the user launches the container with cap_add: net_admin.
Check at runtime (either in pihole-FTL or in script) which capabilities are available, and warn user if not all 3 are.

In the future:

Improve the warning above (or make it an error) if the dnsmasq/pihole configuration uses a feature requiring a missing capability (dhcp-range without dhcp-broadcast -> need net_admin, dhcp-range without no-ping -> need net_raw, possibly others, I haven't verified all combinations)
Very far in the future, once you're really sure about which configuration causes which capabilities to be needed, drop all unnecessary capabilities.

DL6ER commented 5 years ago

Closing this issue as v4.2.1 has been released. It addresses the issue raised here. If the discussion needs to be continued, it may be done here in the closed issue ticket or rather in the surroundings of our docker repository.

pi-hole / FTL

Potential problem with no-daemon option #432