pi-hole / docker-pi-hole

Pi-hole in a docker container
https://pi-hole.net

pihole/pihole:latest crashes randomly several times a day without any volumes #948

Closed ITmaze closed 2 years ago

ITmaze commented 2 years ago

This is a: Bug

Details

I'm running the latest pihole inside docker inside a dedicated docker virtual machine as I have done for several years.

The instance is launched without any volumes or storage and only the bare minimum of parameters are supplied. Until recently this has been rock solid.

Now several times a day it stops resolving and the only resolution is to kill the container and start it again.

Related Issues

The various reports invariably use some volume mounting, database storage or some other permanent storage. My container is running without storage.

How to reproduce the issue

  1. Environment data

    • Operating System: Debian 10.10
    • Hardware: VMware Fusion v12.2.1
    • Kernel Architecture: Linux docker 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux
    • Docker Install Info and version:
    • Software source: docker.io 18.09.1+dfsg1-7.1+deb10u3
    • Supplementary Software: none
    • Hardware architecture: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  2. Shell command used to launch the container:

    docker run -d \
    --name pihole \
    --restart=unless-stopped \
    -p 192.168.1.120:53:53/tcp \
    -p 192.168.1.120:53:53/udp \
    -p 80:80 \
    -e TZ="Australia/Perth" \
    -e DNS1="1.1.1.1" \
    -e DNS2="1.0.0.1" \
    -e WEBPASSWORD="thisisnotmypassword" \
    pihole/pihole:latest

These common fixes didn't work for my issue

I've removed the image and the container and pulled it again, no difference.

PromoFaux commented 2 years ago

When the crashes happen, are you able to get into the container with a shell? (e.g. docker exec -it pihole /bin/bash)

Ideally we would need to see some logs from around the time of the crash. Specifically /var/log/pihole-FTL.log, please. If you could also run pihole -d from within the container when it crashes - and upload a token, that would be much appreciated!

ITmaze commented 2 years ago

The logs couldn't be uploaded automatically, since DNS resolution doesn't work at the time, so here they are instead.

172.17.0.1 is the local docker0 network.

pihole-FTL.log pihole_debug.log

To update this issue, I had to restart the container, so I cannot give you more for this specific crash, but if you need more details, let me know and I'll capture them the next time it crashes.

ITmaze commented 2 years ago

Just crashed again. pihole.log had many messages like this:

Dec  3 10:28:32 dnsmasq[473]: Rate-limiting ssl.gstatic.com.Home is REFUSED (EDE: blocked)
Dec  3 10:28:32 dnsmasq[473]: query[AAAA] ssl.gstatic.com.Home from 172.17.0.1

PromoFaux commented 2 years ago

Ah, are all your queries coming from one client (i.e. your router)? If not, what is the structure of your network?

A couple of recommended settings for you, which can be set as environment variables:

-e FTLCONF_REPLY_ADDR4="192.168.1.120"
-e FTLCONF_RATE_LIMIT="[tweak this to suit your needs, per docs](https://docs.pi-hole.net/ftldns/configfile/#rate_limit)"

The first is so that Pi-hole knows the host machine's IP address, rather than the Docker network IP address.

The second, in the majority of use cases, should be OK at default values, but it can easily be overwhelmed when, for example, Pi-hole receives an entire network's queries from a single client. Let me know how you get on!

PromoFaux commented 2 years ago

Ah, are all the queries coming from another container? 172.17.0.1 is the IP that is being rate limited.

ITmaze commented 2 years ago

Yes. Right now I have 14 individual instances of firefox-esr running. One just for this login to update this issue for example.

That isn't unusual; in fact, that number is low. Normally it would be double that, and it has been like that for over a year. There are several browsers used to connect to Google services: Gmail, Calendar, Keep, Drive and the like. Also not unusual.

ITmaze commented 2 years ago

In response to your network topology question, which I only just spotted.

The VM that's running docker that's running pihole is on a workstation that's running several VMs. It is physically connected to a router that serves about 40 other devices via Wi-Fi, IoT, phones, tablets, laptops and media hardware.

DHCP is served by the router, which hands out the pihole as the DNS server.

The docker VM has an extra interface that connects it to the LAN, so docker containers can be visible on the LAN - specifically the pihole container's web and DNS interface.

The docker VM also runs other containers that use the pihole as their DNS.

PromoFaux commented 2 years ago

That particular client that is being rate limited is making more than 1000 queries per minute, and as many as 15000 according to your debug log..!
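Per-client rates like these can be checked by counting query lines in pihole.log per client per minute. The following is a minimal sketch, not part of the Pi-hole debug tooling: the function name queries_per_minute is hypothetical, and it assumes the dnsmasq log format shown earlier in this thread (month, day, time, process, query[TYPE], name, "from", client).

```shell
# Count DNS queries per client per minute from dnsmasq-style log lines on stdin.
# Output: "<count><TAB><month day HH:MM><TAB><client>", busiest first.
queries_per_minute() {
    awk '$5 ~ /^query\[/ && $7 == "from" {
        # Truncate the timestamp to the minute (HH:MM).
        minute = $1 " " $2 " " substr($3, 1, 5)
        count[minute "\t" $8]++
    }
    END {
        for (key in count) printf "%d\t%s\n", count[key], key
    }' | sort -rn
}
```

Something like `docker exec pihole cat /var/log/pihole.log | queries_per_minute | head` would then list the busiest client/minute pairs first, assuming the log lives at /var/log/pihole.log inside the container.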

Are you maybe applying some fuzzing to obscure your true DNS lookups? Either increase the rate limiting to something like 20000/60, or disable it entirely with 0/0.

If you're not intentionally fuzzing your queries - you may have some software acting up, or being malicious. Hopefully it does not have an external interface available to all?
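As a rough sketch of the first option, the launch command from the original report could be extended with the rate limit as an environment variable. The function below only prints the command (a dry run) so it can be reviewed before use. pihole_run_command is a hypothetical helper name, 20000/60 is just the example value mentioned above, and the password is the placeholder from the report.

```shell
# Dry run: prints the docker run command instead of executing it.
pihole_run_command() {
    # Rate limit in "<queries>/<seconds>" form; 0/0 disables rate limiting.
    rate_limit="${1:-20000/60}"
    cat <<EOF
docker run -d \\
  --name pihole \\
  --restart=unless-stopped \\
  -p 192.168.1.120:53:53/tcp \\
  -p 192.168.1.120:53:53/udp \\
  -p 80:80 \\
  -e TZ="Australia/Perth" \\
  -e DNS1="1.1.1.1" \\
  -e DNS2="1.0.0.1" \\
  -e WEBPASSWORD="thisisnotmypassword" \\
  -e FTLCONF_RATE_LIMIT="${rate_limit}" \\
  pihole/pihole:latest
EOF
}
```

Calling `pihole_run_command` prints the command with 20000/60, and `pihole_run_command 0/0` prints the variant with rate limiting disabled.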

ITmaze commented 2 years ago

I'll investigate, but I suspect that it's Google mail / calendar / drive etc. phoning home.

ITmaze commented 2 years ago

I've just run the debug log on the currently running instance and it uploads properly: https://tricorder.pi-hole.net/7zuTAd6W/

I've not changed anything about the browsers that are running; they've been open and idle, like they were when it died. I wonder if it's a runaway DNS call that one of the JavaScript apps is making that's causing the grief.

What's interesting is that it's blocking all containers and the host itself. You cannot pull another image while this rate limit is imposed. Makes me wonder if there's a way to isolate each container as a separate entity that could then reveal which one is actually causing the issue.

PromoFaux commented 2 years ago

I guess it's multiple containers all presenting as the same IP - my docker knowledge is basic at best, but it might be worth looking into macvlan and giving each container its own (real) IP address on a segment of your network?

I add something like this to each of my containers (in compose)

    mac_address: d0:ca:ab:cd:ef:xx
    networks:
      home:
        ipv4_address: 192.168.1.yy

Where yy = a number from 1 to 254, and xx = that number in hex, e.g. 192.168.1.254 would have mac d0:ca:ab:cd:ef:fe
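The decimal-to-hex step can be scripted. A minimal sketch, assuming the d0:ca:ab:cd:ef prefix from the snippet above; mac_for_octet is a hypothetical helper name, not part of docker or compose.

```shell
# Print the MAC address for a given last IP octet (1-254),
# following the d0:ca:ab:cd:ef:xx convention.
mac_for_octet() {
    octet="$1"
    # Reject empty input, non-digits, and leading zeros (printf would read those as octal).
    case "$octet" in
        ''|0*|*[!0-9]*) echo "octet must be a number from 1 to 254" >&2; return 1 ;;
    esac
    if [ "$octet" -gt 254 ]; then
        echo "octet must be a number from 1 to 254" >&2
        return 1
    fi
    # %02x renders the octet as two lowercase hex digits.
    printf 'd0:ca:ab:cd:ef:%02x\n' "$octet"
}
```

For example, `mac_for_octet 254` prints d0:ca:ab:cd:ef:fe, matching the example above.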

home is defined as docker network create -d macvlan --subnet=192.168.0.0/23 --gateway=192.168.0.1 -o parent=eth0 home (all my containers/VMs go on the 192.168.1.x subnet - real devices are on 192.168.0.x)

This may or may not be a good way to do things, but it works for me :)

ITmaze commented 2 years ago

It occurs to me that each container has an internal IP address. I'm going to investigate if I can use that to connect to the pihole internally, so we can get an IP address for the misbehaving container(s).
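A possible starting point for that investigation, as a sketch only: list each running container with its internal IP address(es), so that a client address seen in the pihole query log can be matched to a container. list_container_ips is a hypothetical helper name; it relies on docker ps -q and docker inspect with a Go template, and it assumes the containers query the pihole container's internal address directly rather than the published port on the host.

```shell
# Print "<container name> <internal IP(s)>" for every running container.
list_container_ips() {
    ids=$(docker ps -q)
    # Nothing to report if no containers are running.
    [ -n "$ids" ] || return 0
    # $ids is intentionally unquoted so each container ID becomes its own argument.
    docker inspect \
        -f '{{.Name}} {{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' \
        $ids
}
```

Running `list_container_ips` on the docker VM prints one line per container; names are shown with docker's leading slash.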

PromoFaux commented 2 years ago

@ITmaze were you able to get to the bottom of your issue? And was it the rate limiting that was causing the "crash" ?

ITmaze commented 2 years ago

In the time between my previous post and now I've been attempting to answer exactly that question. So far I've not got an answer. All evidence suggests that the "crash" isn't an actual crash, but rate limiting of the docker host IP address.

I propose that we close this ticket and if I have more information to share, I'll add it. Potentially it will just be an update to indicate how I isolated the container that caused the error, so there is some documentation for the next person coming across this issue.

Thank you for the assistance. Feel free to re-open if you think that it's warranted.