pi-hole / FTL

The Pi-hole FTL engine
https://pi-hole.net

DNS resolving stops during the night #1082

Closed: zod1988 closed this issue 3 years ago

zod1988 commented 3 years ago

Versions

Platform

Expected behavior

Pi-hole runs continuously on my Synology and was very reliable up until the latest version: no reboots, no maintenance, nothing at all.

Actual behavior / bug

Sometimes during the night, Pi-hole stops resolving DNS entirely. Webpages no longer load and devices can no longer connect.

If you take a look at the query log during that time you can see that no queries were processed after 03:49:45.

After restarting the resolver or rebooting Pi-hole completely, everything works again for about a week.

Steps to reproduce

Steps to reproduce the behavior:

  1. Set up Pi-hole inside an Ubuntu VM on a Synology DiskStation
  2. Leave it running

Debug Token

Screenshots

[screenshot]

Additional context

yubiuser commented 3 years ago

no queries were processed after 03:49:45.

That's the time when the weekly gravity run happened.

*** [ DIAGNOSING ]: Info table
   property              value                                   
   --------------------  ----------------------------------------
   version               13                                      
   updated               1609642194                              
   gravity_count         781748                                  
   Last gravity run finished at: So 3. Jan 03:49:54 CET 2021

[2021-01-03 01:52:49.162 3643M] Resizing "FTL-dns-cache" from 241664 to (15360 * 16) == 245760 (/dev/shm: 4.1MB used, 384.3MB total)
[2021-01-03 03:50:00.567 3643M] Reloading DNS cache
[2021-01-03 03:50:00.597 3643/T3647] db_set_FTL_property(1, 1609642185) called but database is not available!
[2021-01-03 03:50:00.597 3643/T3647] db_update_counters(65, 4) called but database is not available!
[2021-01-03 03:50:00.599 3643/T3647] parse_neighbor_cache() - Database is not available
[2021-01-03 04:00:56.692 3643/T3649] getNameFromIP("192.168.178.59") - Database not available
[2021-01-03 04:01:52.744 3643/T3649] getNameFromIP("192.168.178.79") - Database not available
[2021-01-03 04:02:48.792 3643/T3649] getNameFromIP("192.168.178.62") - Database not available
[2021-01-03 04:03:44.840 3643/T3649] getNameFromIP("192.168.178.61") - Database not available
[2021-01-03 04:04:40.888 3643/T3649] getNameFromIP("192.168.178.65") - Database not available

Somehow the pihole-FTL.db wasn't available afterwards...

Does it also happen if you manually run pihole -g?

zod1988 commented 3 years ago

Does it also happen if you manually run pihole -g?

That does not seem to be the case; I tried it twice today.

jfb-pihole commented 3 years ago

What is shown in the output of the unattended gravity update in the file /var/log/pihole_updateGravity.log?

zod1988 commented 3 years ago

What is shown in the output of the unattended gravity update in the file /var/log/pihole_updateGravity.log?

Is that part of the debug log? Otherwise I have no idea how to get to it.

jfb-pihole commented 3 years ago

It is not part of the debug log, it is a separate log kept by Pi-hole to record unattended gravity updates. You can see the contents with this command:

cat /var/log/pihole_updateGravity.log

zod1988 commented 3 years ago

I can't see the entire file, and I also can't SSH into the VM. Would an error be listed right at the end? Because there is nothing there.

[screenshot]

jfb-pihole commented 3 years ago

This section of output looks normal. Some domains were retrieved in the earlier part of the output, which is missing here, but the gravity database was constructed normally and Pi-hole was enabled at the end.

kfrancis commented 3 years ago

Seeing the same thing. We just updated to this version; the update worked nicely, and we updated the lists manually as well. Everything was still fine, but then it suddenly seemed to stop responding.

dingopride commented 3 years ago

Also seeing this since update, multiple times daily.

https://tricorder.pi-hole.net/b6yrif75a6

[screenshot]

Dashboard will say LOST CONNECTION TO API (or similar, in the colored boxes up top) in this state.

cmdshft commented 3 years ago

I, too, seem to be having issues with the DNS resolver randomly stopping on its own. Curiously, a reboot of the whole device doesn't seem to resolve it; only manually restarting the resolver in the Settings section fixes it. At first I thought it was my network, as my AmpliFi Alien router was reporting misconfigured DNS settings, even though they were 100% correct and I could access my RPi via SSH and the admin webpage without issue.

[Screenshot 2021-01-17 165209]

As you can see, I lost DNS for many hours. The point where it starts resolving again is when I manually restarted the DNS resolver.

atranchina commented 3 years ago

I'm having the same issue, except I can't access the web interface or SSH when this occurs. A reboot of the Raspberry Pi fixes it every time. So far it has happened overnight, and I wake up to no DNS.

[screenshot]

ghost commented 3 years ago

Same issue on RPi4 since the last update. Debug token is: https://tricorder.pi-hole.net/twcr522tgs

[screenshot]

rikman122 commented 3 years ago

Same here on an RPi2. A reboot solves the problem for a couple of days, and then it randomly happens again. It appears to happen at night, but at different hours (sometimes at 1am, sometimes at 3am).

[screenshot]

dschaper commented 3 years ago

Posting screenshots doesn't do much to help us help you.

dingopride commented 3 years ago

Screenshots should help you understand that this may be an issue affecting a not insignificant number of installations. People are just trying to report a frustrating issue to the best of their ability. Many of the screenshots here are accompanied by debug output. So check those out.

dschaper commented 3 years ago

Debug logs expire after 48 hours. "Me too" and "+1" comments don't really help much.

We want to fix issues and many times issues have already been fixed, but without knowing what version is being run there's not a whole lot anyone is going to be able to do.

You'll have to take extra steps to help us help you. There's less than half a dozen people that can provide assistance and they are all doing it for free.

dingopride commented 3 years ago

It's good to know that tricorder output expires after 48 hours; maybe this should be set to a more reasonable value that gives you time to retrieve them. I'm sure everyone is under the impression that the debug output tells you all you need to know regarding versions and hardware info, because this is the process your documentation requests.

What I can glean from these screenshots is that this issue is likely not triggered by volume, as I see many levels of activity. The one thing they all have in common is that it stops working at the top of the hour. There is probably some scheduled job that needs to bounce the resolver and isn't doing so.

dschaper commented 3 years ago

The debug process tells you that it's 48 hours.

I'm seeing someone say that their network goes down (no SSH), I see some going down monotonic, I see some going down randomly... I don't have the time to ask each individual the same questions and pull out information from people for situations that are highly likely to be unrelated.

There is no need to bounce a process or reboot a device on schedule. The only cronjob that is run is to update the lists and that happens once a week.

dschaper commented 3 years ago

all have in common is that it stops working at the top of the hour

But they really don't. The last screenshot is in the middle of the hour, as is the light mode one two replies above it.

PromoFaux commented 3 years ago

There was another discussion about something like this somewhere else... I'll see if I can dig it out.

PromoFaux commented 3 years ago

Maybe it was this:

https://discourse.pi-hole.net/t/add-atutomatic-restart-of-pihole-ftl/44122/4

I can possibly only suggest the same as I did over there - enable additional debugging in pihole-FTL, and then wait for another crash - hopefully the more detailed logging will show us something useful.

drsnett commented 3 years ago

I'm having the same issue, but in my case it happens constantly.

debug token is: https://tricorder.pi-hole.net/xrd5fmiw0t

[screenshot]

carcinoma commented 3 years ago

I have this problem as well. This behaviour has been present since version 5.2.1; version 5.1.2 works fine. I have used both the non-dockerized and the dockerized variant of Pi-hole on two systems, and both have this issue.

There must be something with this new version …

ghost commented 3 years ago

I don't have the problem anymore since I uninstalled zram/log2ram. The problem occurred with log flush.

DL6ER commented 3 years ago

The problem occurred with log flush.

How did log flushing work? Was it perhaps truncating the file while we were in the middle of writing to it?
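The hazard behind this question can be illustrated with a minimal Python sketch. It assumes a generic writer that keeps its log file open while an external tool truncates the file in place; whether FTL or log2ram behave exactly like this is not established in this thread. A writer opened without O_APPEND keeps its own file offset and resumes writing past the new end of file, leaving a run of NUL bytes, while a writer opened in append mode always writes at the current end.

```python
import os
import tempfile


def write_after_truncate(append: bool) -> bytes:
    """Write, truncate the file 'from outside', write again; return the final contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.log")
        mode = "ab" if append else "wb"   # "ab" opens the file with O_APPEND
        with open(path, mode) as writer:
            writer.write(b"x" * 100)
            writer.flush()                # first 100 bytes reach the file

            os.truncate(path, 0)          # simulate an external log flush/truncate

            writer.write(b"y" * 10)
            writer.flush()                # second write happens after the truncation
        with open(path, "rb") as reader:
            return reader.read()


plain = write_after_truncate(append=False)
appended = write_after_truncate(append=True)
print(len(plain), plain.count(b"\x00"))   # on Linux: 110 100 (old offset kept, NUL-filled hole)
print(len(appended))                      # on Linux: 10 (O_APPEND writes at the new end)
```

The demo only shows the general POSIX behaviour; it says nothing about which mode FTL actually uses for its log file.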

veloc1ty commented 3 years ago

Same here, running on a dedicated Ubuntu qemu VM. log2ram is not installed. Randomly stops resolving.

binary-person commented 3 years ago
  Pi-hole version is v5.2.4 (Latest: v5.2.4)
  AdminLTE version is v5.4 (Latest: v5.4)
  FTL version is v5.7 (Latest: v5.7)

I'm also experiencing this issue. At most once per day, Pi-hole fails to resolve (my Pi-hole is hosted on DigitalOcean, so faulty hardware is ruled out).

I disabled everything log-related by setting the privacy level to anonymous. While it wasn't resolving, I tried running pihole flush, but it failed to restart the Pi-hole server.

It turned out that, for some reason, multiple pihole-FTL processes were running, which can be seen by running sudo ss -tlpnu | grep 53. My guess is that pihole flush killed only one pihole-FTL process and that pihole-FTL could not start again because another process was already running.

Running the following temporarily fixed the problem:

pkill pihole-FTL && rm /dev/shm/FTL-* && pihole flush

so I went ahead and added this cron entry as a "permanent" fix:

0 */3 * * * root pihole flush || $(pkill pihole-FTL || true) && rm -rf /dev/shm/FTL-* && pihole flush
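The duplicate-process check described above can be sketched in Python: parse the output of sudo ss -tlpnu and collect the PIDs of pihole-FTL processes that own a socket on port 53. The column layout is assumed from iproute2's ss, and the sample output is hypothetical, not taken from binary-person's system. One caveat: FTL appears to fork worker processes (the F in log prefixes such as 131886/F128182), so a second PID that shows up only briefly may be a normal worker; a second PID that persists is the stronger signal of the situation reported here.

```python
import re

# Matches entries such as ("pihole-FTL",pid=1234,fd=5) in the users:(...) column of ss -p.
USER_RE = re.compile(r'\("([^"]+)",pid=(\d+),fd=\d+\)')


def dns_listener_pids(ss_output: str, process: str = "pihole-FTL", port: int = 53) -> set[int]:
    """Return the PIDs of `process` that own a socket bound to local `port`."""
    pids: set[int] = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed columns: Netid State Recv-Q Send-Q Local:Port Peer:Port Process
        if len(fields) < 7 or not fields[4].endswith(f":{port}"):
            continue
        for name, pid in USER_RE.findall(line):
            if name == process:
                pids.add(int(pid))
    return pids


# Hypothetical output of: sudo ss -tlpnu
SAMPLE = """\
Netid State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
udp   UNCONN 0      0            0.0.0.0:53        0.0.0.0:*     users:(("pihole-FTL",pid=1234,fd=4))
tcp   LISTEN 0      32           0.0.0.0:53        0.0.0.0:*     users:(("pihole-FTL",pid=1234,fd=5))
tcp   LISTEN 0      32              [::]:53           [::]:*     users:(("pihole-FTL",pid=5678,fd=7))
tcp   LISTEN 0      128          0.0.0.0:22        0.0.0.0:*     users:(("sshd",pid=600,fd=3))
"""

pids = dns_listener_pids(SAMPLE)
if len(pids) > 1:
    print(f"pihole-FTL PIDs holding port 53: {sorted(pids)}")
```

Matching on the ":53" suffix of the local address column, rather than grepping for "53" anywhere in the line, avoids false hits on ports such as 5353 or on PIDs that happen to contain 53.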

As for why this happens, I'm speculating it has something to do with the following since it always happens around the time pihole fails to resolve:

[2021-03-03 03:06:19.674 131886/F128182] Resizing "FTL-dns-cache" from 40960 to (2816 * 16) == 45056 (/dev/shm: 1.1MB used, 514.5MB total, FTL uses 1.1MB)
[2021-03-03 03:06:20.761 128182M] Remapping "FTL-dns-cache" from 40960 to (2816 * 16) == 45056
[2021-03-03 03:16:26.119 128182M] Resizing "FTL-strings" from 53248 to (57344 * 1) == 57344 (/dev/shm: 1.1MB used, 514.5MB total, FTL uses 1.1MB)
[2021-03-03 03:16:26.331 132258/F128182] Remapping "FTL-strings" from 53248 to (57344 * 1) == 57344
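To line up these shared-memory events with the moment resolution stopped, the Resizing/Remapping lines can be extracted programmatically. A minimal Python sketch, assuming the line format quoted above (other FTL versions may format these lines differently); it also checks the arithmetic the log prints, e.g. (2816 * 16) == 45056. As DL6ER points out later in the thread, these events may well be unrelated to the outage.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Line format assumed from the FTL log excerpts quoted in this thread.
SHM_RE = re.compile(
    r'^\s*\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) [^\]]*\] '
    r'(?P<action>Resizing|Remapping) "(?P<name>[^"]+)" '
    r'from (?P<old>\d+) to \((?P<count>\d+) \* (?P<size>\d+)\) == (?P<new>\d+)'
)


@dataclass
class ShmEvent:
    when: datetime
    action: str      # "Resizing" or "Remapping"
    name: str        # shared-memory object, e.g. "FTL-dns-cache"
    old_bytes: int
    new_bytes: int


def shm_events(log_text: str) -> list[ShmEvent]:
    """Extract shared-memory resize/remap events from FTL log text."""
    events = []
    for line in log_text.splitlines():
        m = SHM_RE.match(line)
        if m is None:
            continue
        count, size, new = int(m["count"]), int(m["size"]), int(m["new"])
        # The log prints the product explicitly, e.g. (2816 * 16) == 45056.
        if count * size != new:
            raise ValueError(f"inconsistent size arithmetic: {line}")
        events.append(ShmEvent(
            when=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
            action=m["action"],
            name=m["name"],
            old_bytes=int(m["old"]),
            new_bytes=new,
        ))
    return events


LOG = """\
[2021-03-03 03:06:19.674 131886/F128182] Resizing "FTL-dns-cache" from 40960 to (2816 * 16) == 45056 (/dev/shm: 1.1MB used, 514.5MB total, FTL uses 1.1MB)
[2021-03-03 03:06:20.761 128182M] Remapping "FTL-dns-cache" from 40960 to (2816 * 16) == 45056
[2021-03-03 03:16:26.119 128182M] Resizing "FTL-strings" from 53248 to (57344 * 1) == 57344 (/dev/shm: 1.1MB used, 514.5MB total, FTL uses 1.1MB)
[2021-03-03 03:16:26.331 132258/F128182] Remapping "FTL-strings" from 53248 to (57344 * 1) == 57344
"""

for e in shm_events(LOG):
    print(e.when, e.action, e.name, e.old_bytes, "->", e.new_bytes)
```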

vinnyperella commented 3 years ago

+1 for the issue described above, same exact thing is happening to me.

dschaper commented 3 years ago

+1 for the issue described above, same exact thing is happening to me.

And what information are you planning on providing to us to help us even begin to assess your issue?

DL6ER commented 3 years ago

As for why this happens, I'm speculating it has something to do with the following since it always happens around the time pihole fails to resolve

This may be a red herring, as the log is very brief: you typically don't see anything aside from the quoted memory messages. Please check out and follow my advice at https://github.com/pi-hole/FTL/issues/1081#issuecomment-798635283 to get us some more debugging output.

Mysteriously, we have had about ten replies, but nobody ever really got back to us after we asked them to enable debug logging... Hence, this issue is still open. Sorry for the inconvenience!

PS: I'll move this ticket over to the FTL namespace as this is where the bug is sitting.

KevinVan9 commented 3 years ago

I'm having the same issue except I can't access the web interface or SSH when this occurs. A reboot of the Raspberry Pi fixes it every time. So far it's happened overnight and I wake up to no DNS.

I have the same problem. I've reinstalled Pi-hole and reflashed the OS (DietPi, Raspbian Lite) multiple times, and that has not helped.

Versions
  Pi-hole version is v5.2.4 (Latest: v5.2.4)
  AdminLTE version is v5.4 (Latest: v5.4)
  FTL version is v5.7 (Latest: v5.7)

Hardware: Raspberry Pi Zero Wireless

A snapshot of the logs:

[2021-03-21 11:15:22.904 28882/T28886] Compiled 0 whitelist and 4 blacklist regex filters for 3 clients in 20.1 msec
[2021-03-21 11:19:52.613 28882M] Resizing "FTL-strings" from 4096 to (8192 * 1) == 8192 (/dev/shm: 1.1MB used, 226.4MB total, FTL uses 1.1MB)
[2021-03-21 12:05:09.130 28882M] Resizing "FTL-dns-cache" from 4096 to (512 * 16) == 8192 (/dev/shm: 1.1MB used, 226.4MB total, FTL uses 1.1MB)
[2021-03-21 12:06:54.229 28882M] Resizing "FTL-strings" from 8192 to (12288 * 1) == 12288 (/dev/shm: 1.1MB used, 226.4MB total, FTL uses 1.1MB)
[2021-03-21 12:23:11.330 28882M] Received: Real-time signal 0 (34 -> 0)
[2021-03-21 12:23:11.415 28882/T28886] Compiled 0 whitelist and 4 blacklist regex filters for 3 clients in 20.7 msec
[2021-03-21 14:20:41.692 28882M] Resizing "FTL-strings" from 12288 to (16384 * 1) == 16384 (/dev/shm: 1.1MB used, 226.4MB total, FTL uses 1.1MB)
[2021-03-21 15:19:33.576 28882M] Resizing "FTL-dns-cache" from 8192 to (768 * 16) == 12288 (/dev/shm: 1.1MB used, 226.4MB total, FTL uses 1.1MB)
[2021-03-21 19:34:29.503 558M] ########## FTL started! ##########

[screenshot]

I may try enabling verbose logging soon.

DL6ER commented 3 years ago

@KevinVan9 Yours looks different because your Pi-hole seems to be still working; it just doesn't receive (many) queries. For the others, resolving stops completely, whereas your Pi-hole still shows a few queries around 16:00, 17:00, 19:45, etc. Did you perhaps configure more than one DNS server in your router, so that your client(s) just picked the other one?

KevinVan9 commented 3 years ago

@DL6ER Those were queries from localhost for .in-addr.arpa and api.github.com. Queries from my devices other than the Pi could not get through at those times (pages would not load, so I changed the DNS on those devices until I power-cycled the Pi at night). I could not connect to the Pihole web admin page and could not ssh into the Pi as well.

DL6ER commented 3 years ago

I could not connect to the Pihole web admin page and could not ssh into the Pi as well.

Not via pi.hole but also not via the IP address itself?

KevinVan9 commented 3 years ago

I could not connect to the Pihole web admin page and could not ssh into the Pi as well.

Not via pi.hole but also not via the IP address itself?

pi.hole does not work (I had never tried it before, but it does not work right now). I always use IP address to ssh and access web admin page.

Edit: It appears that devices lose pihole at different times.

Edit: I found https://discourse.pi-hole.net/t/pi-hole-disconnects-after-a-few-hours-on-certain-devices/15128 And it appears that it may be the router. I switched from 802.11bgn to 802.11n for 2.4GHz and that allowed my disconnected devices to regain full connection without doing a powercycle on my Pi. Will update again if issues reoccur.

Edit: That was only a temporary fix.

DL6ER commented 3 years ago

The next version of FTL has been released. Please update and see if the issue persists. We had quite a few bug fixes and maybe yours was a side-effect of another one. Never give up hope ;-)

KevinVan9 commented 3 years ago

Unfortunately the problem is still present. My devices eventually lose connection to the pi via ssh, ping, admin page. It's not simultaneous and the pi cannot ping the devices that lose connection too.

DL6ER commented 3 years ago

I always use IP address to ssh and access web admin page.

If you are using the IP address and your Pi-hole still isn't reachable, then something is really going wrong at the operating-system level. Did you configure a static IP address? Can you connect a screen and keyboard to your Pi to check, when it stops being reachable from the outside, whether it still works but somehow lost its connection or got a new IP address?

carlangas159 commented 3 years ago

I recently updated my version:

Pi-hole v5.3.1 Web Interface v5.5 FTL v5.8.1

I have noticed that during the first hour of the day it does not resolve any addresses (two days in a row now). I have to restart the service, which loses the data.

Apart from this issue, it also no longer resolves my local names.

0schr0eder commented 3 years ago

pihole FTL_1h_before_until_restart.zip

I captured the issue in debug mode. Log starts 1h before the event until the restart. Do you need my configuration (teleporter)?

DL6ER commented 3 years ago

@0schr0eder I checked your log and it really looks like nothing was wrong. The last query arrived at 2021-05-26 23:34:06.196. Here, the log stops for a moment until it was shut down (?) on 2021-05-27 05:45:03.839. The question mark is because I do not see any indication for shut down so FTL was maybe killed. How did you restart?
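Spotting a silent stretch like this one (the last query at 23:34:06.196, then nothing until 05:45:03.839) can be automated by scanning the FTL log for large gaps between consecutive timestamps. A minimal Python sketch, assuming every relevant line starts with the [YYYY-MM-DD HH:MM:SS.mmm ...] prefix seen in this thread; the 30-minute default threshold and the sample lines are hypothetical, and only the last two timestamps are taken from the log discussed here.

```python
import re
from collections.abc import Iterable, Iterator
from datetime import datetime, timedelta

# FTL log lines seen in this thread start with "[YYYY-MM-DD HH:MM:SS.mmm ...]".
TS_RE = re.compile(r"^\s*\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def log_gaps(lines: Iterable[str],
             threshold: timedelta = timedelta(minutes=30)) -> Iterator[tuple[datetime, datetime]]:
    """Yield (before, after) pairs of consecutive timestamps further apart than threshold."""
    previous = None
    for line in lines:
        m = TS_RE.match(line)
        if m is None:
            continue  # lines without an FTL-style timestamp are skipped
        current = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if previous is not None and current - previous > threshold:
            yield previous, current
        previous = current


# Hypothetical lines; only the last two timestamps come from the log discussed above.
sample = [
    "[2021-05-26 23:33:50.000 100/T101] hypothetical entry",
    "[2021-05-26 23:34:06.196 100/T101] hypothetical entry (last query)",
    "[2021-05-27 05:45:03.839 100M] hypothetical entry (next log line)",
]

for before, after in log_gaps(sample):
    print(f"silent from {before} to {after} ({after - before})")
```

On a real system the same function can be fed the lines of the FTL log file directly, for example via open(...).readlines() on the log path.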

0schr0eder commented 3 years ago

I have to pull the plug. At that point the pi doesn't respond to SSH or the pi-hole UI. The LED on the network port shows traffic but there is no action around the SD card LED.

binary-person commented 3 years ago

If it doesn't respond to SSH, I think that's a separate problem not related to Pi-hole. Perhaps one of the processes is hanging your system?

0schr0eder commented 3 years ago

Which log should I look at?

[screenshot]

0schr0eder commented 3 years ago

Do you want to see the log since the reboot?

0schr0eder commented 3 years ago

Found this in kern.log.1:

May 26 23:33:34 raspberrypi kernel: [    8.156056] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43430-sdio.raspberrypi,3-model-b.txt failed with error -2
May 26 23:33:34 raspberrypi kernel: [    8.388786] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43430-sdio for chip BCM43430/1
May 26 23:33:34 raspberrypi kernel: [    8.390515] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
May 26 23:33:34 raspberrypi kernel: [    8.391483] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43430/1 wl0: Oct 22 2019 01:59:28 version 7.45.98.94 (r723000 CY) FWID 01-3b33decd

Researching this leads me to this which in turn shows the files are here

I guess it's worth a shot copying the files to my Pi? Though the Pi is wired... Any thoughts on this?

pablopoo commented 3 years ago

I have one Pi-hole instance (of two) running on a Pi with DietPi OS. That instance shows no DNS queries between 21:00 and 00:00. In the syslog I found a scheduled update at 21:00:

pihole updatechecker local

Then at 00:00 found the following commands, that restore the statistics:

pihole flush once quiet
pihole updatechecker local

I don't know yet if it's only the log or the dns that is not working. I will try it between that hours.

DL6ER commented 3 years ago

@pablopoo Any new information for us? If so, please open a new issue ticket, as your issue does not seem to be related to the current discussion at all.

pablopoo commented 3 years ago

@DL6ER Yes, it seems to be unrelated to this issue, but with the same result: no DNS log during a specific timeframe. In my case it could be related to something in the DietPi distro. I saw a DietPi script cleaning the log at the same time Pi-hole was rotating it. I will open a new ticket if I find the root cause of my issue.

dschaper commented 3 years ago

@MichaIng Do you have any tips?