somera closed this issue 1 year ago
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:
I'm experiencing crashes too:
[2022-11-17 08:16:54.820 10547/T10558] Compiled 0 whitelist and 0 blacklist regex filters for 63 clients in 0.4 msec
[2022-11-17 08:16:54.820 10547/T10558] Blocking status is enabled
[2022-11-17 08:17:41.554 10547M] Resizing "FTL-dns-cache" from 4096 to (512 * 16) == 8192 (/dev/shm: 10.3MB used, 969.5MB total, FTL uses 10.3MB)
[2022-11-17 08:21:11.231 10547M] Resizing "FTL-dns-cache" from 8192 to (768 * 16) == 12288 (/dev/shm: 10.3MB used, 969.5MB total, FTL uses 10.3MB)
[2022-11-17 08:30:31.296 10547M] Resizing "FTL-dns-cache" from 12288 to (1024 * 16) == 16384 (/dev/shm: 10.3MB used, 969.5MB total, FTL uses 10.3MB)
[2022-11-17 08:34:59.868 10547M] Resizing "FTL-dns-cache" from 16384 to (1280 * 16) == 20480 (/dev/shm: 10.3MB used, 969.5MB total, FTL uses 10.3MB)
[2022-11-17 08:42:48.511 10547M] Resizing "FTL-dns-cache" from 20480 to (1536 * 16) == 24576 (/dev/shm: 10.3MB used, 969.5MB total, FTL uses 10.3MB)
[2022-11-17 08:57:02.795 10547M] Resizing "FTL-dns-cache" from 24576 to (1792 * 16) == 28672 (/dev/shm: 10.3MB used, 969.5MB total, FTL uses 10.3MB)
[2022-11-17 09:14:58.603 10547M] Resizing "FTL-dns-cache" from 28672 to (2048 * 16) == 32768 (/dev/shm: 10.4MB used, 969.5MB total, FTL uses 10.3MB)
[2022-11-17 09:42:03.955 10547M] Resizing "FTL-dns-cache" from 32768 to (2304 * 16) == 36864 (/dev/shm: 10.4MB used, 969.5MB total, FTL uses 10.3MB)
[2022-11-17 10:14:38.057 10547M] Resizing "FTL-dns-cache" from 36864 to (2560 * 16) == 40960 (/dev/shm: 10.4MB used, 969.5MB total, FTL uses 10.3MB)
[2022-11-17 11:12:24.968 10547M] Resizing "FTL-dns-cache" from 40960 to (2816 * 16) == 45056 (/dev/shm: 10.4MB used, 969.5MB total, FTL uses 10.4MB)
[2022-11-17 11:21:00.235 13350/F10547] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2022-11-17 11:21:00.235 13350/F10547] ----------------------------> FTL crashed! <----------------------------
[2022-11-17 11:21:00.235 13350/F10547] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2022-11-17 11:21:00.235 13350/F10547] Please report a bug at https://github.com/pi-hole/FTL/issues
[2022-11-17 11:21:00.235 13350/F10547] and include in your report already the following details:
[2022-11-17 11:21:00.235 13350/F10547] FTL has been running for 11054 seconds
[2022-11-17 11:21:00.235 13350/F10547] FTL branch: master
[2022-11-17 11:21:00.236 13350/F10547] FTL version: v5.19.1
[2022-11-17 11:21:00.236 13350/F10547] FTL commit: b48b3e1f
[2022-11-17 11:21:00.236 13350/F10547] FTL date: 2022-11-14 22:01:50 +0000
[2022-11-17 11:21:00.236 13350/F10547] FTL user: started as pihole, ended as pihole
[2022-11-17 11:21:00.236 13350/F10547] Compiled for aarch64 (compiled on CI) using aarch64-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0
[2022-11-17 11:21:00.236 13350/F10547] Process details: MID: 10547
[2022-11-17 11:21:00.236 13350/F10547] PID: 13350
[2022-11-17 11:21:00.236 13350/F10547] TID: 13350
[2022-11-17 11:21:00.236 13350/F10547] Name: pihole-FTL
[2022-11-17 11:21:00.237 13350/F10547] Received signal: Segmentation fault
[2022-11-17 11:21:00.237 13350/F10547] at address: 0x7fb5abe008
[2022-11-17 11:21:00.237 13350/F10547] with code: SEGV_MAPERR (Address not mapped to object)
[2022-11-17 11:21:00.241 13350/F10547] Backtrace:
[2022-11-17 11:21:00.243 13350/F10547] B[0000]: /usr/bin/pihole-FTL(generate_backtrace+0x38) [0x5588ddb300]
[2022-11-17 11:21:00.544 13350/F10547] L[0000]: /__w/FTL/FTL/src/signals.c:98
[2022-11-17 11:21:00.549 13350/F10547] B[0001]: /usr/bin/pihole-FTL(+0x5b7d4) [0x5588ddb7d4]
[2022-11-17 11:21:00.591 13350/F10547] L[0001]: /__w/FTL/FTL/src/signals.c:242
[2022-11-17 11:21:00.596 13350/F10547] B[0002]: linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x7fb5ac8788]
[2022-11-17 11:21:00.596 13350/F10547] B[0003]: /usr/bin/pihole-FTL(_lock_shm+0x88) [0x5588dda338]
[2022-11-17 11:21:00.632 13350/F10547] L[0003]: /__w/FTL/FTL/src/shmem.c:420 (discriminator 1)
[2022-11-17 11:21:00.636 13350/F10547] B[0004]: /usr/bin/pihole-FTL(_FTL_new_query+0x2b8) [0x5588dccff8]
[2022-11-17 11:21:00.674 13350/F10547] L[0004]: /__w/FTL/FTL/src/dnsmasq_interface.c:628
[2022-11-17 11:21:00.678 13350/F10547] B[0005]: /usr/bin/pihole-FTL(tcp_request+0x6a4) [0x5588e0ca4c]
[2022-11-17 11:21:00.721 13350/F10547] L[0005]: /__w/FTL/FTL/src/dnsmasq/forward.c:2314 (discriminator 4)
[2022-11-17 11:21:00.726 13350/F10547] B[0006]: /usr/bin/pihole-FTL(+0x7fa7c) [0x5588dffa7c]
[2022-11-17 11:21:00.768 13350/F10547] L[0006]: /__w/FTL/FTL/src/dnsmasq/dnsmasq.c:2053
[2022-11-17 11:21:00.773 13350/F10547] B[0007]: /usr/bin/pihole-FTL(main_dnsmasq+0xfc0) [0x5588e01598]
[2022-11-17 11:21:00.811 13350/F10547] L[0007]: /__w/FTL/FTL/src/dnsmasq/dnsmasq.c:1278
[2022-11-17 11:21:00.814 13350/F10547] B[0008]: /usr/bin/pihole-FTL(main+0x100) [0x5588dc14b0]
[2022-11-17 11:21:00.839 13350/F10547] L[0008]: /__w/FTL/FTL/src/main.c:118
[2022-11-17 11:21:00.842 13350/F10547] B[0009]: /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe8) [0x7fb5842e18]
[2022-11-17 11:21:00.842 13350/F10547] B[0010]: /usr/bin/pihole-FTL(+0x41978) [0x5588dc1978]
[2022-11-17 11:21:00.888 13350/F10547] L[0010]: ??:?
[2022-11-17 11:21:00.895 13350/F10547] ------ Listing content of directory /dev/shm ------
[2022-11-17 11:21:00.895 13350/F10547] File Mode User:Group Size Filename
[2022-11-17 11:21:00.895 13350/F10547] rwxrwxrwx root:root 260 .
[2022-11-17 11:21:00.895 13350/F10547] rwxr-xr-x root:root 4K ..
[2022-11-17 11:21:00.896 13350/F10547] rw------- pihole:pihole 4K FTL-per-client-regex
[2022-11-17 11:21:00.896 13350/F10547] rw------- pihole:pihole 45K FTL-dns-cache
[2022-11-17 11:21:00.896 13350/F10547] rw------- pihole:pihole 8K FTL-overTime
[2022-11-17 11:21:00.896 13350/F10547] rw------- pihole:pihole 10M FTL-queries
[2022-11-17 11:21:00.896 13350/F10547] rw------- pihole:pihole 315K FTL-upstreams
[2022-11-17 11:21:00.896 13350/F10547] rw------- pihole:pihole 86K FTL-clients
[2022-11-17 11:21:00.896 13350/F10547] rw------- pihole:pihole 98K FTL-domains
[2022-11-17 11:21:00.896 13350/F10547] rw------- pihole:pihole 164K FTL-strings
[2022-11-17 11:21:00.897 13350/F10547] rw------- pihole:pihole 16 FTL-settings
[2022-11-17 11:21:00.897 13350/F10547] rw------- pihole:pihole 248 FTL-counters
[2022-11-17 11:21:00.897 13350/F10547] rw------- pihole:pihole 104 FTL-lock
[2022-11-17 11:21:00.897 13350/F10547] ---------------------------------------------------
[2022-11-17 11:21:00.897 13350/F10547] Please also include some lines from above the !!!!!!!!! header.
[2022-11-17 11:21:00.897 13350/F10547] Thank you for helping us to improve our FTL engine!
[2022-11-17 11:21:00.897 13350/F10547] Asking parent pihole-FTL (PID 10547) to shut down
[2022-11-17 11:21:00.897 10547M] Received: Real-time signal 2 (36 -> 2)
[2022-11-17 11:21:00.897 13350/F10547] FTL fork terminated!
[2022-11-17 11:21:00.898 10547/T10558] Error when obtaining outer SHM lock: Owner died
[2022-11-17 11:21:00.900 13349/F10547] TCP worker already terminating!
[2022-11-17 11:21:00.900 10547/T10558] Error when obtaining outer SHM lock: Owner died
[2022-11-17 11:21:00.900 10547/T10558] Error when obtaining inner SHM lock: Owner died
[2022-11-17 11:21:00.901 10547M] Shutting down...
[2022-11-17 11:21:01.228 10547M] Finished final database update (stored 1 queries)
[2022-11-17 11:21:01.228 10547M] Waiting for threads to join
[2022-11-17 11:21:01.228 10547M] Thread database (0) is idle, terminating it.
[2022-11-17 11:21:01.229 10547M] Thread housekeeper (1) is idle, terminating it.
[2022-11-17 11:21:01.229 10547M] Thread DNS client (2) is idle, terminating it.
[2022-11-17 11:21:01.230 10547M] All threads joined
[2022-11-17 11:21:01.230 10547M] Joining API worker thread 0
[2022-11-17 11:21:01.230 10547M] Joining API worker thread 1
[2022-11-17 11:21:01.230 10547M] Joining API worker thread 2
[2022-11-17 11:21:01.230 10547M] Joining API worker thread 3
[2022-11-17 11:21:01.230 10547M] Joining API worker thread 4
[2022-11-17 11:21:01.241 10547M] ########## FTL terminated after 3h 4m 14s (code 1)! ##########
[2022-11-17 11:49:38.293 13640M] Using log file /var/log/pihole/FTL.log
[2022-11-17 11:49:38.293 13640M] ########## FTL started on dns1! ##########
[2022-11-17 11:49:38.293 13640M] FTL branch: master
[2022-11-17 11:49:38.293 13640M] FTL version: v5.19.1
[2022-11-17 11:49:38.293 13640M] FTL commit: b48b3e1f
[2022-11-17 11:49:38.293 13640M] FTL date: 2022-11-14 22:01:50 +0000
[2022-11-17 11:49:38.293 13640M] FTL user: pihole
System: Raspberry Pi 4 / 2GB, static IP.
If you need more information please let me know, I'll try to provide.
I also got an error rolling back to 5.18.2:
pihole checkout ftl v5.18.2
Please note that changing branches severely alters your Pi-hole subsystems
Features that work on the master branch, may not on a development branch
This feature is NOT supported unless a Pi-hole developer explicitly asks!
Have you read and understood this? [y/N] y
[✓] Branch v5.18.2 exists
[i] Switching to branch: "v5.18.2" from "master"
[✓] Downloading and Installing FTL
[✓] Restarting pihole-FTL service...
[✓] Enabling pihole-FTL service to start on reboot...
sed: -e expression #1, char 63: unknown command: `C'
I also got an error rolling back to 5.18.2:
same here
I also got an error rolling back to 5.18.2:
Me too. On the Pi-hole UI, version 5.18.2 is displayed and seems to be running OK. Maybe the error is not relevant.
It seems nobody who switched to the proposed bugfix branch is getting new crashes. At least there have been none in the 15 hours the branch has been available. I'd say we're good to push a hotfix release. In the somewhat unlikely case that we find another issue, we can push yet another hotfix...
Individual replies:
@cathalferris Thank you for the log. Indeed the last 200 lines would have been sufficient but now we have learned something.
@markdall It very much depends on both the fairly precise timing of the incoming queries and the kind of queries in your network (you need a sufficient number of clients doing TCP requests, which is not typical). It's very likely that > 99% of all Pi-hole users would never have seen it. I'm sorry that all of you in here were affected.
@RobThree @PedroMartinSteenstrup This is a bug in https://github.com/pi-hole/pi-hole, we'll have to fix it there.
What happened here? The FTL main process has mapped the shared memory segment to a specific location in memory. FTL then forked twice. One of the forks re-maps the shared memory to a new address, but the other fork cannot be made aware of this. When it tries to access the memory at the old location, everything explodes and we get the SEGV_MAPERR (Address not mapped to object) crash. I do understand this now; however, it's kind of counterintuitive, as forks are meant to be seen as entirely separate processes.
We're seemingly hit by a side effect of copy-on-write here: the second fork is still sharing the memory with the main process, but the main process silently abandoned the memory and copy-on-write didn't back it up for the fork, so the memory is gone without the fork having a chance to notice. I'd call it a bug in the kernel, but I'm pretty sure this is just one of the quadrillion side effects mmap can have, and I'm sure it is documented somewhere ;-)
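For readers who want to see the crash signature in isolation, here is a minimal sketch of the general hazard: a pointer cached before a remap goes stale, and reading through it raises SIGSEGV with code SEGV_MAPERR. This is not FTL code and it does not reproduce the fork timing described above. It assumes Linux with glibc (for `mremap`), runs in a single process, and the names `stale_pointer_fault_code` and `on_segv` are made up for illustration.

```c
#define _GNU_SOURCE /* for mremap() and MREMAP_* on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t fault_code = 0;

/* Hypothetical handler: record why the fault happened, then jump back. */
static void on_segv(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)context;
    fault_code = info->si_code;
    siglongjmp(recover_point, 1);
}

/* Returns the si_code seen when reading a stale pointer,
 * 0 if no fault occurred, or -1 if the setup failed. */
int stale_pointer_fault_code(void)
{
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* One shared page, standing in for a shared memory object. */
    char *old_addr = mmap(NULL, page, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (old_addr == MAP_FAILED)
        return -1;
    old_addr[0] = 42;

    /* Reserve a destination so the move below is guaranteed to happen. */
    void *target = mmap(NULL, page, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (target == MAP_FAILED)
        return -1;

    /* Move the mapping: the old address range is unmapped afterwards. */
    char *new_addr = mremap(old_addr, page, page,
                            MREMAP_MAYMOVE | MREMAP_FIXED, target);
    if (new_addr == MAP_FAILED)
        return -1;
    assert(new_addr == (char *)target);
    if (new_addr[0] != 42) /* the data moved along with the mapping */
        return -1;

    struct sigaction sa, previous;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &previous) != 0)
        return -1;

    fault_code = 0;
    if (sigsetjmp(recover_point, 1) == 0) {
        /* Anyone still holding old_addr now reads unmapped memory. */
        volatile char value = *(volatile char *)old_addr;
        (void)value;
    }

    sigaction(SIGSEGV, &previous, NULL);
    munmap(new_addr, page);
    return fault_code;
}
```

Calling `stale_pointer_fault_code()` from a `main()` and comparing the result against `SEGV_MAPERR` shows the same fault code as in the crash log. In a multi-process setup, any process that keeps the old address after its own view was moved or dropped would fault in the same way.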
As I was already expecting something like this (even if not exactly in this form), the proposed fix on the branch already solves this by doing it very differently.
Thanks for the quick response!
FWIW - also having the FTL crash here on two of my three RPis. Both are RPi 4Bs and are my main Pi-holes (DNS1/DNS2 DNS setup). Approx. 110 active clients, ~250k queries every 24 hrs. FTL would crash every hour or two following the update. After rolling back to the previous FTL version, it has been fine. Looking forward to the hotfix coming out as noted above. Thanks to the team for the quick work!
@DL6ER thx for the fix!
Fix deployed to my Pi-holes - will monitor during the day. Thanks team!
Everyone using a custom branch - please go back to master by running:
pihole checkout master
Confirming no further crashes since the updated fix last night. I was away from the network for the day until now, hence the later feedback.
Again - apologies for the large log file. (Suggestion: add a progress bar or other feedback about the status of the tricorder upload?)
I've moved back to the main branch now - thank you all for your work on this, much, much appreciated. Also - thanks for the clear reasoning behind the bug and overall really good communication.
Also, glad to have been of assistance.
-Cathal.
Never got any crashes on the fix/forked_shmSettings branch with debug flags on, but switching back to master now. Thanks @DL6ER for the fix and for sharing all the details!
Have been running the fixes for 6 hours now on all three Pi-holes and no issues to report. FTL was crashing hourly on the previous version so this is great to see. Happy to be back on latest release code. Many thanks again!
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:
https://discourse.pi-hole.net/t/dnsmasq-segterm-and-pi-hole-web-interface-freezes/59336/2