Could you please run pihole-FTL connected to a debugger so we can find out where inside pihole-FTL this crash happens? https://deploy-preview-338--pihole-docs.netlify.app/ftldns/debugging/ has all the details, including a special section on how to do it inside a docker container.
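For reference, the rough idea from that guide is to install gdb inside the container and attach it to the running pihole-FTL process (a sketch only; the container name pihole is an assumption here, and the guide also covers prerequisites such as the SYS_PTRACE capability needed for attaching):
docker exec -it pihole apk add gdb
docker exec -it pihole gdb -p "$(docker exec pihole pidof pihole-FTL)"
Please follow the linked page for the exact steps.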
I can see the same happening both for nightly and development tags.
This is what is shown after the mentioned debugging procedure inside the docker container:
Thread 1 "pihole-FTL" received signal SIG37, Real-time event 37.
__cp_end () at src/thread/x86_64/syscall_cp.s:29
29 in src/thread/x86_64/syscall_cp.s
(gdb) backtrace
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00000000007f19e3 in __syscall_cp_c (nr=7, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00000000007ea619 in poll (fds=<optimized out>, n=<optimized out>, timeout=timeout@entry=-1) at src/select/poll.c:9
#3 0x00000000004e9762 in poll (__s=-1, __n=<optimized out>, __f=<optimized out>) at /usr/include/fortify/poll.h:40
#4 0x00000000004bafbc in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at /app/src/dnsmasq/dnsmasq.c:1183
#5 0x00000000004020d7 in main (argc=<optimized out>, argv=0x7ffe637f30e8) at /app/src/main.c:123
It doesn't look very helpful.
Seems you need to add SIG37 nostop, similar to the other signals in the documentation, to skip this useless signal. You are perfectly right about it not being useful.
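For reference, this amounts to one more handle line in the gdbinit file from the debugging guide, e.g. (a sketch):
handle SIG37 nostop noprint pass
which tells gdb to neither stop at nor announce this real-time signal while still passing it through to pihole-FTL.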
I have another one with:
Thread 4 "housekeeper" received signal SIG33, Real-time event 33.
[Switching to LWP 113]
__cp_end () at src/thread/x86_64/syscall_cp.s:29
29 in src/thread/x86_64/syscall_cp.s
(gdb) backtrace
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00000000007f19e3 in __syscall_cp_c (nr=23, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00000000007ea700 in select (n=n@entry=0, rfds=rfds@entry=0x0, wfds=wfds@entry=0x0, efds=efds@entry=0x0, tv=tv@entry=0x734f1354ba00)
at src/select/select.c:39
#3 0x00000000006c0e29 in FTLselect (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x734f1354ba00, file=file@entry=0x80069f "/app/src/timers.c",
func=0x867d40 <__FUNCTION__.0> "sleepms", line=65) at /app/src/syscalls/select.c:25
#4 0x000000000042695a in sleepms (milliseconds=<optimized out>) at /app/src/timers.c:65
#5 0x0000000000426661 in thread_sleepms (thread=thread@entry=GC, milliseconds=<optimized out>) at /app/src/signals.c:496
#6 0x0000000000418357 in GC_thread (val=<optimized out>) at /app/src/gc.c:645
#7 0x00000000007f26d0 in start (p=0x734f1354bb00) at src/thread/pthread_create.c:207
#8 0x00000000007f3d5c in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
Okay, so SIG33 is another one for the nostop list here. Sorry about that. What we are looking for is a SIGSEGV.
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:
https://discourse.pi-hole.net/t/pi-hole-stopped-responding/73931/4
Thanks for your patience. However, when also adding SIG33 nostop, gdb doesn't stop at all:
[Detaching after fork from child process 197]
Thread 1 "pihole-FTL" received signal SIG37, Real-time event 37.
Thread 1 "pihole-FTL" received signal SIGTERM, Terminated.
Thread 1 "pihole-FTL" received signal SIG41, Real-time event 41.
[LWP 136 exited]
[LWP 139 exited]
Thread 2 "ntp-client" received signal SIG33, Real-time event 33.
Thread 4 "housekeeper" received signal SIG33, Real-time event 33.
Thread 5 "dns-client" received signal SIG33, Real-time event 33.
[LWP 137 exited]
[LWP 138 exited]
[LWP 135 exited]
[LWP 141 exited]
[LWP 140 exited]
[Inferior 1 (process 48) exited with code 01]
This means there was no crash. The reason why FTL exited with exit code 01 will be found in /var/log/pihole/FTL.log. Unfortunately, the debugger cannot see this.
This is what the log contains:
2024-11-11 10:02:01.564 CET [3270/F52] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2024-11-11 10:02:01.564 CET [3270/F52] INFO: ----------------------------> FTL crashed! <----------------------------
2024-11-11 10:02:01.564 CET [3270/F52] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2024-11-11 10:02:01.564 CET [3270/F52] INFO: Please report a bug at https://github.com/pi-hole/FTL/issues
2024-11-11 10:02:01.564 CET [3270/F52] INFO: and include in your report already the following details:
2024-11-11 10:02:01.564 CET [3270/F52] INFO: FTL has been running for 4195 seconds
2024-11-11 10:02:01.564 CET [3270/F52] INFO: FTL branch: development
2024-11-11 10:02:01.565 CET [3270/F52] INFO: FTL version: vDev-848367f
2024-11-11 10:02:01.565 CET [3270/F52] INFO: FTL commit: 848367fa
2024-11-11 10:02:01.565 CET [3270/F52] INFO: FTL date: 2024-11-08 19:13:07 +0100
2024-11-11 10:02:01.565 CET [3270/F52] INFO: FTL user: started as pihole, ended as pihole
2024-11-11 10:02:01.565 CET [3270/F52] INFO: Compiled for linux/amd64 (compiled on CI) using cc (Alpine 13.2.1_git20240309) 13.2.1 20240309
2024-11-11 10:02:01.565 CET [3270/F52] INFO: Process details: MID: 52
2024-11-11 10:02:01.565 CET [3270/F52] INFO: PID: 3270
2024-11-11 10:02:01.565 CET [3270/F52] INFO: TID: 3270
2024-11-11 10:02:01.565 CET [3270/F52] INFO: Name: pihole-FTL
2024-11-11 10:02:01.565 CET [3270/F52] INFO: Received signal: Segmentation fault
2024-11-11 10:02:01.565 CET [3270/F52] INFO: at address: 0x100e00003c08
2024-11-11 10:02:01.565 CET [3270/F52] INFO: with code: SEGV_MAPERR (Address not mapped to object)
2024-11-11 10:02:01.565 CET [3270/F52] INFO: !!! INFO: pihole-FTL has not been compiled with glibc/backtrace support, not generating one !!!
2024-11-11 10:02:01.565 CET [3270/F52] INFO: ------ Listing content of directory /dev/shm ------
2024-11-11 10:02:01.565 CET [3270/F52] INFO: File Mode User:Group Size Filename
2024-11-11 10:02:01.565 CET [3270/F52] INFO: rwxrwxrwx root:root 360 .
2024-11-11 10:02:01.565 CET [3270/F52] INFO: rwxr-xr-x root:root 340 ..
2024-11-11 10:02:01.565 CET [3270/F52] INFO: rw------- pihole:pihole 88 FTL-lock
2024-11-11 10:02:01.565 CET [3270/F52] INFO: rw------- pihole:pihole 328 FTL-counters
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 136 FTL-settings
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 123K FTL-strings
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 78K FTL-domains
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 696K FTL-clients
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 29K FTL-upstreams
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 5M FTL-queries
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 8K FTL-overTime
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 123K FTL-dns-cache
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 8K FTL-per-client-regex
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 569K FTL-fifo-log
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 8K FTL-clients-lookup
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 20K FTL-domains-lookup
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 16K FTL-dns-cache-lookup
2024-11-11 10:02:01.566 CET [3270/F52] INFO: rw------- pihole:pihole 786K FTL-recycler
2024-11-11 10:02:01.566 CET [3270/F52] INFO: ---------------------------------------------------
2024-11-11 10:02:01.566 CET [3270/F52] INFO: Please also include some lines from above the !!!!!!!!! header.
2024-11-11 10:02:01.567 CET [3270/F52] INFO: Thank you for helping us to improve our FTL engine!
2024-11-11 10:02:01.567 CET [3270/F52] INFO: Asking parent pihole-FTL (PID 52) to shut down
2024-11-11 10:02:01.567 CET [3270/F52] INFO: FTL fork terminated!
2024-11-11 10:02:01.567 CET [52M] INFO: Received: RT37 (37 -> 2)
2024-11-11 10:02:01.570 CET [52M] INFO: Asked to terminate by "/usr/bin/pihole-FTL no-daemon" (PID 52, user pihole UID 100)
2024-11-11 10:02:01.594 CET [52/T117] INFO: Terminating database thread
2024-11-11 10:02:01.611 CET [52/T120] INFO: Terminating timer thread
2024-11-11 10:02:01.977 CET [52/T119] INFO: Terminating resolver thread
2024-11-11 10:02:02.009 CET [52M] INFO: Finished final database update
2024-11-11 10:02:02.009 CET [52M] INFO: Waiting for threads to join
2024-11-11 10:02:02.009 CET [52M] INFO: Thread housekeeper (1) is idle, terminating it.
2024-11-11 10:02:02.009 CET [52M] INFO: Thread ntp-client (4) is idle, terminating it.
2024-11-11 10:02:02.009 CET [52M] INFO: All threads joined
2024-11-11 10:02:02.018 CET [52M] INFO: Stored 0 API sessions in the database
2024-11-11 10:02:02.682 CET [52M] INFO: ########## FTL terminated after 1h 9m 56s (code 1)! ##########
Note that this happens on the current development image as well as on yesterday's and today's nightly images. The log above is from the latest nightly because, while it crashes, FTL recovers more quickly; development does not and sometimes takes down the container.
3270/F52 from e.g. the first line says the crash happened in the process with PID 3270, which is a fork of process 52. Unfortunately, gdb cannot really trace forks, see https://sourceware.org/gdb/current/onlinedocs/gdb.html/Forks.html:
On most systems, GDB has no special support for debugging programs which create additional processes using the fork function. When a program forks, GDB will continue to debug the parent process and the child process will run unimpeded. [...]
The first crash you reported here was in 52/T158, which is thread 158 of the original process. gdb would have caught another crash therein.
According to the link above, Linux supports setting
set follow-fork-mode child
allowing gdb to follow the forked child. But if FTL crashes in the main process in between, we will not have caught this...
edit: Clarified the above and suggested a different command to run.
set detach-on-fork off
might work, too, but I am not sure whether the system is still healthy with the main process suspended. It may have other consequences. Easiest would probably be waiting until we get a crash outside of a fork, so gdb can provide all the necessary information for us.
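In gdb terms, the two options discussed above would be entered before continuing, i.e. either
(gdb) set follow-fork-mode child
(gdb) continue
to follow into the forked child (losing sight of the main process), or
(gdb) set detach-on-fork off
(gdb) continue
to keep all processes under the debugger, at the price of suspending whichever one is not currently being debugged. Treat both as sketches with the caveats above.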
With set detach-on-fork off:
(gdb) set detach-on-fork off
(gdb) continue
Continuing.
[New inferior 2 (process 2592)]
Can not resume the parent process over vfork in the foreground while
holding the child stopped. Try "set detach-on-fork" or "set schedule-multiple".
[Switching to LWP 153]
__clone () at src/thread/x86_64/clone.s:18
warning: 18 src/thread/x86_64/clone.s: No such file or directory
(gdb) backtrace
#0 __clone () at src/thread/x86_64/clone.s:18
#1 0x00000000007fa8fc in posix_spawn (res=res@entry=0x7dd145a616a4, path=path@entry=0x80fc5e "/bin/sh", fa=fa@entry=0x7dd145a616d0,
attr=attr@entry=0x0, argv=argv@entry=0x7dd145a616b0, envp=<optimized out>) at src/process/posix_spawn.c:198
#2 0x00000000007eca1e in popen (cmd=cmd@entry=0x7dd145a61802 "ip neigh show", mode=mode@entry=0x80efa9 "r") at src/stdio/popen.c:41
#3 0x000000000049db1e in parse_neighbor_cache (db=0x7dd14c672ca8) at /app/src/database/network-table.c:1307
#4 0x00000000004904ec in DB_thread (val=<optimized out>) at /app/src/database/database-thread.c:199
#5 0x00000000007f26d0 in start (p=0x7dd145a61b00) at src/thread/pthread_create.c:207
#6 0x00000000007f3d5c in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
With set follow-fork-mode child:
(gdb) continue
Continuing.
[Attaching after LWP 49 fork to child process 179]
[New inferior 2 (process 179)]
Thread 2.1 "pihole-FTL" received signal SIGPIPE, Broken pipe.
[Inferior 2 (process 179) exited normally]
Maybe it's a good idea to create a development build that includes debug symbols, just to have valid stack trace output?
I've also seen that the Network overview shows this:
[screenshot]
Here is "my" crash:
2024-11-11 11:15:42.529 CET [712/F49] DEBUG_ANY: Reopening Gravity database for this fork
2024-11-11 11:15:42.535 CET [713/F49] DEBUG_ANY: Reopening Gravity database for this fork
2024-11-11 11:15:42.536 CET [712/F49] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2024-11-11 11:15:42.537 CET [712/F49] INFO: ----------------------------> FTL crashed! <----------------------------
2024-11-11 11:15:42.537 CET [712/F49] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2024-11-11 11:15:42.537 CET [712/F49] INFO: Please report a bug at https://github.com/pi-hole/FTL/issues
2024-11-11 11:15:42.538 CET [712/F49] INFO: and include in your report already the following details:
2024-11-11 11:15:42.540 CET [712/F49] INFO: FTL has been running for 1016 seconds
2024-11-11 11:15:42.540 CET [712/F49] INFO: FTL branch: development
2024-11-11 11:15:42.541 CET [712/F49] INFO: FTL version: vDev-848367f
2024-11-11 11:15:42.541 CET [712/F49] INFO: FTL commit: 848367fa
2024-11-11 11:15:42.541 CET [712/F49] INFO: FTL date: 2024-11-08 19:13:07 +0100
2024-11-11 11:15:42.542 CET [712/F49] INFO: FTL user: started as pihole, ended as pihole
2024-11-11 11:15:42.542 CET [714/F49] DEBUG_ANY: Reopening Gravity database for this fork
2024-11-11 11:15:42.542 CET [712/F49] INFO: Compiled for linux/arm64/v8 (compiled on CI) using cc (Alpine 13.2.1_git20240309) 13.2.1 20240309
2024-11-11 11:15:42.543 CET [712/F49] INFO: Process details: MID: 49
2024-11-11 11:15:42.543 CET [712/F49] INFO: PID: 712
2024-11-11 11:15:42.543 CET [712/F49] INFO: TID: 712
2024-11-11 11:15:42.543 CET [712/F49] INFO: Name: pihole-FTL
2024-11-11 11:15:42.544 CET [712/F49] INFO: Received signal: Segmentation fault
2024-11-11 11:15:42.544 CET [712/F49] INFO: at address: 0x4c0cc200000765
2024-11-11 11:15:42.544 CET [712/F49] INFO: with code: SEGV_MAPERR (Address not mapped to object)
2024-11-11 11:15:42.545 CET [712/F49] INFO: !!! INFO: pihole-FTL has not been compiled with glibc/backtrace support, not generating one !!!
2024-11-11 11:15:42.545 CET [712/F49] INFO: ------ Listing content of directory /dev/shm ------
2024-11-11 11:15:42.546 CET [712/F49] INFO: File Mode User:Group Size Filename
2024-11-11 11:15:42.546 CET [712/F49] INFO: rwxrwxrwx root:root 360 .
2024-11-11 11:15:42.547 CET [712/F49] INFO: rwxr-xr-x root:root 320 ..
2024-11-11 11:15:42.547 CET [712/F49] INFO: rw------- pihole:pihole 88 FTL-lock
2024-11-11 11:15:42.547 CET [712/F49] INFO: rw------- pihole:pihole 328 FTL-counters
2024-11-11 11:15:42.548 CET [712/F49] INFO: rw------- pihole:pihole 136 FTL-settings
2024-11-11 11:15:42.548 CET [712/F49] INFO: rw------- pihole:pihole 164K FTL-strings
2024-11-11 11:15:42.549 CET [712/F49] INFO: rw------- pihole:pihole 115K FTL-domains
2024-11-11 11:15:42.549 CET [712/F49] INFO: rw------- pihole:pihole 348K FTL-clients
2024-11-11 11:15:42.550 CET [712/F49] INFO: rw------- pihole:pihole 29K FTL-upstreams
2024-11-11 11:15:42.550 CET [712/F49] INFO: rw------- pihole:pihole 15M FTL-queries
2024-11-11 11:15:42.550 CET [712/F49] INFO: rw------- pihole:pihole 8K FTL-overTime
2024-11-11 11:15:42.550 CET [715/F49] DEBUG_ANY: Reopening Gravity database for this fork
2024-11-11 11:15:42.551 CET [712/F49] INFO: rw------- pihole:pihole 102K FTL-dns-cache
2024-11-11 11:15:42.551 CET [712/F49] INFO: rw------- pihole:pihole 4K FTL-per-client-regex
2024-11-11 11:15:42.551 CET [712/F49] INFO: rw------- pihole:pihole 569K FTL-fifo-log
2024-11-11 11:15:42.552 CET [712/F49] INFO: rw------- pihole:pihole 4K FTL-clients-lookup
2024-11-11 11:15:42.552 CET [712/F49] INFO: rw------- pihole:pihole 29K FTL-domains-lookup
2024-11-11 11:15:42.552 CET [712/F49] INFO: rw------- pihole:pihole 20K FTL-dns-cache-lookup
2024-11-11 11:15:42.553 CET [712/F49] INFO: rw------- pihole:pihole 786K FTL-recycler
2024-11-11 11:15:42.553 CET [712/F49] INFO: ---------------------------------------------------
2024-11-11 11:15:42.553 CET [712/F49] INFO: Please also include some lines from above the !!!!!!!!! header.
2024-11-11 11:15:42.553 CET [712/F49] INFO: Thank you for helping us to improve our FTL engine!
2024-11-11 11:15:42.554 CET [712/F49] INFO: Asking parent pihole-FTL (PID 49) to shut down
2024-11-11 11:15:42.554 CET [712/F49] INFO: FTL fork terminated!
2024-11-11 11:15:42.555 CET [713/F49] ERROR: Error when obtaining outer SHM lock: Previous owner died
2024-11-11 11:15:42.555 CET [713/F49] ERROR: Error when obtaining inner SHM lock: Previous owner died
2024-11-11 11:15:42.556 CET [49/T152] INFO: Received: RT37 (37 -> 2)
2024-11-11 11:15:42.556 CET [716/F49] DEBUG_ANY: Reopening Gravity database for this fork
2024-11-11 11:15:42.557 CET [713/F49] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2024-11-11 11:15:42.558 CET [713/F49] INFO: ----------------------------> FTL crashed! <----------------------------
2024-11-11 11:15:42.558 CET [713/F49] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2024-11-11 11:15:42.558 CET [713/F49] INFO: Please report a bug at https://github.com/pi-hole/FTL/issues
2024-11-11 11:15:42.558 CET [713/F49] INFO: and include in your report already the following details:
2024-11-11 11:15:42.559 CET [713/F49] INFO: FTL has been running for 1016 seconds
2024-11-11 11:15:42.559 CET [713/F49] INFO: FTL branch: development
2024-11-11 11:15:42.559 CET [713/F49] INFO: FTL version: vDev-848367f
2024-11-11 11:15:42.559 CET [713/F49] INFO: FTL commit: 848367fa
2024-11-11 11:15:42.560 CET [713/F49] INFO: FTL date: 2024-11-08 19:13:07 +0100
2024-11-11 11:15:42.560 CET [713/F49] INFO: FTL user: started as pihole, ended as pihole
2024-11-11 11:15:42.561 CET [713/F49] INFO: Compiled for linux/arm64/v8 (compiled on CI) using cc (Alpine 13.2.1_git20240309) 13.2.1 20240309
2024-11-11 11:15:42.561 CET [713/F49] INFO: Process details: MID: 49
2024-11-11 11:15:42.561 CET [713/F49] INFO: PID: 713
2024-11-11 11:15:42.562 CET [713/F49] INFO: TID: 713
2024-11-11 11:15:42.562 CET [713/F49] INFO: Name: pihole-FTL
2024-11-11 11:15:42.556 CET [49/T152] DEBUG_ANY: Received SIGTERM
2024-11-11 11:15:42.562 CET [713/F49] INFO: Received signal: Segmentation fault
2024-11-11 11:15:42.563 CET [713/F49] INFO: at address: 0x4c0cc100000764
2024-11-11 11:15:42.563 CET [717/F49] DEBUG_ANY: Reopening Gravity database for this fork
2024-11-11 11:15:42.563 CET [49/T152] INFO: Asked to terminate by "/usr/bin/pihole-FTL no-daemon" (PID 49, user pihole UID 100)
2024-11-11 11:15:42.563 CET [713/F49] INFO: with code: SEGV_MAPERR (Address not mapped to object)
2024-11-11 11:15:42.563 CET [713/F49] INFO: !!! INFO: pihole-FTL has not been compiled with glibc/backtrace support, not generating one !!!
2024-11-11 11:15:42.563 CET [49/T152] DEBUG_ANY: Sending SIGUSR6 to dnsmasq to stop DNS service
2024-11-11 11:15:42.563 CET [713/F49] INFO: ------ Listing content of directory /dev/shm ------
2024-11-11 11:15:42.564 CET [713/F49] INFO: File Mode User:Group Size Filename
2024-11-11 11:15:42.564 CET [49M] DEBUG_ANY: dnsmasq received signal 41
2024-11-11 11:15:42.564 CET [713/F49] INFO: rwxrwxrwx root:root 360 .
2024-11-11 11:15:42.565 CET [713/F49] INFO: rwxr-xr-x root:root 320 ..
2024-11-11 11:15:42.565 CET [713/F49] INFO: rw------- pihole:pihole 88 FTL-lock
2024-11-11 11:15:42.565 CET [713/F49] INFO: rw------- pihole:pihole 88 FTL-lock
2024-11-11 11:15:42.566 CET [713/F49] INFO: rw------- pihole:pihole 328 FTL-counters
2024-11-11 11:15:42.566 CET [713/F49] INFO: rw------- pihole:pihole 136 FTL-settings
2024-11-11 11:15:42.566 CET [713/F49] INFO: rw------- pihole:pihole 164K FTL-strings
2024-11-11 11:15:42.567 CET [713/F49] INFO: rw------- pihole:pihole 115K FTL-domains
2024-11-11 11:15:42.567 CET [713/F49] INFO: rw------- pihole:pihole 348K FTL-clients
2024-11-11 11:15:42.568 CET [713/F49] INFO: rw------- pihole:pihole 29K FTL-upstreams
2024-11-11 11:15:42.568 CET [713/F49] INFO: rw------- pihole:pihole 15M FTL-queries
2024-11-11 11:15:42.569 CET [713/F49] INFO: rw------- pihole:pihole 8K FTL-overTime
2024-11-11 11:15:42.569 CET [713/F49] INFO: rw------- pihole:pihole 102K FTL-dns-cache
2024-11-11 11:15:42.570 CET [713/F49] INFO: rw------- pihole:pihole 4K FTL-per-client-regex
2024-11-11 11:15:42.570 CET [713/F49] INFO: rw------- pihole:pihole 569K FTL-fifo-log
2024-11-11 11:15:42.570 CET [713/F49] DEBUG_ANY: Closing gravity database
2024-11-11 11:15:42.572 CET [717/F49] DEBUG_ANY: Closing gravity database
2024-11-11 11:15:42.572 CET [714/F49] DEBUG_ANY: Closing gravity database
2024-11-11 11:15:42.578 CET [718/F49] DEBUG_ANY: Closing gravity database
2024-11-11 11:15:42.589 CET [715/F49] DEBUG_ANY: Closing gravity database
2024-11-11 11:15:42.596 CET [716/F49] DEBUG_ANY: Closing gravity database
2024-11-11 11:15:42.596 CET [49M] DEBUG_ANY: Shutting down... // exit code 1 // jmpret 0
2024-11-11 11:15:42.596 CET [719/F49] DEBUG_ANY: Reopening Gravity database for this fork
2024-11-11 11:15:42.606 CET [719/F49] DEBUG_ANY: Closing gravity database
2024-11-11 11:15:42.611 CET [49/T155] INFO: Terminating timer thread
2024-11-11 11:15:42.670 CET [49/T152] INFO: Terminating database thread
2024-11-11 11:15:42.726 CET [49/T154] INFO: Terminating resolver thread
2024-11-11 11:15:43.209 CET [49/T153] INFO: Terminating GC thread
2024-11-11 11:15:43.522 CET [49M] INFO: Finished final database update
2024-11-11 11:15:43.522 CET [49M] INFO: Waiting for threads to join
2024-11-11 11:15:43.522 CET [49M] INFO: All threads joined
2024-11-11 11:15:43.523 CET [49M] DEBUG_ANY: Closing gravity database
2024-11-11 11:15:43.526 CET [49M] INFO: Stored 1 API session in the database
2024-11-11 11:15:44.366 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0x7fad93c000 == 0x7fb750a000
2024-11-11 11:15:44.366 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0x7fb74a3000 == 0x7fb750a000
2024-11-11 11:15:44.366 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0x7faeed9000 == 0x7fb750a000
2024-11-11 11:15:44.366 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0x7fb749c000 == 0x7fb750a000
2024-11-11 11:15:44.367 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0x7faedd1000 == 0x7fb750a000
2024-11-11 11:15:44.367 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0x7fb73c1000 == 0x7fb750a000
2024-11-11 11:15:44.367 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0x7fb73c0000 == 0x7fb750a000
2024-11-11 11:15:44.367 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0x7fb7466000 == 0x7fb750a000
2024-11-11 11:15:44.367 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 8: 0x7fb748e000 == 0x7fb750a000
2024-11-11 11:15:44.367 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 9: 0x7fb72fe000 == 0x7fb750a000
2024-11-11 11:15:44.368 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0x7fad93c000 == 0x7faf276000
2024-11-11 11:15:44.368 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0x7fb74a3000 == 0x7faf276000
2024-11-11 11:15:44.368 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0x7faeed9000 == 0x7faf276000
2024-11-11 11:15:44.368 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0x7fb749c000 == 0x7faf276000
2024-11-11 11:15:44.368 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0x7faedd1000 == 0x7faf276000
2024-11-11 11:15:44.369 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0x7fb73c1000 == 0x7faf276000
2024-11-11 11:15:44.369 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0x7fb73c0000 == 0x7faf276000
2024-11-11 11:15:44.369 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0x7fb7466000 == 0x7faf276000
2024-11-11 11:15:44.369 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 8: 0x7fb748e000 == 0x7faf276000
2024-11-11 11:15:44.369 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 9: 0x7fb72fe000 == 0x7faf276000
2024-11-11 11:15:44.370 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0x7fad93c000 == 0x7fb7508000
2024-11-11 11:15:44.370 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0x7fb74a3000 == 0x7fb7508000
2024-11-11 11:15:44.370 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0x7faeed9000 == 0x7fb7508000
2024-11-11 11:15:44.370 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0x7fb749c000 == 0x7fb7508000
2024-11-11 11:15:44.370 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0x7faedd1000 == 0x7fb7508000
2024-11-11 11:15:44.371 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0x7fb73c1000 == 0x7fb7508000
2024-11-11 11:15:44.371 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0x7fb73c0000 == 0x7fb7508000
2024-11-11 11:15:44.371 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0x7fb7466000 == 0x7fb7508000
2024-11-11 11:15:44.371 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 8: 0x7fb748e000 == 0x7fb7508000
2024-11-11 11:15:44.371 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 9: 0x7fb72fe000 == 0x7fb7508000
2024-11-11 11:15:44.371 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0x7fad93c000 == 0x7faeed9000
2024-11-11 11:15:44.372 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0x7fb74a3000 == 0x7faeed9000
2024-11-11 11:15:44.372 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0x7faeed9000 == 0x7faeed9000
2024-11-11 11:15:44.372 CET [49M] DEBUG_SHMEM: Unmapping global pointer FTL-domains at 0x7faeed9000
2024-11-11 11:15:44.372 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0x7fad93c000 == 0x7fb74a3000
2024-11-11 11:15:44.372 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0x7fb74a3000 == 0x7fb74a3000
2024-11-11 11:15:44.373 CET [49M] DEBUG_SHMEM: Unmapping global pointer FTL-clients at 0x7fb74a3000
2024-11-11 11:15:44.373 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0x7fad93c000 == 0x7fad93c000
2024-11-11 11:15:44.373 CET [49M] DEBUG_SHMEM: Unmapping global pointer FTL-queries at 0x7fad93c000
2024-11-11 11:15:44.377 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7fb749c000
2024-11-11 11:15:44.378 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7fb749c000
2024-11-11 11:15:44.378 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7fb749c000
2024-11-11 11:15:44.378 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0x7fb749c000 == 0x7fb749c000
2024-11-11 11:15:44.378 CET [49M] DEBUG_SHMEM: Unmapping global pointer FTL-upstreams at 0x7fb749c000
2024-11-11 11:15:44.378 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7fb7452000
2024-11-11 11:15:44.379 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7fb7452000
2024-11-11 11:15:44.379 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7fb7452000
2024-11-11 11:15:44.379 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0 == 0x7fb7452000
2024-11-11 11:15:44.379 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0x7faedd1000 == 0x7fb7452000
2024-11-11 11:15:44.379 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0x7fb73c1000 == 0x7fb7452000
2024-11-11 11:15:44.380 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0x7fb73c0000 == 0x7fb7452000
2024-11-11 11:15:44.380 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0x7fb7466000 == 0x7fb7452000
2024-11-11 11:15:44.380 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 8: 0x7fb748e000 == 0x7fb7452000
2024-11-11 11:15:44.380 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0x7fb7466000 == 0x7fb7452000
2024-11-11 11:15:44.380 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 8: 0x7fb748e000 == 0x7fb7452000
2024-11-11 11:15:44.380 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 9: 0x7fb72fe000 == 0x7fb7452000
2024-11-11 11:15:44.380 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7fb7503000
2024-11-11 11:15:44.380 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7fb7503000
2024-11-11 11:15:44.381 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7fb7503000
2024-11-11 11:15:44.381 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0 == 0x7fb7503000
2024-11-11 11:15:44.381 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0x7faedd1000 == 0x7fb7503000
2024-11-11 11:15:44.381 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0x7fb73c1000 == 0x7fb7503000
2024-11-11 11:15:44.381 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0x7fb73c0000 == 0x7fb7503000
2024-11-11 11:15:44.382 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0x7fb7466000 == 0x7fb7503000
2024-11-11 11:15:44.382 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 8: 0x7fb748e000 == 0x7fb7503000
2024-11-11 11:15:44.382 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 9: 0x7fb72fe000 == 0x7fb7503000
2024-11-11 11:15:44.382 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7faedd1000
2024-11-11 11:15:44.382 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7faedd1000
2024-11-11 11:15:44.383 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7faedd1000
2024-11-11 11:15:44.383 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0 == 0x7faedd1000
2024-11-11 11:15:44.383 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0x7faedd1000 == 0x7faedd1000
2024-11-11 11:15:44.383 CET [49M] DEBUG_SHMEM: Unmapping global pointer FTL-dns-cache at 0x7faedd1000
2024-11-11 11:15:44.383 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7fb744c000
2024-11-11 11:15:44.384 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7fb744c000
2024-11-11 11:15:44.384 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7fb744c000
2024-11-11 11:15:44.384 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0 == 0x7fb744c000
2024-11-11 11:15:44.384 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0 == 0x7fb744c000
2024-11-11 11:15:44.384 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0x7fb73c1000 == 0x7fb744c000
2024-11-11 11:15:44.384 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0x7fb73c0000 == 0x7fb744c000
2024-11-11 11:15:44.385 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0x7fb7466000 == 0x7fb744c000
2024-11-11 11:15:44.385 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 8: 0x7fb748e000 == 0x7fb744c000
2024-11-11 11:15:44.385 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 9: 0x7fb72fe000 == 0x7fb744c000
2024-11-11 11:15:44.385 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7fb73c1000
2024-11-11 11:15:44.385 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7fb73c1000
2024-11-11 11:15:44.386 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7fb73c1000
2024-11-11 11:15:44.386 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0 == 0x7fb73c1000
2024-11-11 11:15:44.386 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0 == 0x7fb73c1000
2024-11-11 11:15:44.386 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0x7fb73c1000 == 0x7fb73c1000
2024-11-11 11:15:44.386 CET [49M] DEBUG_SHMEM: Unmapping global pointer /FTL-fifo-log at 0x7fb73c1000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7fb73c0000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7fb73c0000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7fb73c0000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0 == 0x7fb73c0000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0 == 0x7fb73c0000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0 == 0x7fb73c0000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0x7fb73c0000 == 0x7fb73c0000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Unmapping global pointer FTL-clients-lookup at 0x7fb73c0000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7fb7466000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7fb7466000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7fb7466000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0 == 0x7fb7466000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0 == 0x7fb7466000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0 == 0x7fb7466000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0 == 0x7fb7466000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0x7fb7466000 == 0x7fb7466000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Unmapping global pointer FTL-domains-lookup at 0x7fb7466000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7fb748e000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7fb748e000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7fb748e000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0 == 0x7fb748e000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0 == 0x7fb748e000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0 == 0x7fb748e000
2024-11-11 11:15:44.387 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0 == 0x7fb748e000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0 == 0x7fb748e000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 8: 0x7fb748e000 == 0x7fb748e000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Unmapping global pointer FTL-dns-cache-lookup at 0x7fb748e000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 0: 0 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 1: 0 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 2: 0 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 3: 0 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 4: 0 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 5: 0 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 6: 0 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 7: 0 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 8: 0 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Pointer comparison at pos. 9: 0x7fb72fe000 == 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] DEBUG_SHMEM: Unmapping global pointer FTL-recycler at 0x7fb72fe000
2024-11-11 11:15:44.388 CET [49M] INFO: ########## FTL terminated after 16m 57s (code 1)! ##########
[i] pihole-FTL exited with status 1
Maybe it's a good idea to create a development build that includes debug symbols just to have a valid stack trace output?
We already have the full set of debug symbols in the release builds; otherwise, you would not have seen the function names and related code lines in:
#0 __clone () at src/thread/x86_64/clone.s:18
#1 0x00000000007fa8fc in posix_spawn (res=res@entry=0x7dd145a616a4, path=path@entry=0x80fc5e "/bin/sh", fa=fa@entry=0x7dd145a616d0,
attr=attr@entry=0x0, argv=argv@entry=0x7dd145a616b0, envp=<optimized out>) at src/process/posix_spawn.c:198
#2 0x00000000007eca1e in popen (cmd=cmd@entry=0x7dd145a61802 "ip neigh show", mode=mode@entry=0x80efa9 "r") at src/stdio/popen.c:41
#3 0x000000000049db1e in parse_neighbor_cache (db=0x7dd14c672ca8) at /app/src/database/network-table.c:1307
#4 0x00000000004904ec in DB_thread (val=<optimized out>) at /app/src/database/database-thread.c:199
#5 0x00000000007f26d0 in start (p=0x7dd145a61b00) at src/thread/pthread_create.c:207
#6 0x00000000007f3d5c in __clone () at src/thread/x86_64/clone.s:22
But this also shows that following forks/children won't work for us here. The location where you entered backtrace was where the child had been created. It seems you'd need to run continue here for each fork. This is highly impractical...
@schuettecarsten Could you attach the debugger as well? This latest crash was in a fork, too; however, the first one (the very first post in this issue ticket) was in a normal thread of the main process (gdb would have caught that).
Maybe here:
#0 sig_handler (sig=17) at /app/src/dnsmasq/dnsmasq.c:1312
#1 <signal handler called>
#2 __syscall_cp_c (nr=202, u=139057914187780, v=0, w=-2147483495, x=0, y=0, z=0) at ./arch/x86_64/syscall_arch.h:61
#3 0x00000000007fbad5 in __futex4_cp (to=<optimized out>, val=-2147483495, op=0, addr=0x7e78f1941004) at src/thread/__timedwait.c:24
#4 __timedwait_cp (addr=addr@entry=0x7e78f1941004, val=val@entry=-2147483495, clk=clk@entry=0, at=at@entry=0x0, priv=priv@entry=0)
at src/thread/__timedwait.c:52
#5 0x00000000007fbba2 in __timedwait (addr=addr@entry=0x7e78f1941004, val=-2147483495, clk=clk@entry=0, at=at@entry=0x0, priv=priv@entry=0)
at src/thread/__timedwait.c:68
#6 0x00000000007f33e4 in __pthread_mutex_timedlock (m=0x7e78f1941000, at=at@entry=0x0) at src/thread/pthread_mutex_timedlock.c:85
#7 0x00000000007f3226 in __pthread_mutex_lock (m=m@entry=0x7e78f1941000) at src/thread/pthread_mutex_lock.c:9
#8 0x00000000006c0b28 in FTLpthread_mutex_lock (__mutex=0x7e78f1941000, file=file@entry=0x800301 "/app/src/shmem.c",
func=func@entry=0x867c78 <__FUNCTION__.3> "_lock_shm", line=line@entry=351) at /app/src/syscalls/pthread_mutex_lock.c:23
#9 0x0000000000424586 in _lock_shm (func=func@entry=0x866560 <__FUNCTION__.2> "FTL_dnsmasq_log", line=line@entry=3687,
file=file@entry=0x7fec34 "/app/src/dnsmasq_interface.c") at /app/src/shmem.c:351
#10 0x000000000041367e in FTL_dnsmasq_log (payload=payload@entry=0x7ffd68bbe710 "query[A] pi.hole from 127.0.0.1", length=32)
at /app/src/dnsmasq_interface.c:3687
#11 0x00000000004d4056 in my_syslog (priority=priority@entry=6, format=format@entry=0x806830 "%s %s%s%s %s%s") at /app/src/dnsmasq/log.c:330
#12 0x00000000004b15eb in _log_query (flags=flags@entry=524424, name=0x7e78f17331c0 "pi.hole", addr=addr@entry=0x7ffd68bbed64,
arg=<optimized out>, type=<optimized out>, file=file@entry=0x8072a6 "/app/src/dnsmasq/forward.c", line=1820)
at /app/src/dnsmasq/cache.c:2374
#13 0x00000000004cba6d in _log_query_mysockaddr (line=1820, type=<optimized out>, arg=<optimized out>, addr=0x7ffd68bbed60,
name=<optimized out>, flags=524296) at /app/src/dnsmasq/forward.c:139
#14 _log_query_mysockaddr (line=1820, type=<optimized out>, arg=<optimized out>, addr=0x7ffd68bbed60, name=<optimized out>, flags=524296)
at /app/src/dnsmasq/forward.c:133
#15 receive_query (listen=listen@entry=0x7e78f1948e00, now=now@entry=1731357142) at /app/src/dnsmasq/forward.c:1820
#16 0x00000000004b9063 in check_dns_listeners (now=now@entry=1731357142) at /app/src/dnsmasq/dnsmasq.c:1895
#17 0x00000000004bb278 in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at /app/src/dnsmasq/dnsmasq.c:1278
#18 0x00000000004020d7 in main (argc=<optimized out>, argv=0x7ffd68bbf158) at /app/src/main.c:123
or
#0 sig_handler (sig=17) at /app/src/dnsmasq/dnsmasq.c:1312
warning: 1312 /app/src/dnsmasq/dnsmasq.c: No such file or directory
(gdb) backtrace
#0 sig_handler (sig=17) at /app/src/dnsmasq/dnsmasq.c:1312
#1 <signal handler called>
#2 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#3 0x00000000007f19e3 in __syscall_cp_c (nr=0, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#4 0x00000000007f52b3 in read (fd=fd@entry=47, buf=buf@entry=0x7ffd68bbeed3, count=count@entry=1) at src/unistd/read.c:6
#5 0x0000000000506e8e in read (__n=<optimized out>, __s=<optimized out>, __f=47) at /usr/include/fortify/unistd.h:118
#6 read_write (fd=47, packet=packet@entry=0x7ffd68bbeed3 "", size=size@entry=1, rw=rw@entry=1) at /app/src/dnsmasq/util.c:780
#7 0x00000000004b8ff0 in check_dns_listeners (now=now@entry=1731357142) at /app/src/dnsmasq/dnsmasq.c:2012
#8 0x00000000004bb278 in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at /app/src/dnsmasq/dnsmasq.c:1278
#9 0x00000000004020d7 in main (argc=<optimized out>, argv=0x7ffd68bbf158) at /app/src/main.c:123
or
#0 sig_handler (sig=17) at /app/src/dnsmasq/dnsmasq.c:1312
#1 <signal handler called>
#2 __restore_sigs (set=set@entry=0x7ffd68bbedf0) at ./arch/x86_64/syscall_arch.h:40
#3 0x00000000007e97bf in fork () at src/process/fork.c:86
#4 0x00000000004b8fab in check_dns_listeners (now=now@entry=1731357268) at /app/src/dnsmasq/dnsmasq.c:1989
#5 0x00000000004bb278 in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at /app/src/dnsmasq/dnsmasq.c:1278
#6 0x00000000004020d7 in main (argc=<optimized out>, argv=0x7ffd68bbf158) at /app/src/main.c:123
I'm not experienced with gdb. Thank you for your patience. If these backtraces are not helpful, I apologize.
Signal 17 is SIGCHLD and gdb shouldn't have stopped for this one. I have not seen this signal being used inside pihole-FTL myself, but I guess it is another one to be added with nostop to your /root/.gdbinit file... I'm still convinced we will finally get there!
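Collecting everything from this thread so far, the /root/.gdbinit would then contain something like this (a sketch; further real-time signals, e.g. the SIG41 seen in the output above, may need the same treatment if they show up):
handle SIG33 nostop noprint pass
handle SIG37 nostop noprint pass
handle SIGCHLD nostop noprint pass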
Yes, unfortunately, the crash does not appear right away. Maybe I'll catch it.
In the meantime, I suggest looking at the parsing of the network clients. The screenshot above goes on like this for many clients.
My Pi-hole is set up such that it only has 2 clients: 2 reverse proxies for DoH and DoT.
The Query Log shows only "N." as the client. (Previously, it was the IP.)
@bermeitinger-b I don't think there is anything wrong with the network table TBH, my current assumption is that somehow handling of the strings got broken and your database got populated with garbage data. This would likely explain the crash you are seeing, too. But I am still not sure how/where it happens. I have meanwhile manually inspected the entire string processing code paths twice and did not find anything odd.
Please also enable DEBUG_SHMEM logging by running
sudo pihole-FTL --config debug.shmem true
- it will provide us with greater detail in FTL.log on how strings are stored/reused. This should help us find the issue.
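If you are running inside docker, the same command can be issued from the host, e.g. (assuming, as before, a container named pihole):
docker exec -it pihole pihole-FTL --config debug.shmem true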
Here is an excerpt after flushing the network table:
pihole | 2024-11-12 09:54:47.521 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.521 CET [49/T155] DEBUG_SHMEM: Adding "localhost.lan" (len 14) to buffer in resolveAndAddHostname() (src/resolve.c:797), next_str_pos is 54983
pihole | 2024-11-12 09:54:47.522 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.522 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^P
pihole | 2024-11-12 09:54:47.523 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.523 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname:
pihole | 2024-11-12 09:54:47.523 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.524 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: . 2.16.0.1
pihole | 2024-11-12 09:54:47.524 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.524 CET [49/T155] DEBUG_SHMEM: Adding "pi.hole" (len 8) to buffer in resolveAndAddHostname() (src/resolve.c:797), next_str_pos is 54997
pihole | 2024-11-12 09:54:47.526 CET [49/T155] DEBUG_SHMEM: Reusing existing string "localhost.lan" at 54983 in resolveAndAddHostname() (src/resolve.c:797)
pihole | 2024-11-12 09:54:47.526 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: sunny-portal.de
pihole | 2024-11-12 09:54:47.526 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.526 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: rtal.de
pihole | 2024-11-12 09:54:47.527 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.527 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: �
pihole | 2024-11-12 09:54:47.527 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.527 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: � ~^F
pihole | 2024-11-12 09:54:47.528 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.528 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^M� ^?^F
pihole | 2024-11-12 09:54:47.528 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.528 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^M�
pihole | 2024-11-12 09:54:47.529 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.529 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^M� �^F
pihole | 2024-11-12 09:54:47.529 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.529 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^Q� �^F
pihole | 2024-11-12 09:54:47.530 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.530 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^Q� �^F
pihole | 2024-11-12 09:54:47.530 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.530 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^Q�
pihole | 2024-11-12 09:54:47.531 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.531 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^W� �^F
pihole | 2024-11-12 09:54:47.531 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.532 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^W�
pihole | 2024-11-12 09:54:47.532 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.532 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^Q� �^F
pihole | 2024-11-12 09:54:47.533 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.533 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^Z� �^F
pihole | 2024-11-12 09:54:47.533 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.533 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^Z� �^F
pihole | 2024-11-12 09:54:47.534 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.534 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: ^W� �^F
pihole | 2024-11-12 09:54:47.534 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
pihole | 2024-11-12 09:54:47.534 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname:
pihole | 2024-11-12 09:54:47.535 CET [49/T155] DEBUG_SHMEM: Not adding "" to buffer (unchanged)
Yes, this brings us a lot closer. Please restart pihole-FTL if you haven't done so already after running the --config command above. This ensures all string processing will be printed right from the start of the pihole-FTL process. If you then again encounter such lines, e.g.
pihole | 2024-11-12 09:54:47.524 CET [49/T155] WARNING: Invalid IPv4 address when trying to resolve hostname: . 2.16.0.1
you can try to search for the first occurrence of this strange string, like
grep ". 2.16.0.1" /var/log/pihole/FTL.log
It'd also be helpful if you could run
sudo kill -s RTMIN+7 $(cat /run/pihole-FTL.pid)
and see if any errors regarding hash tables are reported in /var/log/pihole/FTL.log (this can take a few seconds after running the command above).
Thanks.
I've restarted and it prints a lot of Adding "<domain>" (len x) to buffer lines, seemingly for all domains I've visited (?).
My shell highlights the E in this line:
2024-11-12 10:16:47.165 CET [49M] DEBUG_SHMEM: Adding "E" (len 2) to buffer in _findUpstreamID() (src/datastructure.c:174), next_str_pos is 4687
Typically, this is an indicator that it's not a literal E. Running file on FTL.log also reports data instead of ASCII text.
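In other words, something like this (a sketch, using the in-container log path from earlier in this thread):
file /var/log/pihole/FTL.log
/var/log/pihole/FTL.log: data
whereas file would report something like ASCII text for an uncorrupted log.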
After many of these lines, it looks like the output changes:
2024-11-12 10:16:47.356 CET [49M] DEBUG_SHMEM: Adding "personalsafety-pa.googleapis.com" (len 33) to buffer in _findDomainID() (src/datastructure.c:270), next_str_pos is 45448
2024-11-12 10:16:47.356 CET [49M] DEBUG_SHMEM: Reusing existing string "access-point.cloudmessaging.edge.microsoft.com" at 14517 in _findDomainID() (src/datastructure.c:270)
2024-11-12 10:16:47.356 CET [49M] DEBUG_SHMEM: Reusing existing string "sunny-portal.de" at 363 in _findClientID() (src/datastructure.c:356)
2024-11-12 10:16:47.356 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 0
2024-11-12 10:16:47.357 CET [49M] DEBUG_SHMEM: Reusing existing string "rtal.de" at 371 in _findClientID() (src/datastructure.c:356)
2024-11-12 10:16:47.357 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 0
2024-11-12 10:16:47.357 CET [49M] DEBUG_SHMEM: Adding "F" (len 2) to buffer in _findDomainID() (src/datastructure.c:270), next_str_pos is 45481
2024-11-12 10:16:47.358 CET [49M] DEBUG_SHMEM: Adding " " (len 4) to buffer in _findClientID() (src/datastructure.c:356), next_str_pos is 45483
2024-11-12 10:16:47.358 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 0
2024-11-12 10:16:47.358 CET [49M] DEBUG_SHMEM: Adding " ~F" (len 6) to buffer in _findClientID() (src/datastructure.c:356), next_str_pos is 45487
2024-11-12 10:16:47.358 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 0
2024-11-12 10:16:47.359 CET [49M] DEBUG_SHMEM: Adding "M ?F" (len 6) to buffer in _findClientID() (src/datastructure.c:356), next_str_pos is 45493
2024-11-12 10:16:47.359 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 0
2024-11-12 10:16:47.359 CET [49M] DEBUG_SHMEM: Adding "M " (len 4) to buffer in _findClientID() (src/datastructure.c:356), next_str_pos is 45499
2024-11-12 10:16:47.360 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 0
2024-11-12 10:16:47.360 CET [49M] DEBUG_SHMEM: Adding "M F" (len 6) to buffer in _findClientID() (src/datastructure.c:356), next_str_pos is 45503
2024-11-12 10:16:47.360 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 0
2024-11-12 10:16:47.361 CET [49M] DEBUG_SHMEM: Adding "Q F" (len 6) to buffer in _findClientID() (src/datastructure.c:356), next_str_pos is 45509
About the hash collisions:
2024-11-12 10:30:27.814 CET [49M] INFO: Received: RT42 (42 -> 7)
2024-11-12 10:30:28.150 CET [49/T113] INFO: Searching for hash collisions in clients lookup table
2024-11-12 10:30:28.150 CET [49/T113] INFO: Found 0 hash collisions and 0 sorting errors in clients lookup table (scanned 100 elements)
2024-11-12 10:30:28.150 CET [49/T113] INFO: Searching for hash collisions in domains lookup table
2024-11-12 10:30:28.150 CET [49/T113] INFO: Found 0 hash collisions and 0 sorting errors in domains lookup table (scanned 2221 elements)
2024-11-12 10:30:28.150 CET [49/T113] INFO: Searching for hash collisions in DNS cache lookup table
2024-11-12 10:30:28.150 CET [49/T113] INFO: Found 0 hash collisions and 0 sorting errors in DNS cache lookup table (scanned 1370 elements)
2024-11-12 10:30:28.150 CET [49/T113] INFO: Recycle list fullness:
2024-11-12 10:30:28.150 CET [49/T113] INFO: Clients: 0/65535 (0.00%)
2024-11-12 10:30:28.150 CET [49/T113] INFO: Domains: 0/65535 (0.00%)
2024-11-12 10:30:28.150 CET [49/T113] INFO: DNS Cache: 0/65535 (0.00%)
Seems okay; however, I'm not sure why it scans 100 clients. There should only be 3 (localhost, DoH, DoT).
Looking at the logs, we are in a bit of a chicken-and-egg problem, as the database already contains some broken entries which then "contaminate" FTL's strings during the history import on startup. I'd suggest disabling database importing for the moment,
sudo pihole-FTL --config database.DBimport false
and then restarting again. You may also want to clean out the log file first to get rid of the binary stuff that is in there right now.
edit: The 100 clients come about because FTL detects clients based on their (string) IP address, and when the IP address is garbage, a new client is added for each new garbage string.
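A possible sequence for all of this, again assuming a container named pihole and the log path used earlier in this thread:
docker exec -it pihole pihole-FTL --config database.DBimport false
docker exec -it pihole truncate -s 0 /var/log/pihole/FTL.log
docker restart pihole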
I've deleted the old database and started fresh. It generated the following log:
pihole | 2024-11-12 14:19:15.726 CET [49M] DEBUG_ANY: dnsmasq received signal 17
pihole | 2024-11-12 14:19:15.934 CET [49M] DEBUG_SHMEM: Reusing existing string "eth0" at 13244 in _FTL_new_query() (src/dnsmasq_interface.c:906)
pihole | 2024-11-12 14:19:15.935 CET [49M] DEBUG_SHMEM: Reusing existing string "0" at 12 in get_client_groupids() (src/database/gravity-db.c:730)
pihole | 2024-11-12 14:19:15.935 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 24
pihole | 2024-11-12 14:19:15.936 CET [49M] DEBUG_SHMEM: Reusing existing string "0" at 12 in get_client_groupids() (src/database/gravity-db.c:730)
pihole | 2024-11-12 14:19:15.936 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 24
pihole | 2024-11-12 14:19:15.937 CET [49M] DEBUG_SHMEM: Reusing existing string "0" at 12 in get_client_groupids() (src/database/gravity-db.c:730)
pihole | 2024-11-12 14:19:15.938 CET [49M] DEBUG_SHMEM: LCM(4096, 1) == 4096 >= 24
pihole | 2024-11-12 14:19:16.183 CET [49M] DEBUG_SHMEM: Adding "star-mini.c10r.facebook.com" (len 28) to buffer in _findDomainID() (src/datastructure.c:270), next_str_pos is 27632
pihole | 2024-11-12 14:19:16.966 CET [49M] DEBUG_SHMEM: Reusing existing string "www.youtube.com" at 3122 in _findDomainID() (src/datastructure.c:270)
pihole | 2024-11-12 14:19:17.039 CET [49/T85] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
pihole | 2024-11-12 14:19:17.039 CET [49/T85] INFO: ----------------------------> FTL crashed! <----------------------------
pihole | 2024-11-12 14:19:17.039 CET [49/T85] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
The rest is the same as in the first logs above. As above, it did not restart automatically.
The log file is not binary anymore, so the network table is looking correct.
Did you have the debugger gdb attached? What does the backtrace look like on this crash?
Maybe:
Thread 1 "pihole-FTL" received signal SIGSEGV, Segmentation fault.
0x0000000000559fed in pcache1FetchNoMutex (createFlag=2, iKey=1, p=0x75dbc36e5cb8)
at /app/src/database/sqlite3.c:56250
warning: 56250 /app/src/database/sqlite3.c: No such file or directory
(gdb) bt
#0 0x0000000000559fed in pcache1FetchNoMutex (createFlag=2, iKey=1, p=0x75dbc36e5cb8)
at /app/src/database/sqlite3.c:56250
#1 pcache1Fetch (p=0x75dbc36e5cb8, iKey=1, createFlag=2) at /app/src/database/sqlite3.c:56307
#2 0x00000000005d873d in sqlite3PcacheFetch (createFlag=3, pgno=1, pCache=<optimized out>)
at /app/src/database/sqlite3.c:54725
#3 getPageNormal (pPager=0x75dbc3658448, pgno=1, ppPage=0x7ffdbb3923d0, flags=0)
at /app/src/database/sqlite3.c:62716
#4 0x00000000005e20cf in sqlite3PagerGet (flags=0, ppPage=0x7ffdbb3923d0, pgno=1,
pPager=<optimized out>) at /app/src/database/sqlite3.c:62908
#5 btreeGetPage (flags=0, ppPage=<synthetic pointer>, pgno=1, pBt=0x75dbc3397a08)
at /app/src/database/sqlite3.c:72974
#6 lockBtree (pBt=<optimized out>) at /app/src/database/sqlite3.c:73918
#7 btreeBeginTrans (p=0x75dbc36f8c98, wrflag=0, pSchemaVersion=0x7ffdbb392530)
at /app/src/database/sqlite3.c:74313
#8 0x000000000063ff0c in sqlite3VdbeExec (p=p@entry=0x75dbc363da58)
at /app/src/database/sqlite3.c:97438
#9 0x0000000000646ae0 in sqlite3Step (p=0x75dbc363da58) at /app/src/database/sqlite3.c:91357
#10 sqlite3_step (pStmt=pStmt@entry=0x75dbc363da58) at /app/src/database/sqlite3.c:91418
#11 0x00000000004910c7 in domain_in_list (
domain=domain@entry=0x75dbc364acf0 "1.0.16.172.in-addr.arpa", stmt=0x75dbc363da58,
listname=listname@entry=0x805baf "whitelist", domain_id=domain_id@entry=0x75dbc2942fe8)
at /app/src/database/gravity-db.c:1180
#12 0x0000000000493d28 in in_allowlist (
domain=domain@entry=0x75dbc364acf0 "1.0.16.172.in-addr.arpa",
dns_cache=dns_cache@entry=0x75dbc2942fd8, client=client@entry=0x75dbc2d0eab0)
at /app/src/database/gravity-db.c:1282
#13 0x000000000040bfcc in FTL_check_blocking (queryID=queryID@entry=10847,
domainID=domainID@entry=39, clientID=clientID@entry=1022) at /app/src/dnsmasq_interface.c:1618
#14 0x000000000040f17f in _FTL_new_query (flags=flags@entry=524296, name=<optimized out>,
addr=addr@entry=0x7ffdbb392970, arg=<optimized out>, qtype=<optimized out>, id=33202,
proto=UDP, file=0x8072a6 "/app/src/dnsmasq/forward.c", line=1822)
at /app/src/dnsmasq_interface.c:960
#15 0x00000000004cb49f in receive_query (listen=listen@entry=0x75dbc36fce00,
now=now@entry=1731420001) at /app/src/dnsmasq/forward.c:1822
#16 0x00000000004b9063 in check_dns_listeners (now=now@entry=1731420001)
and right after:
#0 releaseMemArray (p=0x75dbc36e4920, N=1) at /app/src/database/sqlite3.c:87133
#1 0x000000000056614e in releaseMemArray (N=<optimized out>, p=<optimized out>)
at /app/src/database/sqlite3.c:87130
#2 sqlite3VdbeClearObject (p=0x75dbc3643268, db=0x75dbc365a018)
at /app/src/database/sqlite3.c:88686
#3 sqlite3VdbeDelete (p=0x75dbc3643268) at /app/src/database/sqlite3.c:88723
#4 0x00000000005ea36d in sqlite3_finalize (pStmt=0x75dbc3643268)
at /app/src/database/sqlite3.c:90625
#5 0x0000000000490a32 in gravityDB_finalize_client_statements (client=0x75dbc28302a8)
at /app/src/database/gravity-db.c:925
#6 gravityDB_close () at /app/src/database/gravity-db.c:964
#7 0x000000000049365e in gravityDB_close () at /app/src/database/gravity-db.c:956
#8 0x0000000000408451 in cleanup (ret=ret@entry=1) at /app/src/daemon.c:374
#9 0x0000000000425ac8 in signal_handler (sig=<optimized out>, si=0x7ffdbb391eb0,
unused=<optimized out>) at /app/src/signals.c:266
#10 <signal handler called>
#11 0x0000000000559fed in pcache1FetchNoMutex (createFlag=2, iKey=1, p=0x75dbc36e5cb8)
at /app/src/database/sqlite3.c:56250
#12 pcache1Fetch (p=0x75dbc36e5cb8, iKey=1, createFlag=2) at /app/src/database/sqlite3.c:56307
#13 0x00000000005d873d in sqlite3PcacheFetch (createFlag=3, pgno=1, pCache=<optimized out>)
at /app/src/database/sqlite3.c:54725
#14 getPageNormal (pPager=0x75dbc3658448, pgno=1, ppPage=0x7ffdbb3923d0, flags=0)
at /app/src/database/sqlite3.c:62716
#15 0x00000000005e20cf in sqlite3PagerGet (flags=0, ppPage=0x7ffdbb3923d0, pgno=1,
pPager=<optimized out>) at /app/src/database/sqlite3.c:62908
#16 btreeGetPage (flags=0, ppPage=<synthetic pointer>, pgno=1, pBt=0x75dbc3397a08)
at /app/src/database/sqlite3.c:72974
#17 lockBtree (pBt=<optimized out>) at /app/src/database/sqlite3.c:73918
#18 btreeBeginTrans (p=0x75dbc36f8c98, wrflag=0, pSchemaVersion=0x7ffdbb392530)
at /app/src/database/sqlite3.c:74313
#19 0x000000000063ff0c in sqlite3VdbeExec (p=p@entry=0x75dbc363da58)
at /app/src/database/sqlite3.c:97438
#20 0x0000000000646ae0 in sqlite3Step (p=0x75dbc363da58) at /app/src/database/sqlite3.c:91357
#21 sqlite3_step (pStmt=pStmt@entry=0x75dbc363da58) at /app/src/database/sqlite3.c:91418
#22 0x00000000004910c7 in domain_in_list (
domain=domain@entry=0x75dbc364acf0 "1.0.16.172.in-addr.arpa", stmt=0x75dbc363da58,
listname=listname@entry=0x805baf "whitelist", domain_id=domain_id@entry=0x75dbc2942fe8)
at /app/src/database/gravity-db.c:1180
#23 0x0000000000493d28 in in_allowlist (
domain=domain@entry=0x75dbc364acf0 "1.0.16.172.in-addr.arpa",
dns_cache=dns_cache@entry=0x75dbc2942fd8, client=client@entry=0x75dbc2d0eab0)
at /app/src/database/gravity-db.c:1282
#24 0x000000000040bfcc in FTL_check_blocking (queryID=queryID@entry=10847,
domainID=domainID@entry=39, clientID=clientID@entry=1022) at /app/src/dnsmasq_interface.c:1618
#25 0x000000000040f17f in _FTL_new_query (flags=flags@entry=524296, name=<optimized out>,
addr=addr@entry=0x7ffdbb392970, arg=<optimized out>, qtype=<optimized out>, id=33202,
proto=UDP, file=0x8072a6 "/app/src/dnsmasq/forward.c", line=1822)
at /app/src/dnsmasq_interface.c:960
#26 0x00000000004cb49f in receive_query (listen=listen@entry=0x75dbc36fce00,
now=now@entry=1731420001) at /app/src/dnsmasq/forward.c:1822
#27 0x00000000004b9063 in check_dns_listeners (now=now@entry=1731420001)
at /app/src/dnsmasq/dnsmasq.c:1895
#28 0x00000000004bb278 in main_dnsmasq (argc=<optimized out>, argv=<optimized out>)
at /app/src/dnsmasq/dnsmasq.c:1278
#29 0x00000000004020d7 in main (argc=<optimized out>, argv=0x7ffdbb392d68) at /app/src/main.c:123
Then, the whole container dies.
Edit: After some time, the glyphs reappear in the network table. I don't have a backtrace for those but will check the log.
Did anything come out here? I've been looking at the backtraces yesterday, but it was too late to reply. Now, looking at them, they do not make a lot of sense, TBH. Both crashes happened deep down inside the embedded sqlite3 source code, but we don't know whether this is the cause or merely a symptom of the string issues causing sqlite3 to trip over binary data in strings. Usually, it shouldn't, but reporting sqlite3 bugs is a nightmare. Unless one can provide a clean and very small minimal working example (MWE) of the bug, they will not even try to investigate and instead lean back on "it is the application embedding sqlite3 that is causing issues".
We weren't able to provide such an MWE for the one bug we reported back then, but - fortunately - someone else did for a bug with the same apparent cause, which allowed us to get the fix. Having said all this, I don't want to end up with something like "sorry, we cannot do anything about it", but I also don't see us really being able to do SQLite3 debugging ourselves - even less so when I cannot reproduce and "play" with it myself.
So let's look back at pihole-FTL and try to find a fix there. Did you already see the glyphs in the log file yesterday before the crash? If so, then this is definitely the next thing we need to fix. If you have the log file, I could provide you with an upload link via e-mail, as I assume it'll be too large even after gzipping it. Just send me a quick mail to <my username here> @ pi-hole.net and I'll reply with a suitable link. Otherwise, if you want to share it some other way, I'd be ready to download it from there, too.
I fully understand and I'm thankful for your deep investigation. I can reproduce the crash by deleting pihole-FTL.db and recreating the container.
This is the log after a few hours of running with at least 5 crashes. For privacy reasons, I've dropped a few rows and anonymized some private domains. They were valid domains.
Sorry for the delay, but I had a hard time thinking about what we could do. I have now prepared a new special branch debug/no_fork which detects the presence of an external debugger and prevents TCP forking in this case. Please check out this branch and continue testing. gdb should now be able to follow all the crashes.
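For the curious: on Linux, detecting an attached debugger is commonly done by reading the TracerPid field from /proc/self/status, which is non-zero while a tracer such as gdb is ptrace-attached. Below is a minimal sketch under that assumption - the function name is invented and this is not necessarily FTL's actual code:

#include <stdio.h>
#include <stdbool.h>

// Sketch only: returns true if a tracer (e.g. gdb) is ptrace-attached.
// /proc/self/status contains a "TracerPid:" line that holds the PID of
// the tracing process, or 0 if none is attached.
static bool debugger_attached(void)
{
        FILE *f = fopen("/proc/self/status", "r");
        if(f == NULL)
                return false;

        char line[256];
        int tracer_pid = 0;
        while(fgets(line, sizeof(line), f) != NULL)
                if(sscanf(line, "TracerPid: %d", &tracer_pid) == 1)
                        break;

        fclose(f);
        return tracer_pid != 0;
}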
Okay, thank you very much. How can I run this inside docker?
@bermeitinger-b Unfortunately, for docker it means a few extra steps. You'll have to build your own container like this:
git clone https://github.com/pi-hole/docker-pi-hole
cd docker-pi-hole
git checkout development # Future readers: This step is only needed until Pi-hole v6 is released
./build.sh -f debug/no_fork
and then using the generated image pihole:local instead of pihole/pihole:development (or :nightly) should be enough.
Thanks, this worked perfectly and I'm running the debugging fork now. The log confirms this:
2024-11-15 14:00:15.669 CET [49M] INFO: ########## FTL started on 6c68c417781b! ##########
2024-11-15 14:00:15.669 CET [49M] INFO: FTL branch: debug/no_fork
2024-11-15 14:00:15.669 CET [49M] INFO: FTL version: vDev-0d7e5ec
2024-11-15 14:00:15.669 CET [49M] INFO: FTL commit: 0d7e5ec7
2024-11-15 14:00:15.669 CET [49M] INFO: FTL date: 2024-11-13 13:36:50 +0100
2024-11-15 14:00:15.669 CET [49M] INFO: FTL user: pihole
...
2024-11-15 14:00:54.900 CET [49/T137] INFO: Debugger attached (192: gdb), entering dnsmasq debug mode
...
In gdb, it shows many lines with
Thread 1 "pihole-FTL" received signal SIGPIPE, Broken pipe.
This is a SIGSEGV that I've observed (right after I restarted the upstream resolver; it might be unrelated):
#0 get_meta (p=p@entry=0x7e8c21c3cff0 "\020\021") at src/malloc/mallocng/meta.h:141
#1 0x00000000007dfe62 in __libc_free (p=0x7e8c21c3cff0) at src/malloc/mallocng/free.c:105
#2 0x00000000005507c8 in pcache1FreePage (p=<optimized out>) at /app/src/database/sqlite3.c:55720
#3 pcache1TruncateUnsafe (pCache=pCache@entry=0x7e8c21d3a618, iLimit=iLimit@entry=1) at /app/src/database/sqlite3.c:55918
#4 0x0000000000550814 in pcache1Truncate (iLimit=1, p=0x7e8c21d3a618) at /app/src/database/sqlite3.c:56399
#5 pcache1Truncate (p=0x7e8c21d3a618, iLimit=<optimized out>) at /app/src/database/sqlite3.c:56395
#6 0x00000000005d716e in sqlite3PcacheClear (pCache=<optimized out>) at /app/src/database/sqlite3.c:55037
#7 pager_reset (pPager=0x7e8c21d3a7d8) at /app/src/database/sqlite3.c:58967
#8 pagerBeginReadTransaction (pPager=<optimized out>) at /app/src/database/sqlite3.c:60442
#9 sqlite3PagerSharedLock (pPager=<optimized out>) at /app/src/database/sqlite3.c:62608
#10 0x00000000005e26c9 in lockBtree (pBt=<optimized out>) at /app/src/database/sqlite3.c:73916
#11 btreeBeginTrans (p=0x7e8c22077b58, wrflag=1, pSchemaVersion=0x7e8c205467f0) at /app/src/database/sqlite3.c:74313
#12 0x000000000064052c in sqlite3VdbeExec (p=p@entry=0x7e8c21d4cf28) at /app/src/database/sqlite3.c:97438
#13 0x0000000000647100 in sqlite3Step (p=0x7e8c21d4cf28) at /app/src/database/sqlite3.c:91357
#14 sqlite3_step (pStmt=0x7e8c21d4cf28) at /app/src/database/sqlite3.c:91418
#15 0x00000000004a2489 in export_queries_to_disk (final=final@entry=false) at /app/src/database/query-table.c:625
#16 0x000000000049096a in DB_thread (val=<optimized out>) at /app/src/database/database-thread.c:143
#17 0x00000000007f2d10 in start (p=0x7e8c20546b00) at src/thread/pthread_create.c:207
#18 0x00000000007f439c in __clone () at src/thread/x86_64/clone.s:22
Okay, so this confirms a crash again deep inside sqlite3. A crash in get_meta following a free is very likely a double-free corruption. We'll need to bring another tool into the game to properly trace this: valgrind. We have prepared instructions for valgrind here: https://deploy-preview-338--pihole-docs.netlify.app/ftldns/valgrind/
Still sorry it is such a slow process...
I don't think you're in a position to apologize ;) Thank you for your efforts.
I've started valgrind. Within docker, that's a bit complicated because stopping pihole-FTL stops the container. I've adjusted the docker-compose.yaml:
pihole:
image: pihole:local
...
entrypoint: /bin/bash
tty: true
Then, I could start valgrind.
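For future readers, the manual start inside the container then looks roughly like the sketch below. The exact options are in the valgrind guide linked above; the container name, log path, and the no-daemon foreground argument are assumptions here:

docker exec -it pihole bash
# inside the container:
valgrind --leak-check=full --trace-children=yes --log-file=/tmp/valgrind.log pihole-FTL no-daemon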
The error log is here: https://0x0.st/XkkG.txt
It indicates something at /sbin/ip address show.
pihole-FTL also dies with many "Thread 4 received signal SIG33, Real-time event 33" messages.
I hope this helps.
No, this is actually fine. There will be a lot of output; the log (or gdb, if you have re-attached it as described in the valgrind guide) will tell you once FTL crashes. The "errors" shown here are okay - valgrind is very verbose and, in general, logs many false positives. What remains is still some good detective work once we have all the logs and the crash.
I hope this leads somewhere. I can't really reproduce the state. This time, it crashed after 58 minutes and gdb detached with error 33. I've removed the nostops for 33 (and 37).
2024-11-15 19:50:12.529 CET [3151/F35] INFO: Received signal: Segmentation fault
2024-11-15 19:50:12.530 CET [3151/F35] INFO: at address: 0x489d3fc
2024-11-15 19:50:12.532 CET [3151/F35] INFO: with code: SEGV_MAPERR (Address not mapped to object)
And here the valgrind.log: https://0x0.st/Xk7C.log
There is one binary line in line 55 which I've removed. (If you need it, it's here https://0x0.st/Xk7F.bin)
Thanks. The important bit is
==3151== Invalid read of size 4
==3151== at 0x41B27C: binsearch (lookup-table.c:94)
==3151== by 0x41B27C: lookup_find_id (lookup-table.c:337)
==3151== by 0x4095AB: _findCacheID (datastructure.c:475)
==3151== by 0x40EDD1: _FTL_new_query (dnsmasq_interface.c:880)
==3151== by 0x4C8E79: tcp_request (forward.c:2405)
==3151== by 0x4B980C: check_dns_listeners (dnsmasq.c:2078)
==3151== by 0x4BB897: main_dnsmasq (dnsmasq.c:1278)
==3151== by 0x4020D6: main (main.c:123)
==3151== Address 0x489d3fc is not stack'd, malloc'd or (recently) free'd
I did just push an update to the same debug branch (commit 26d1dd3), please re-build the container and 🤞
Thanks, the new commit resulted in probably the same message:
==2320== Invalid read of size 4
==2320== at 0x41B27C: binsearch (lookup-table.c:94)
==2320== by 0x41B27C: lookup_find_id (lookup-table.c:337)
==2320== by 0x4095AB: _findCacheID (datastructure.c:475)
==2320== by 0x40EDD1: _FTL_new_query (dnsmasq_interface.c:880)
==2320== by 0x4C8E99: tcp_request (forward.c:2405)
==2320== by 0x4B982C: check_dns_listeners (dnsmasq.c:2078)
==2320== by 0x4BB8B7: main_dnsmasq (dnsmasq.c:1278)
==2320== by 0x4020D6: main (main.c:123)
==2320== Address 0x58767fc is not stack'd, malloc'd or (recently) free'd
Full log: https://0x0.st/XdHX.log
This time, I couldn't get a backtrace - maybe I disabled a message somewhere?
I'm sure it's the correct version. From FTL.log:
2024-11-16 15:14:05.973 CET [33M] INFO: FTL branch: debug/no_fork
2024-11-16 15:14:05.976 CET [33M] INFO: FTL version: vDev-26d1dd3
2024-11-16 15:14:05.977 CET [33M] INFO: FTL commit: 26d1dd33
2024-11-16 15:14:05.978 CET [33M] INFO: FTL date: 2024-11-16 08:08:12 +0100
Too bad, so back to the drawing board. I did just push another update to the branch adding further debug output. It'd be good if you could update and try again. Once it crashes, we'd need the address the algorithm tried to access (like what you quoted from valgrind above) and the last log line concerning what FTL tried to access, like:
Inserting/Removing/Looking for element with ID ... and hash ... into/from/in ... lookup table (..., ... // .../.../...))
Please ensure you still have debug.shmem = true to see the additional output.
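For reference, this corresponds to the [debug] section of the configuration file - a sketch assuming the v6 pihole.toml layout:

[debug]
  # Enable extra shared-memory debug output (the DEBUG_SHMEM log lines)
  shmem = true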
If you can attach both valgrind and gdb, and if we are lucky and the crash does not happen in a fork, it'd be interesting to get the following output in addition to backtrace:
print hash
print base
print mid
print size
print *try
print **try
edit: The expected commit would be b4f194b3
This time, gdb was spammed with "Thread 4 event 33" and the CPU was so busy with gdb and valgrind that I had to restart the server. So, I could not run these gdb commands.
These are the last 7 minutes in FTL.log: https://0x0.st/Xdbb.txt
And this is the valgrind.log: https://0x0.st/XdbZ.log
Thanks. Unfortunately, this is another crash in another location, without the "Invalid read of size 4" section, so we don't get the address it tried to look at. I pushed another commit (59c22db3) that adds some extra out-of-bounds checking. Let's hope I'm not going down the wrong track here...
edit 1: It'll be interesting to see ERROR: ... lines in pihole-FTL.log.
edit 2: I added another error message and updated the expected commit hash above.
Thanks.
I think it didn't crash where it could be traced. I've removed the SIG33 nostop, but it stopped at a different position:
Thread 6 received signal SIG33, Real-time event 33.
[Switching to Thread 132]
__cp_end () at src/thread/x86_64/syscall_cp.s:29
warning: 29 src/thread/x86_64/syscall_cp.s: No such file or directory
(gdb) backtrace
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00000000007f22e3 in __syscall_cp_c (nr=23, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00000000007eb000 in select (n=n@entry=0, rfds=rfds@entry=0x0, wfds=wfds@entry=0x0, efds=efds@entry=0x0, tv=tv@entry=0x6122a00)
at src/select/select.c:39
#3 0x00000000006c1709 in FTLselect (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x6122a00, file=file@entry=0x8016f3 "/app/src/timers.c",
func=0x8691c0 <__FUNCTION__.0> "sleepms", line=65) at /app/src/syscalls/select.c:25
#4 0x000000000042723a in sleepms (milliseconds=<optimized out>) at /app/src/timers.c:65
#5 0x0000000000426db1 in thread_sleepms (thread=thread@entry=GC, milliseconds=<optimized out>) at /app/src/signals.c:500
#6 0x0000000000418477 in GC_thread (val=<optimized out>) at /app/src/gc.c:659
#7 0x00000000007f2fd0 in start (p=0x6122b00) at src/thread/pthread_create.c:207
#8 0x00000000007f465c in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
(gdb) print hash
$1 = (const nettle_hash *) 0xb09f60 <nettle_sha256>
(gdb) print base
No symbol "base" in current context.
(gdb) print mid
No symbol "mid" in current context.
(gdb) print size
$2 = 0
(gdb) print *try
No symbol "try" in current context.
(gdb) print **try
No symbol "try" in current context.
(gdb) continue
Continuing.
Thread 6 received signal SIG33, Real-time event 33.
__cp_end () at src/thread/x86_64/syscall_cp.s:29
29 in src/thread/x86_64/syscall_cp.s
(gdb) backtrac
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00000000007f22e3 in __syscall_cp_c (nr=23, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00000000007eb000 in select (n=n@entry=0, rfds=rfds@entry=0x0, wfds=wfds@entry=0x0, efds=efds@entry=0x0, tv=tv@entry=0x6122a00)
at src/select/select.c:39
#3 0x00000000006c1709 in FTLselect (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x6122a00, file=file@entry=0x8016f3 "/app/src/timers.c",
func=0x8691c0 <__FUNCTION__.0> "sleepms", line=65) at /app/src/syscalls/select.c:25
#4 0x000000000042723a in sleepms (milliseconds=<optimized out>) at /app/src/timers.c:65
#5 0x0000000000426db1 in thread_sleepms (thread=thread@entry=GC, milliseconds=<optimized out>) at /app/src/signals.c:500
#6 0x0000000000418477 in GC_thread (val=<optimized out>) at /app/src/gc.c:659
#7 0x00000000007f2fd0 in start (p=0x6122b00) at src/thread/pthread_create.c:207
#8 0x00000000007f465c in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
(gdb) continue
Continuing.
(Then, it's dead.)
valgrind.log: https://0x0.st/Xdbd.log
These are the last lines in FTL.log: https://0x0.st/Xdbn.txt
Yeah, it shouldn't stop at SIG33. When valgrind is attached, it uses SIG33 for internal messaging. Even a brief interruption like in your case causes valgrind to get the hiccups. Interestingly, I do see some SIG33s, but certainly not a flood of them, even when I have valgrind and gdb running - but I don't think this is related.
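For future readers, re-adding the handlers so gdb passes these real-time signals through without stopping looks like this (the same "nostop" pattern the debugging documentation uses for the other signals; "noprint" additionally silences the flood of messages):

(gdb) handle SIG33 nostop noprint pass
(gdb) handle SIG37 nostop noprint pass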
Okay, I think I have something new. With red, very dry eyes, I have been staring at your previous log and found the following interesting lines:
2024-11-13 13:47:34.520 CET [957/F52] DEBUG_SHMEM: Resizing "FTL-domains-lookup" from 4096 to (1024 * 8) == 8192 (2.2MB used, 67.1MB total, FTL uses 2.2MB)
2024-11-13 13:47:34.520 CET [957/F52] DEBUG_SHMEM: SHMEM pointer updated: 0x7bb3e5d08000 -> 0x7bb3e56e4000 (4096 8192)
[...]
2024-11-13 13:47:34.521 CET [957/F52] INFO: Received signal: Segmentation fault
2024-11-13 13:47:34.521 CET [957/F52] INFO: at address: 0x7bb3e5d080ec
2024-11-13 13:47:34.521 CET [957/F52] INFO: with code: SEGV_MAPERR (Address not mapped to object)
Okay ... so it tried to access memory at 0x7bb3e5d080ec, which belongs to the old memory location 0x7bb3e5d08000, even though it logged one millisecond earlier that the shared memory pointer had moved to 0x7bb3e56e4000.
I am actually quite confident that adding the missing remapping instructions (commit 341464fb) fixes the bug we are seeing here. It also explains why it mostly happens in forks and why I haven't seen it myself before (my network is simply too quiet / doesn't see enough queries).
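To illustrate the class of bug (a hypothetical sketch with invented names, not FTL's actual code): a pointer into a shared-memory region that is cached across a resize keeps pointing at the old mapping once the region moves:

#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

static int *lookup_table; // cached pointer into the shared mapping

void resize_mapping(void *old_addr, size_t old_size, size_t new_size)
{
        // mremap() may move the mapping to a different virtual address...
        void *new_addr = mremap(old_addr, old_size, new_size, MREMAP_MAYMOVE);
        if(new_addr == MAP_FAILED)
                return; // error handling elided in this sketch

        // ...so every cached pointer into the region must be re-derived.
        // Omitting an update like the following leaves a stale pointer
        // behind: the next lookup_table[i] touches the old, now-unmapped
        // address and fails with SIGSEGV / SEGV_MAPERR, exactly as in the
        // log excerpt above.
        lookup_table = (int *)new_addr;
}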
I take my hat off to you while bowing with respect. You did it. 👍
With this branch, it has been running for 10 hours without a crash. Thank you very much.
I am actually quite confident that adding the missing remapping instructions (commit 341464f) fixes the bug we are seeing here. It also explains why it mostly happens in forks and why I haven't seen it myself before (my network is simply too quiet / doesn't see enough queries).
I just looked at your code. Is this code thread-safe? Or is thread safety no issue because of a global "lock" somewhere?
Thank you for confirming the fix!
I have now split this into two branches: the one you are currently on (debug/no_fork) got stripped down to the extra code ensuring no forking happens whenever gdb is attached, to ease future debugging. Furthermore, I have created a new branch called fix/binsearch_shm that has the fix for shared memory remapping.
I did that to ease review, as these are basically two independent changes which should result in two independent pull requests (#2120 and #2121). I also removed a few of the commits adding more debug output which we didn't need in the end. No need to keep "dead" code around.
@schuettecarsten
I just looked at your code. Is this code thread-safe? Or is thread safety no issue because of a global "lock" somewhere?
We take care to access these global memory sections only while a global (shared-memory) lock is held (lock_shm() and unlock_shm()). This not only ensures thread-safety but even fork-safety, as FTL does multi-processing both in threads and in processes. We try to keep the overhead as small as possible by holding the lock only in sections that really need it.
So far, we have never identified the lock as an issue, as most systems are idle for so long that the lock is basically free to be taken right away the vast majority of the time.
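One standard way to build a lock that works across both threads and forked processes is a pthread mutex with the PROCESS_SHARED attribute, placed inside the shared mapping itself. Below is a sketch under that assumption - whether lock_shm()/unlock_shm() are implemented exactly like this is not confirmed here, and all names besides those two are invented:

#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>

struct shm_data
{
        pthread_mutex_t lock;  // the lock lives in shared memory itself
        unsigned long queries; // example payload
};

static struct shm_data *shm;

int shm_init(void)
{
        shm = mmap(NULL, sizeof(*shm), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if(shm == MAP_FAILED)
                return -1;

        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        // This attribute makes the mutex valid across fork() boundaries
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&shm->lock, &attr);
        pthread_mutexattr_destroy(&attr);
        return 0;
}

void bump_queries(void)
{
        pthread_mutex_lock(&shm->lock);   // cf. lock_shm()
        shm->queries++;                   // keep the critical section short
        pthread_mutex_unlock(&shm->lock); // cf. unlock_shm()
}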
The bug should be fixed in development by now. For docker users, the fix is in :nightly. I will close this as completed. Thank you all for your input and for helping us make Pi-hole better for all of us. Much appreciated!
FTL crashed for unknown reasons several times today. Here is a full log: