pi-hole / FTL

The Pi-hole FTL engine
https://pi-hole.net
Other
1.38k stars 196 forks source link

FTL crashed [Fixed in 5.19.2] #1478

Closed somera closed 1 year ago

somera commented 2 years ago

Versions

Platform

Actual behavior / bug

[2022-11-15 20:12:23.581 941217/F910787] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2022-11-15 20:12:23.582 941217/F910787] ---------------------------->  FTL crashed!  <----------------------------
[2022-11-15 20:12:23.582 941217/F910787] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2022-11-15 20:12:23.582 941217/F910787] Please report a bug at https://github.com/pi-hole/FTL/issues
[2022-11-15 20:12:23.582 941217/F910787] and include in your report already the following details:
[2022-11-15 20:12:23.582 941217/F910787] FTL has been running for 13051 seconds
[2022-11-15 20:12:23.582 941217/F910787] FTL branch: master
[2022-11-15 20:12:23.582 941217/F910787] FTL version: v5.19.1
[2022-11-15 20:12:23.582 941217/F910787] FTL commit: b48b3e1f
[2022-11-15 20:12:23.582 941217/F910787] FTL date: 2022-11-14 22:01:50 +0000
[2022-11-15 20:12:23.582 941217/F910787] FTL user: started as pihole, ended as pihole
[2022-11-15 20:12:23.582 941217/F910787] Compiled for x86_64 (compiled on CI) using gcc (Debian 8.3.0-6) 8.3.0
[2022-11-15 20:12:23.582 941217/F910787] Process details: MID: 910787
[2022-11-15 20:12:23.582 941217/F910787]                  PID: 941217
[2022-11-15 20:12:23.582 941217/F910787]                  TID: 941217
[2022-11-15 20:12:23.582 941217/F910787]                  Name: pihole-FTL
[2022-11-15 20:12:23.582 941217/F910787] Received signal: Segmentation fault
[2022-11-15 20:12:23.582 941217/F910787]      at address: 0x7fe346994008
[2022-11-15 20:12:23.582 941217/F910787]      with code:  SEGV_MAPERR (Address not mapped to object)
[2022-11-15 20:12:23.583 941217/F910787] Backtrace:
[2022-11-15 20:12:23.583 941217/F910787] B[0000]: /usr/bin/pihole-FTL(generate_backtrace+0x41) [0x56291048e571]
[2022-11-15 20:12:23.623 941217/F910787] L[0000]: /__w/FTL/FTL/src/signals.c:99
[2022-11-15 20:12:23.625 941217/F910787] B[0001]: /usr/bin/pihole-FTL(+0x5bacd) [0x56291048eacd]
[2022-11-15 20:12:23.638 941217/F910787] L[0001]: /__w/FTL/FTL/src/signals.c:242
[2022-11-15 20:12:23.640 941217/F910787] B[0002]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7fe34681c420]
[2022-11-15 20:12:23.640 941217/F910787] B[0003]: /usr/bin/pihole-FTL(_lock_shm+0x71) [0x56291048d5d1]
[2022-11-15 20:12:23.652 941217/F910787] L[0003]: /__w/FTL/FTL/src/shmem.c:420 (discriminator 1)
[2022-11-15 20:12:23.653 941217/F910787] B[0004]: /usr/bin/pihole-FTL(_FTL_new_query+0x246) [0x56291047fde6]
[2022-11-15 20:12:23.669 941217/F910787] L[0004]: /__w/FTL/FTL/src/dnsmasq_interface.c:628
[2022-11-15 20:12:23.670 941217/F910787] B[0005]: /usr/bin/pihole-FTL(tcp_request+0x840) [0x5629104c3610]
[2022-11-15 20:12:23.685 941217/F910787] L[0005]: /__w/FTL/FTL/src/dnsmasq/forward.c:2319
[2022-11-15 20:12:23.686 941217/F910787] B[0006]: /usr/bin/pihole-FTL(+0x81ffa) [0x5629104b4ffa]
[2022-11-15 20:12:23.702 941217/F910787] L[0006]: /__w/FTL/FTL/src/dnsmasq/dnsmasq.c:2056
[2022-11-15 20:12:23.704 941217/F910787] B[0007]: /usr/bin/pihole-FTL(main_dnsmasq+0x145a) [0x5629104b707a]
[2022-11-15 20:12:23.719 941217/F910787] L[0007]: /__w/FTL/FTL/src/dnsmasq/dnsmasq.c:1278
[2022-11-15 20:12:23.720 941217/F910787] B[0008]: /usr/bin/pihole-FTL(main+0x13c) [0x56291047374c]
[2022-11-15 20:12:23.731 941217/F910787] L[0008]: /__w/FTL/FTL/src/main.c:118
[2022-11-15 20:12:23.732 941217/F910787] B[0009]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fe34663a083]
[2022-11-15 20:12:23.732 941217/F910787] B[0010]: /usr/bin/pihole-FTL(_start+0x2a) [0x562910473e1a]
[2022-11-15 20:12:23.753 941217/F910787] L[0010]: ??:?
[2022-11-15 20:12:23.756 941217/F910787] ------ Listing content of directory /dev/shm ------
[2022-11-15 20:12:23.756 941217/F910787] File Mode User:Group      Size  Filename
[2022-11-15 20:12:23.756 941217/F910787] rwxrwxrwx root:root       300  .
[2022-11-15 20:12:23.756 941217/F910787] rwxr-xr-x root:root         4K  ..
[2022-11-15 20:12:23.756 941217/F910787] rw------- pihole:pihole     4K  FTL-per-client-regex
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole   127K  FTL-dns-cache
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole     8K  FTL-overTime
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole     7M  FTL-queries
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole   315K  FTL-upstreams
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole    86K  FTL-clients
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole   123K  FTL-domains
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole   205K  FTL-strings
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole    16  FTL-settings
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole   248  FTL-counters
[2022-11-15 20:12:23.757 941217/F910787] rw------- pihole:pihole    88  FTL-lock
[2022-11-15 20:12:23.757 941217/F910787] rw------- postgres:postgres   7K  PostgreSQL.268161415
[2022-11-15 20:12:23.757 941217/F910787] rwx------ root:root        80  multipath
[2022-11-15 20:12:23.757 941217/F910787] ---------------------------------------------------
[2022-11-15 20:12:23.757 941217/F910787] Please also include some lines from above the !!!!!!!!! header.
[2022-11-15 20:12:23.757 941217/F910787] Thank you for helping us to improve our FTL engine!
[2022-11-15 20:12:23.757 941217/F910787] Asking parent pihole-FTL (PID 910787) to shut down
[2022-11-15 20:12:23.758 941217/F910787] FTL fork terminated!
[2022-11-15 20:12:23.758 941218/F910787] Error when obtaining outer SHM lock: Owner died
DL6ER commented 2 years ago

SEGV_MAPERR (Address not mapped to object)

suggests an issue with the shared memory. Can you "reliably" trigger the crash? If so, please add

DEBUG_SHMEM=true

to the file /etc/pihole/pihole-FTL.conf (create if it does not exist) and run

pihole restartdns

Next time the crash happens, please include again the crash report from /var/log/pihole/FTL.conf with some extra lines of context above the !!!!!! header. This may help us getting a better picture of where/why exactly the issue happens.

somera commented 2 years ago

@DL6ER thx.

No, I can't trigger the crash. After the upgrade to FTL 5.19 I had pro blem too. But I ignored it. Today in the morning I updated to 5.19.1 and some hours late if crashed. This is my first problem (since 2019)with pi-hole.

somera commented 2 years ago

@DL6ER it crashed after ~60 minutes. FTL.log

somera commented 2 years ago

It crashed again after I opened ~80 tabs in browser

[2022-11-15 23:29:00.933 47193/F40426] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2022-11-15 23:29:00.933 47193/F40426] ---------------------------->  FTL crashed!  <----------------------------
[2022-11-15 23:29:00.933 47193/F40426] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2022-11-15 23:29:00.933 47193/F40426] Please report a bug at https://github.com/pi-hole/FTL/issues
[2022-11-15 23:29:00.933 47193/F40426] and include in your report already the following details:
[2022-11-15 23:29:00.933 47193/F40426] FTL has been running for 2405 seconds
somera commented 2 years ago

After upgrade pi-hole the blocked queris rised to ~50% image Before the last upgrade it was ~25%

rdwebdesign commented 2 years ago

After upgrade pi-hole the blocked queris rised to ~50% ... Before the last upgrade it was ~25%

This only means one or more of these items could be happening:

somera commented 2 years ago

After upgrade pi-hole the blocked queris rised to ~50% ... Before the last upgrade it was ~25%

This only means one or more of these items could be happening:

  • your lists could be changed (different lists or the same lists, but with different domains);
  • the websites you visit could be using different sources for their ads and this is reflected on the count;
  • your network use changed (different websites visited, different number of visits, different number of clients, different apps, etc.);

ok.

My list cout hasn't changed. I think the number of domains on some adlists grows.

grandaddy commented 2 years ago

Having the same issue - DNS crashing every 30-60 minutes with recent update:

[2022-11-15 18:00:07.942 16986M] SQLite3 message: database is locked in "SELECT hwaddr FROM network WHERE id = (SELECT network_id FROM network_addresses WHERE ip = ? GROUP BY ip HAVING max(lastSeen));" (5) [2022-11-15 18:00:07.942 16986M] getMACfromIP("127.0.0.1") - SQL error prepare: database is locked [2022-11-15 18:00:08.826 16986/T17002] Notice: Database size is 1580.11 MB, deleted 377 rows [2022-11-15 18:00:09.012 21348/F16986] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-15 18:00:09.013 21348/F16986] ----------------------------> FTL crashed! <---------------------------- [2022-11-15 18:00:09.013 21348/F16986] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-15 18:00:09.013 21348/F16986] Please report a bug at https://github.com/pi-hole/FTL/issues [2022-11-15 18:00:09.013 21348/F16986] and include in your report already the following details: [2022-11-15 18:00:09.013 21348/F16986] FTL has been running for 2172 seconds [2022-11-15 18:00:09.013 21348/F16986] FTL branch: master [2022-11-15 18:00:09.013 21348/F16986] FTL version: v5.19.1 [2022-11-15 18:00:09.013 21348/F16986] FTL commit: b48b3e1f [2022-11-15 18:00:09.013 21348/F16986] FTL date: 2022-11-14 22:01:50 +0000 [2022-11-15 18:00:09.014 21348/F16986] FTL user: started as pihole, ended as pihole [2022-11-15 18:00:09.014 21348/F16986] Compiled for armv7hf (compiled on CI) using arm-linux-gnueabihf-gcc (Debian 8.3.0-2) 8.3.0 [2022-11-15 18:00:09.014 21348/F16986] Process details: MID: 16986 [2022-11-15 18:00:09.014 21348/F16986] PID: 21348 [2022-11-15 18:00:09.014 21348/F16986] TID: 21348 [2022-11-15 18:00:09.014 21348/F16986] Name: pihole-FTL [2022-11-15 18:00:09.014 21348/F16986] Received signal: Segmentation fault [2022-11-15 18:00:09.014 21348/F16986] at address: 0xb6fca008 [2022-11-15 18:00:09.014 21348/F16986] with code: SEGV_MAPERR (Address not mapped to object) [2022-11-15 18:00:09.015 21348/F16986] Backtrace: [2022-11-15 18:00:09.016 21348/F16986] B[0000]: /usr/bin/pihole-FTL(generate_backtrace+0x2d) [0x4cb46e]

[2022-11-15 18:00:09.100 21348/F16986] B[0001]: /usr/bin/pihole-FTL(+0x3b7fe) [0x4cb7fe]

[2022-11-15 18:00:09.154 21348/F16986] B[0002]: /lib/arm-linux-gnueabihf/libc.so.6(__default_rt_sa_restorer+0) [0xb6d6e120] [2022-11-15 18:00:09.154 21348/F16986] B[0003]: /usr/bin/pihole-FTL(_lock_shm+0x55) [0x4ca906]

[2022-11-15 18:00:09.209 21348/F16986] B[0004]: /usr/bin/pihole-FTL(_FTL_new_query+0x1ed) [0x4c034e]

[2022-11-15 18:00:09.266 21348/F16986] B[0005]: /usr/bin/pihole-FTL(tcp_request+0x62b) [0x4f0770]

[2022-11-15 18:00:09.326 21348/F16986] B[0006]: /usr/bin/pihole-FTL(+0x57072) [0x4e7072]

[2022-11-15 18:00:09.384 21348/F16986] B[0007]: /usr/bin/pihole-FTL(main_dnsmasq+0xe5b) [0x4e86fc]

[2022-11-15 18:00:09.442 21348/F16986] B[0008]: /usr/bin/pihole-FTL(main+0xbd) [0x4b6a8e]

[2022-11-15 18:00:09.495 21348/F16986] B[0009]: /lib/arm-linux-gnueabihf/libc.so.6(__libc_start_main+0x10c) [0xb6d58718]

JeremyS132 commented 2 years ago

I am having the same issue.

-rw-r--r-- 1 pihole pihole 77K Nov 15 19:01 /var/log/pihole/FTL.log -----head of FTL.log------ [2022-11-15 00:00:37.832 8592/T8610] SQLite3 message: database is locked in "SELECT name FROM network_addresses WHERE name IS NOT NULL AND ip = ?;" (5) [2022-11-15 00:00:37.908 8592/T8610] getNameFromIP("10.0.1.33") - SQL error prepare: database is locked [2022-11-15 00:08:39.354 8592M] Resizing "FTL-dns-cache" from 40960 to (2816 16) == 45056 (/dev/shm: 9.2MB used, 2.0GB total, FTL uses 9.2MB) [2022-11-15 00:10:00.227 8592/T8608] Notice: Database size is 2860.47 MB, deleted 1587 rows [2022-11-15 00:20:00.244 8592/T8608] Notice: Database size is 2860.47 MB, deleted 1334 rows [2022-11-15 00:29:34.557 8592M] Resizing "FTL-dns-cache" from 45056 to (3072 16) == 49152 (/dev/shm: 9.2MB used, 2.0GB total, FTL uses 9.2MB) [2022-11-15 00:30:00.216 8592/T8608] Notice: Database size is 2860.47 MB, deleted 1184 rows [2022-11-15 00:40:02.564 8592/T8608] Notice: Database size is 2860.47 MB, deleted 424 rows [2022-11-15 00:50:06.266 8592/T8608] Notice: Database size is 2860.47 MB, deleted 425 rows [2022-11-15 00:52:05.099 3838/F8592] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-15 00:52:05.099 3838/F8592] ----------------------------> FTL crashed! <---------------------------- [2022-11-15 00:52:05.099 3838/F8592] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-15 00:52:05.099 3838/F8592] Please report a bug at https://github.com/pi-hole/FTL/issues [2022-11-15 00:52:05.099 3838/F8592] and include in your report already the following details: [2022-11-15 00:52:05.099 3838/F8592] FTL has been running for 9806 seconds [2022-11-15 00:52:05.100 3838/F8592] FTL branch: master [2022-11-15 00:52:05.100 3838/F8592] FTL version: v5.19.1 [2022-11-15 00:52:05.100 3838/F8592] FTL commit: b48b3e1f [2022-11-15 00:52:05.100 3838/F8592] FTL date: 2022-11-14 22:01:50 +0000 [2022-11-15 00:52:05.100 3838/F8592] FTL user: started as pihole, ended as pihole [2022-11-15 00:52:05.100 3838/F8592] Compiled for armv7hf (compiled on CI) using arm-linux-gnueabihf-gcc (Debian 8.3.0-2) 8.3.0 [2022-11-15 00:52:05.100 3838/F8592] Process details: MID: 8592 [2022-11-15 00:52:05.100 3838/F8592] PID: 3838 [2022-11-15 00:52:05.100 3838/F8592] TID: 3838 [2022-11-15 00:52:05.100 3838/F8592] Name: pihole-FTL [2022-11-15 00:52:05.101 3838/F8592] Received signal: Segmentation fault [2022-11-15 00:52:05.101 3838/F8592] at address: 0xb6efb008 [2022-11-15 00:52:05.101 3838/F8592] with code: SEGV_MAPERR (Address not mapped to object) [2022-11-15 00:52:05.101 3838/F8592] Backtrace: [2022-11-15 00:52:05.102 3838/F8592] B[0000]: /usr/bin/pihole-FTL(generate_backtrace+0x2d) [0x50b46e]

[2022-11-15 00:52:05.195 3838/F8592] B[0001]: /usr/bin/pihole-FTL(+0x3b7fe) [0x50b7fe]

[2022-11-15 00:52:05.249 3838/F8592] B[0002]: /lib/arm-linux-gnueabihf/libc.so.6(__default_rt_sa_restorer+0) [0xb6ca4db0] [2022-11-15 00:52:05.249 3838/F8592] B[0003]: /usr/bin/pihole-FTL(_lock_shm+0x55) [0x50a906]

chriseng commented 2 years ago

Same issue here. Today alone it's happened three times so far. I'm not sure if maybe it was always crashing and now it's just failing to restart properly? First time I've experienced a major-ish bug with Pi-Hole in many years of using it, which is pretty amazing. Let me know if I can provide any other info that will help debug. Sample log entry below:

[2022-11-15 01:20:00.245 9841/T9857] Notice: Database size is 2011.41 MB, deleted 473 rows [2022-11-15 01:22:01.811 9841M] Resizing "FTL-dns-cache" from 12288 to (1024 16) == 16384 (/dev/shm: 5.4MB used, 226.4MB total, FTL uses 5.4MB) [2022-11-15 01:27:30.581 10850/F9841] Resizing "FTL-dns-cache" from 16384 to (1280 16) == 20480 (/dev/shm: 5.4MB used, 226.4MB total, FTL uses 5.4MB) [2022-11-15 01:27:30.584 10849/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:27:30.584 10849/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:27:30.585 10849/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:27:30.585 10849/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:27:30.586 10849/F9841] Remapping "FTL-dns-cache" from 16384 to (1280 16) == 20480 [2022-11-15 01:27:30.586 10849/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:27:30.587 10849/F9841] Remapping "FTL-strings" from 163840 to (163840 1) == 163840 [2022-11-15 01:27:30.641 10851/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:27:30.642 10851/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:27:30.642 10851/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:27:30.643 10851/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:27:30.643 10851/F9841] Remapping "FTL-dns-cache" from 16384 to (1280 16) == 20480 [2022-11-15 01:27:30.644 10851/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:27:30.644 10851/F9841] Remapping "FTL-strings" from 163840 to (163840 1) == 163840 [2022-11-15 01:27:30.691 9841M] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:27:30.692 9841M] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:27:30.692 9841M] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:27:30.693 9841M] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:27:30.693 9841M] Remapping "FTL-dns-cache" from 16384 to (1280 16) == 20480 [2022-11-15 01:27:30.694 9841M] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:27:30.694 9841M] Remapping "FTL-strings" from 163840 to (163840 1) == 163840 [2022-11-15 01:30:00.351 9841/T9857] Notice: Database size is 2011.41 MB, deleted 537 rows [2022-11-15 01:30:45.741 9841M] Resizing "FTL-strings" from 163840 to (204800 1) == 204800 (/dev/shm: 5.4MB used, 226.4MB total, FTL uses 5.4MB) [2022-11-15 01:35:15.445 11189/F9841] Resizing "FTL-dns-cache" from 20480 to (1536 16) == 24576 (/dev/shm: 5.5MB used, 226.4MB total, FTL uses 5.4MB) [2022-11-15 01:35:15.446 11188/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:35:15.447 11188/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:35:15.447 11188/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:35:15.448 11188/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:35:15.448 11188/F9841] Remapping "FTL-dns-cache" from 20480 to (1536 16) == 24576 [2022-11-15 01:35:15.449 11188/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:35:15.449 11188/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:35:15.450 11192/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:35:15.451 11192/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:35:15.452 11192/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:35:15.452 11192/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:35:15.453 11192/F9841] Remapping "FTL-dns-cache" from 20480 to (1536 16) == 24576 [2022-11-15 01:35:15.453 11192/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:35:15.454 11192/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:35:15.455 11191/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:35:15.455 11191/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:35:15.456 11191/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:35:15.456 11191/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:35:15.457 11191/F9841] Remapping "FTL-dns-cache" from 20480 to (1536 16) == 24576 [2022-11-15 01:35:15.457 11191/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:35:15.458 11191/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:35:15.476 11187/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:35:15.477 11187/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:35:15.477 11187/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:35:15.478 11187/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:35:15.478 11187/F9841] Remapping "FTL-dns-cache" from 20480 to (1536 16) == 24576 [2022-11-15 01:35:15.479 11187/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:35:15.479 11187/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:35:15.528 11193/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:35:15.529 11193/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:35:15.530 11193/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:35:15.531 11193/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:35:15.531 11193/F9841] Remapping "FTL-dns-cache" from 20480 to (1536 16) == 24576 [2022-11-15 01:35:15.532 11193/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:35:15.532 11193/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:35:16.334 11194/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:35:16.335 11194/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:35:16.335 11194/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:35:16.336 11194/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:35:16.336 11194/F9841] Remapping "FTL-dns-cache" from 20480 to (1536 16) == 24576 [2022-11-15 01:35:16.337 11194/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:35:16.337 11194/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:35:16.393 11195/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:35:16.393 11195/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:35:16.394 11195/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:35:16.394 11195/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:35:16.395 11195/F9841] Remapping "FTL-dns-cache" from 20480 to (1536 16) == 24576 [2022-11-15 01:35:16.395 11195/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:35:16.396 11195/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:35:16.448 11196/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:35:16.449 11196/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:35:16.450 11196/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:35:16.450 11196/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:35:16.450 11196/F9841] Remapping "FTL-dns-cache" from 20480 to (1536 16) == 24576 [2022-11-15 01:35:16.451 11196/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:35:16.451 11196/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:35:19.918 9841M] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:35:19.919 9841M] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:35:19.919 9841M] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:35:19.920 9841M] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:35:19.920 9841M] Remapping "FTL-dns-cache" from 20480 to (1536 16) == 24576 [2022-11-15 01:35:19.921 9841M] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:35:19.922 9841M] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:38:02.540 11614/F9841] Resizing "FTL-dns-cache" from 24576 to (1792 16) == 28672 (/dev/shm: 5.5MB used, 226.4MB total, FTL uses 5.4MB) [2022-11-15 01:38:02.562 11613/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:38:02.562 11613/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:38:02.563 11613/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:38:02.563 11613/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:38:02.564 11613/F9841] Remapping "FTL-dns-cache" from 24576 to (1792 16) == 28672 [2022-11-15 01:38:02.564 11613/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:38:02.565 11613/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:38:02.583 11615/F9841] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:38:02.584 11615/F9841] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:38:02.584 11615/F9841] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:38:02.585 11615/F9841] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:38:02.585 11615/F9841] Remapping "FTL-dns-cache" from 24576 to (1792 16) == 28672 [2022-11-15 01:38:02.586 11615/F9841] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:38:02.586 11615/F9841] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:38:03.970 9841M] Remapping "FTL-queries" from 4145152 to (94208 44) == 4145152 [2022-11-15 01:38:03.971 9841M] Remapping "FTL-domains" from 102400 to (5120 20) == 102400 [2022-11-15 01:38:03.971 9841M] Remapping "FTL-clients" from 331776 to (512 648) == 331776 [2022-11-15 01:38:03.972 9841M] Remapping "FTL-upstreams" from 618496 to (1024 604) == 618496 [2022-11-15 01:38:03.972 9841M] Remapping "FTL-dns-cache" from 24576 to (1792 16) == 28672 [2022-11-15 01:38:03.973 9841M] Remapping "FTL-per-client-regex" from 4096 to (4096 1) == 4096 [2022-11-15 01:38:03.973 9841M] Remapping "FTL-strings" from 204800 to (204800 1) == 204800 [2022-11-15 01:39:00.669 11715/F9841] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-15 01:39:00.669 11715/F9841] ----------------------------> FTL crashed! <---------------------------- [2022-11-15 01:39:00.669 11715/F9841] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-15 01:39:00.670 11715/F9841] Please report a bug at https://github.com/pi-hole/FTL/issues [2022-11-15 01:39:00.679 11715/F9841] and include in your report already the following details: [2022-11-15 01:39:00.680 11715/F9841] FTL has been running for 1572 seconds [2022-11-15 01:39:00.680 11715/F9841] FTL branch: master [2022-11-15 01:39:00.680 11715/F9841] FTL version: v5.19.1 [2022-11-15 01:39:00.680 11715/F9841] FTL commit: b48b3e1f [2022-11-15 01:39:00.681 11715/F9841] FTL date: 2022-11-14 22:01:50 +0000 [2022-11-15 01:39:00.682 11715/F9841] FTL user: started as pihole, ended as pihole [2022-11-15 01:39:00.682 11715/F9841] Compiled for armv6hf (compiled on CI) using arm-linux-gnueabihf-gcc (GCC) 10.3.0 [2022-11-15 01:39:00.682 11715/F9841] Process details: MID: 9841 [2022-11-15 01:39:00.683 11715/F9841] PID: 11715 [2022-11-15 01:39:00.683 11715/F9841] TID: 11715 [2022-11-15 01:39:00.683 11715/F9841] Name: pihole-FTL [2022-11-15 01:39:00.685 11715/F9841] Received signal: Segmentation fault [2022-11-15 01:39:00.685 11715/F9841] at address: 0xb6f5d008 [2022-11-15 01:39:00.685 11715/F9841] with code: SEGV_MAPERR (Address not mapped to object) [2022-11-15 01:39:00.687 11715/F9841] Backtrace: [2022-11-15 01:39:00.691 11715/F9841] B[0000]: /usr/bin/pihole-FTL(generate_backtrace+0x48) [0x4c71b0]

[2022-11-15 01:39:01.053 11715/F9841] B[0001]: /usr/bin/pihole-FTL(+0x4b744) [0x4c7744]

[2022-11-15 01:39:01.377 11715/F9841] B[0002]: /lib/arm-linux-gnueabihf/libc.so.6(__default_rt_sa_restorer+0) [0xb6d0f130] [2022-11-15 01:39:01.378 11715/F9841] B[0003]: /usr/bin/pihole-FTL(_lock_shm+0x8c) [0x4c6000]

[2022-11-15 01:39:01.564 11715/F9841] B[0004]: /usr/bin/pihole-FTL(_FTL_new_query+0x494) [0x4b65c8]

[2022-11-15 01:39:01.733 11715/F9841] B[0005]: /usr/bin/pihole-FTL(tcp_request+0x634) [0x4ff6ec]

[2022-11-15 01:39:02.004 11715/F9841] B[0006]: /usr/bin/pihole-FTL(+0x75238) [0x4f1238]

[2022-11-15 01:39:02.182 11715/F9841] B[0007]: /usr/bin/pihole-FTL(main_dnsmasq+0x1648) [0x4f3564]

[2022-11-15 01:39:02.360 11715/F9841] B[0008]: /usr/bin/pihole-FTL(main+0x114) [0x4aabf4]

[2022-11-15 01:39:02.508 11715/F9841] B[0009]: /lib/arm-linux-gnueabihf/libc.so.6(__libc_start_main+0x10c) [0xb6cf9718] [2022-11-15 01:39:02.509 11715/F9841] ------ Listing content of directory /dev/shm ------ [2022-11-15 01:39:02.509 11715/F9841] File Mode User:Group Size Filename [2022-11-15 01:39:02.510 11715/F9841] rwxrwxrwx root:root 260 . [2022-11-15 01:39:02.511 11715/F9841] rwxr-xr-x root:root 4K .. [2022-11-15 01:39:02.512 11715/F9841] rw------- pihole:pihole 4K FTL-per-client-regex [2022-11-15 01:39:02.513 11715/F9841] rw------- pihole:pihole 29K FTL-dns-cache [2022-11-15 01:39:02.514 11715/F9841] rw------- pihole:pihole 12K FTL-overTime [2022-11-15 01:39:02.516 11715/F9841] rw------- pihole:pihole 4M FTL-queries [2022-11-15 01:39:02.517 11715/F9841] rw------- pihole:pihole 618K FTL-upstreams [2022-11-15 01:39:02.518 11715/F9841] rw------- pihole:pihole 332K FTL-clients [2022-11-15 01:39:02.519 11715/F9841] rw------- pihole:pihole 102K FTL-domains [2022-11-15 01:39:02.520 11715/F9841] rw------- pihole:pihole 205K FTL-strings [2022-11-15 01:39:02.521 11715/F9841] rw------- pihole:pihole 16 FTL-settings [2022-11-15 01:39:02.522 11715/F9841] rw------- pihole:pihole 248 FTL-counters [2022-11-15 01:39:02.523 11715/F9841] rw------- pihole:pihole 56 FTL-lock [2022-11-15 01:39:02.523 11715/F9841] --------------------------------------------------- [2022-11-15 01:39:02.524 11715/F9841] Please also include some lines from above the !!!!!!!!! header. [2022-11-15 01:39:02.524 11715/F9841] Thank you for helping us to improve our FTL engine! [2022-11-15 01:39:02.524 11715/F9841] Asking parent pihole-FTL (PID 9841) to shut down [2022-11-15 01:39:02.526 9841M] Received: Real-time signal 2 (36 -> 2) [2022-11-15 01:39:02.528 9841M] Shutting down... [2022-11-15 01:39:02.530 11715/F9841] Failed to unlock inner SHM lock: Operation not permitted [2022-11-15 01:39:02.933 9841M] Finished final database update (stored 5 queries) [2022-11-15 01:39:02.934 9841M] Waiting for threads to join [2022-11-15 01:39:02.934 9841M] Thread database (0) is idle, terminating it. [2022-11-15 01:39:02.939 9841M] Thread housekeeper (1) is idle, terminating it. [2022-11-15 01:39:02.940 9841M] Thread DNS client (2) is idle, terminating it. [2022-11-15 01:39:02.942 9841M] All threads joined [2022-11-15 01:39:02.942 9841M] Joining API worker thread 0 [2022-11-15 01:39:02.943 9841M] Joining API worker thread 1 [2022-11-15 01:39:02.945 9841M] Joining API worker thread 2 [2022-11-15 01:39:02.946 9841M] Joining API worker thread 3 [2022-11-15 01:39:02.948 9841M] Joining API worker thread 4 [2022-11-15 01:39:02.970 9841M] ########## FTL terminated after 26m 14s (code 1)! ########## [2022-11-15 02:58:55.645 11987M] Using log file /var/log/pihole/FTL.log [2022-11-15 02:58:55.645 11987M] ########## FTL started on raspberrypi! ##########

DL6ER commented 2 years ago

Sorry for the delay but - to my excuse - it has been night in mainland Europe (and, in fact, still is at 3:30 AM) but this didn't let me sleep well enough so let's start debugging. This comment mainly documents my thoughts along my nightly thinking process and is not necessarily well structured.

Observations

What I got from your logs above is that there are no immediate relations between any debugging output from DEBUG_SHMEM and the crash itself. What I do see is that it happens in forked processes for all of you, i.e., dedicated sub-processes spawned by FTL to handle dedicated TCP connections. You can see this from the PID description like 11715/F9841 which reads like "this is process with PID 11715 which is a Fork of process 9841".

That this crash "only" happens in TCP forks explains why there are many but not hundreds (or thousands?) of affected users but, unfortunately, this also means the crash is happening in the one region of the code that is most difficult to debug. But we'll nevertheless try it, of course.

Where it is happening

It happens when the fork tries to access the global shared memory counter

https://github.com/pi-hole/FTL/blob/b48b3e1fb2bd08c68302ad116cd849a56afb869f/src/shmem.c#L418-L420

suggesting to me that the shmSettings object isn't pointing to valid memory in the forks (but everywhere else which is a bit weird). Related code was changed recently in https://github.com/pi-hole/FTL/pull/1464, so it's not a complete mystery why this is happening.

What I tried myself

I tried to reproduce this myself spamming my FTL with tens of thousands of TCP requests, each triggering a fork, but I never saw a crash within around an hour of extreme load on the DNS server.

DL6ER commented 2 years ago

What to do next

Please anyone who can reproduce this and has DEBUG_SHMEM=true enabled (see here: https://github.com/pi-hole/FTL/issues/1478#issuecomment-1315831406), run the command

pihole checkout ftl fix/forked_shmSettings

and furthermore add DEBUG_LOCKS=true in your /etc/pihole/pihole-FTL.conf (either do it before the checkout or restart your pihole-FTL afterwards).

While this will not fix the issue at hand right now, it'll hopefully give me enough context to fix it afterwards. Please be prepared that your log file will grow quickly so you should only be doing this if you have a few gigabytes of free space on your disk/SD card. I'll only need a few lines above the crash, no need to upload mega- or gigabytes of logfile in the end.

Alternative way to upload your logs

Something like

tail -n 200 /var/log/pihole/FTL.log | pihole tricorder

will securely upload the last 200 lines from your /var/log/pihole/FTL.log to a Pi-hole server where only the core developers can access it (but even this only after you providing the displayed token). Just a note, I'll be happy with any kind of shared log files that I can access.

JeremyS132 commented 2 years ago

What to do next

Please anyone who can reproduce this and has DEBUG_SHMEM=true enabled (see here: #1478 (comment)), run the command

pihole checkout ftl fix/forked_shmSettings

and furthermore add DEBUG_LOCKS=true in your /etc/pihole/pihole-FTL.conf (either do it before the checkout or restart your pihole-FTL afterwards).

While this will not fix the issue at hand right now, it'll hopefully give me enough context to fix it afterwards. Please be prepared that your log file will grow quickly so you should only be doing this if you have a few gigabytes of free space on your disk/SD card. I'll only need a few lines above the crash, no need to upload mega- or gigabytes of logfile in the end.

Alternative way to upload your logs

Something like

tail -n 200 /var/log/pihole/FTL.log | pihole tricorder

will securely upload the last 200 lines from your /var/log/pihole/FTL.log to a Pi-hole server where only the core developers can access it (but even this only after you providing the displayed token). Just a note, I'll be happy with any kind of shared log files that I can access.

I can't reproduce this as it appears to be random so far. I have enabled this and will report back when I have some logs to share and observe the crash.

chriseng commented 2 years ago

I don't have a specific way to trigger the crash either, but I've enabled both debug flags and have the fix/forked_shmSettings branch running, so will let it run and see if it happens again before the SD card fills up! Will report back.

dheering commented 1 year ago

Also had this problem late yesterday evening. Upgraded to Pi-hole [v5.14.1] FTL [v5.19.1] Web Interface [v5.17] earlier that evening.

Running on raspberry pi: Raspberry Pi 3 Model B Plus Rev 1.3 Distributor ID: Debian Description: Debian GNU/Linux 11 (bullseye) Release: 11 Codename: bullseye Linux pihole 5.15.74-v8+ #1595 SMP PREEMPT Wed Oct 26 11:07:24 BST 2022 aarch64 GNU/Linux

Log: [2022-11-15 22:57:08.474 152268M] Resizing "FTL-dns-cache" from 102400 to (6656 * 16) == 106496 (/dev/shm: 4.7MB used, 476.9MB total, FTL uses 4.7MB) [2022-11-15 23:12:04.676 154785/F152268] SQLite3 message: database is locked in "SELECT hwaddr FROM network WHERE id = (SELECT network_id FROM network_addresses WHERE ip = ? GROUP BY ip HAVING max(lastSeen));" (5) [2022-11-15 23:12:04.677 154785/F152268] getMACfromIP("192.168.1.113") - SQL error prepare: database is locked [2022-11-15 23:12:05.678 154785/F152268] SQLite3 message: database is locked in "SELECT name FROM network_addresses WHERE name IS NOT NULL AND ip = ?;" (5) [2022-11-15 23:12:05.679 154785/F152268] getNameFromIP("192.168.1.113") - SQL error prepare: database is locked [2022-11-15 23:12:06.679 154785/F152268] SQLite3 message: database is locked in "SELECT interface FROM network JOIN network_addresses ON network_addresses.network_id = network.id W HERE network_addresses.ip = ? AND interface != 'N/A' AND interface IS NOT NULL;" (5) [2022-11-15 23:12:06.680 154785/F152268] getIfaceFromIP("192.168.1.113") - SQL error prepare: database is locked [2022-11-15 23:12:06.770 154792/F152268] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-15 23:12:06.770 154792/F152268] ----------------------------> FTL crashed! <---------------------------- [2022-11-15 23:12:06.771 154792/F152268] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-15 23:12:06.771 154792/F152268] Please report a bug at https://github.com/pi-hole/FTL/issues [2022-11-15 23:12:06.771 154792/F152268] and include in your report already the following details: [2022-11-15 23:12:06.771 154792/F152268] FTL has been running for 13713 seconds [2022-11-15 23:12:06.771 154792/F152268] FTL branch: master [2022-11-15 23:12:06.771 154792/F152268] FTL version: v5.19.1 [2022-11-15 23:12:06.771 154792/F152268] FTL commit: b48b3e1f [2022-11-15 23:12:06.771 154792/F152268] FTL date: 2022-11-14 22:01:50 +0000 [2022-11-15 23:12:06.772 154792/F152268] FTL user: started as pihole, ended as pihole [2022-11-15 23:12:06.772 154792/F152268] Compiled for aarch64 (compiled on CI) using aarch64-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0 [2022-11-15 23:12:06.772 154792/F152268] Process details: MID: 152268 [2022-11-15 23:12:06.772 154792/F152268] PID: 154792 [2022-11-15 23:12:06.773 154792/F152268] TID: 154792 [2022-11-15 23:12:06.773 154792/F152268] Name: pihole-FTL [2022-11-15 23:12:06.773 154792/F152268] Received signal: Segmentation fault [2022-11-15 23:12:06.773 154792/F152268] at address: 0x7f95dc7008 [2022-11-15 23:12:06.773 154792/F152268] with code: SEGV_MAPERR (Address not mapped to object) [2022-11-15 23:12:06.775 154792/F152268] Backtrace: [2022-11-15 23:12:06.777 154792/F152268] B[0000]: /usr/bin/pihole-FTL(generate_backtrace+0x38) [0x557623b300]

[2022-11-15 23:12:06.890 154792/F152268] B[0001]: /usr/bin/pihole-FTL(+0x5b7d4) [0x557623b7d4]

[2022-11-15 23:12:06.946 154792/F152268] B[0002]: linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x7f95dcd788] [2022-11-15 23:12:06.946 154792/F152268] B[0003]: /usr/bin/pihole-FTL(_lock_shm+0x88) [0x557623a338]

[2022-11-15 23:12:07.001 154792/F152268] B[0004]: /usr/bin/pihole-FTL(_FTL_new_query+0x2b8) [0x557622cff8]

[2022-11-15 23:12:07.060 154792/F152268] B[0005]: /usr/bin/pihole-FTL(tcp_request+0x6a4) [0x557626ca4c]

[2022-11-15 23:12:07.129 154792/F152268] B[0006]: /usr/bin/pihole-FTL(+0x7fa7c) [0x557625fa7c]

[2022-11-15 23:12:07.196 154792/F152268] B[0007]: /usr/bin/pihole-FTL(main_dnsmasq+0xfc0) [0x5576261598]

[2022-11-15 23:12:07.263 154792/F152268] B[0008]: /usr/bin/pihole-FTL(main+0x100) [0x55762214b0]

[2022-11-15 23:12:07.314 154792/F152268] B[0009]: /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe8) [0x7f95b54e18] [2022-11-15 23:12:07.315 154792/F152268] B[0010]: /usr/bin/pihole-FTL(+0x41978) [0x5576221978]

[2022-11-15 23:12:07.419 154792/F152268] ------ Listing content of directory /dev/shm ------ [2022-11-15 23:12:07.419 154792/F152268] File Mode User:Group Size Filename [2022-11-15 23:12:07.419 154792/F152268] rwxrwxrwx root:root 260 . [2022-11-15 23:12:07.419 154792/F152268] rwxr-xr-x root:root 4K .. [2022-11-15 23:12:07.420 154792/F152268] rw------- pihole:pihole 4K FTL-per-client-regex [2022-11-15 23:12:07.420 154792/F152268] rw------- pihole:pihole 106K FTL-dns-cache [2022-11-15 23:12:07.420 154792/F152268] rw------- pihole:pihole 8K FTL-overTime [2022-11-15 23:12:07.420 154792/F152268] rw------- pihole:pihole 4M FTL-queries [2022-11-15 23:12:07.420 154792/F152268] rw------- pihole:pihole 315K FTL-upstreams [2022-11-15 23:12:07.421 154792/F152268] rw------- pihole:pihole 86K FTL-clients [2022-11-15 23:12:07.421 154792/F152268] rw------- pihole:pihole 111K FTL-domains [2022-11-15 23:12:07.421 154792/F152268] rw------- pihole:pihole 164K FTL-strings [2022-11-15 23:12:07.421 154792/F152268] rw------- pihole:pihole 16 FTL-settings [2022-11-15 23:12:07.422 154792/F152268] rw------- pihole:pihole 248 FTL-counters [2022-11-15 23:12:07.422 154792/F152268] rw------- pihole:pihole 104 FTL-lock [2022-11-15 23:12:07.422 154792/F152268] --------------------------------------------------- [2022-11-15 23:12:07.422 154792/F152268] Please also include some lines from above the !!!!!!!!! header. [2022-11-15 23:12:07.422 154792/F152268] Thank you for helping us to improve our FTL engine! [2022-11-15 23:12:07.422 154792/F152268] Asking parent pihole-FTL (PID 152268) to shut down [2022-11-15 23:12:07.422 154792/F152268] FTL fork terminated! [2022-11-15 23:12:07.422 152268M] Received: Real-time signal 2 (36 -> 2) [2022-11-15 23:12:07.423 154791/F152268] Error when obtaining outer SHM lock: Owner died [2022-11-15 23:12:07.438 152268M] Shutting down... [2022-11-15 23:12:08.690 152268M] SQLite3 message: database is locked in "INSERT INTO query_storage (timestamp,type,status,domain,client,forward,additional_info,reply_type,reply_ti me,dnssec) VALUES (?1,?2,?3,(SELECT id FROM domain_by_id WHERE domain = ?4),(SEL (5) [2022-11-15 23:12:08.690 152268M] WARNING: Storing queries in long-term database failed: database is locked

[2022-11-15 23:12:08.690 152268M] Keeping queries in memory for later new attempt [2022-11-15 23:12:08.691 152268M] Waiting for threads to join [2022-11-15 23:12:09.204 152268/T152281] Terminating GC thread [2022-11-15 23:12:09.372 152268/T152282] Terminating resolver thread [2022-11-15 23:12:10.443 152268/T152280] Terminating database thread [2022-11-15 23:12:10.444 152268M] All threads joined [2022-11-15 23:12:10.444 152268M] Joining API worker thread 0 [2022-11-15 23:12:10.445 152268M] Joining API worker thread 1 [2022-11-15 23:12:10.445 152268M] Joining API worker thread 2 [2022-11-15 23:12:10.445 152268M] Joining API worker thread 3 [2022-11-15 23:12:10.445 152268M] Joining API worker thread 4 [2022-11-15 23:12:10.457 152268M] ########## FTL terminated after 3h 48m 37s (code 1)! ##########

cathalferris commented 1 year ago

I'm another user with newly occurring crashes on RaspberryPi after the most recent update to FTL v5.19.1 - approx three crashes since 24hr ago. I cannot reliably trigger the segfault on demand, but it's been occurring randomly between two and six hours after a "pihole dnsrestart". I was going to crontab this restart, but instead I have decided to enable the FTL fork and both of the conf file debug.

My home network setup is a bit involved with a pfSense firewall providing DHCP and working IPV4 and IPV6, a handful of VLANs with the PiHole on a trunked switchport to see all the vlan DNS requests. I've got the firewall NATting all DNS requests. About 200,000 queries per 24hr and 47 clients. Yes, it's only a home network..

An FTL log file from before these debug changes is at https://tricorder.pi-hole.net/0Y6AIayJ/ for your perusal.

If/when it crashes again, I'll upload the new file.

Thanks and good luck.

somera commented 1 year ago

What to do next

Please anyone who can reproduce this and has DEBUG_SHMEM=true enabled (see here: #1478 (comment)), run the command

pihole checkout ftl fix/forked_shmSettings

and furthermore add DEBUG_LOCKS=true in your /etc/pihole/pihole-FTL.conf (either do it before the checkout or restart your pihole-FTL afterwards).

While this will not fix the issue at hand right now, it'll hopefully give me enough context to fix it afterwards. Please be prepared that your log file will grow quickly so you should only be doing this if you have a few gigabytes of free space on your disk/SD card. I'll only need a few lines above the crash, no need to upload mega- or gigabytes of logfile in the end.

Alternative way to upload your logs

Something like

tail -n 200 /var/log/pihole/FTL.log | pihole tricorder

will securely upload the last 200 lines from your /var/log/pihole/FTL.log to a Pi-hole server where only the core developers can access it (but even this only after you providing the displayed token). Just a note, I'll be happy with any kind of shared log files that I can access.

I try to reproduce it.

somera commented 1 year ago

my RATE_LIMIT ist set to

RATE_LIMIT=0/0
MAXDBDAYS=365
chrispontiga commented 1 year ago

This happened to me on two separate piholes at the same time at 7 am this morning. They are operating in an HA configuration, and BOTH went down. I recently updated both two nights ago to Pi-hole [v5.14.1], FTL [v5.19.1], Web Interface [v5.17]

ChrSch84 commented 1 year ago

After 5 Minutes Uptime... I Disable Pihole for now :/

https://tricorder.pi-hole.net/XAgCGIYq/

DL6ER commented 1 year ago

I am really sorry for this and I'm trying really hard myself to try to reproduce this (also taking the RATE_LIMIT=0/0 of @somera, MAXDBDAYS=365 is default, into account) but to no avail so far. It seems nobody who has switched on the needed debug logging has seen a crash, so far, but I need something to work on...

somera commented 1 year ago

I am really sorry for this and I'm trying really hard myself to try to reproduce this (also taking the RATE_LIMIT=0/0 of @somera, MAXDBDAYS=365 is default, into account) but to no avail so far. It seems nobody who has switched on the needed debug logging has seen a crash, so far, but I need something to work on...

I'm running fix/forked_shmSettings branch now for ~5h. But I can't reproduce the crash.

RATE_LIMIT=0/0
MAXDBDAYS=365
DEBUG_ALL=false
DEBUG_DATABASE=false
DEBUG_LOCKS=false
DEBUG_QUERIES=false
DEBUG_GC=false
PRIVACYLEVEL=0
DEBUG_SHMEM=true
DEBUG_LOCKS=true

I'm still on it.

decommentator commented 1 year ago

after upgrade yesterday my pihole crashed last night with same error cannot reproduce it

cathalferris commented 1 year ago

I did not have an actual crash yet since enabling the debug items.

I did however have a failure of the PiHole to respond to queries, and issuing a "pihole restartdns" fully failed. I did not see a failure in the log file, but I did make a copy of the logfile at that point. 104Mb in size - let me know if this is wanted to be sent to the tricorder or not.

Last entries in FTL.log after attempting the restart of FTL, before the reboot listed at the end here. At this point I rebooted. Currently things are stable with uptime of 3h20m. Any further updates and I'll ping here.


[2022-11-16 14:35:01.918 31626M] Adding "e9.whatsapp.net" (len 16) to buffer. next_str_pos is 37887
[2022-11-16 14:35:01.927 31626M] Adding "e1.whatsapp.net" (len 16) to buffer. next_str_pos is 37903
[2022-11-16 14:35:03.111 31626M] Adding "title.auth.xboxlive.com" (len 24) to buffer. next_str_pos is 37919
[2022-11-16 14:35:03.115 31626M] Adding "user.auth.xboxlive.com" (len 23) to buffer. next_str_pos is 37943
[2022-11-16 14:35:03.130 31626M] Adding "img-prod-cms-rt-microsoft-com.akamaized.net" (len 44) to buffer. next_str_pos is 37966
[2022-11-16 14:35:03.134 31626M] Adding "store-images.s-microsoft.com" (len 29) to buffer. next_str_pos is 38010
[2022-11-16 14:35:03.202 31626M] Adding "eplists.xboxlive.com" (len 21) to buffer. next_str_pos is 38039
[2022-11-16 14:35:03.495 31626M] Resizing "FTL-queries" from 8470528 to (196608 * 44) == 8650752 (/dev/shm: 19.9MB used, 483.5MB total, FTL uses 9.6MB)
[2022-11-16 14:35:03.498 31626M] SHMEM pointer updated: 0x763e5000 -> 0x75ba5000 (8470528 8650752)
[2022-11-16 14:35:06.361 31626M] Resizing "FTL-queries" from 8650752 to (200704 * 44) == 8830976 (/dev/shm: 20.1MB used, 483.5MB total, FTL uses 9.7MB)
[2022-11-16 14:35:06.364 31626M] SHMEM pointer not updated: 0x75ba5000 (8650752 8830976)
[2022-11-16 14:35:08.014 31626M] Removed lock in DB_read_queries() (/__w/FTL/FTL/src/database/query-table.c:1006)
[2022-11-16 14:35:08.014 31626M] Imported 198669 queries from the long-term database
[2022-11-16 14:35:08.016 31626M]  -> Total DNS queries: 198669
[2022-11-16 14:35:08.016 31626M]  -> Cached DNS queries: 62521
[2022-11-16 14:35:08.017 31626M]  -> Forwarded DNS queries: 53124
[2022-11-16 14:35:08.017 31626M]  -> Blocked DNS queries: 80105
[2022-11-16 14:35:08.017 31626M]  -> Unknown DNS queries: 3
[2022-11-16 14:35:08.017 31626M]  -> Unique domains: 1480
[2022-11-16 14:35:08.017 31626M]  -> Unique clients: 47
[2022-11-16 14:35:08.018 31626M]  -> Known forward destinations: 5
[2022-11-16 14:35:08.018 31626M] Successfully accessed setupVars.conf
[2022-11-16 14:35:08.018 31626M] DEBUG: argv[0] = ""
[2022-11-16 14:35:08.018 31626M] DEBUG: argv[1] = "-k"
[2022-11-16 14:35:08.018 31626M] DEBUG: argv[2] = "--log-debug"
[2022-11-16 14:35:08.022 31626M] FATAL ERROR in dnsmasq core: failed to create listening socket for port 53: Address already in use
[2022-11-16 14:35:08.067 31626M] ########## FTL terminated after 2m 35s  (code 1)! ##########

Looking through the logfile, there was nothing useful written to the log when it appears that FTL hung.


[2022-11-16 13:29:00.348 21589/T21593] Waiting for SHM lock in process_request() (/__w/FTL/FTL/src/api/request.c:39)
[2022-11-16 13:29:00.377 21589/T21606] Waiting for SHM lock in GC_thread() (/__w/FTL/FTL/src/gc.c:162)
[2022-11-16 13:29:01.013 21589/T21590] Waiting for SHM lock in process_request() (/__w/FTL/FTL/src/api/request.c:39)
[2022-11-16 13:34:00.262 25776/F21589] TCP worker terminating (timeout)
[2022-11-16 13:34:00.300 25781/F21589] TCP worker terminating (timeout)
[2022-11-16 13:34:00.300 25781/F21589] Waiting for SHM lock in FTL_TCP_worker_terminating() (/__w/FTL/FTL/src/dnsmasq_interface.c:3070)
[2022-11-16 13:34:00.301 25792/F21589] TCP worker terminating (timeout)
[2022-11-16 13:34:00.301 25792/F21589] Waiting for SHM lock in FTL_TCP_worker_terminating() (/__w/FTL/FTL/src/dnsmasq_interface.c:3070)
[2022-11-16 14:00:00.362 21589/T21607] Waiting for SHM lock in resolveClients() (/__w/FTL/FTL/src/resolve.c:394)
[2022-11-16 14:32:32.931 31626M] Using log file /var/log/pihole/FTL.log
[2022-11-16 14:32:32.931 31626M] ########## FTL started on pihole! ##########
[2022-11-16 14:32:32.931 31626M] FTL branch: fix/forked_shmSettings
[2022-11-16 14:32:32.932 31626M] FTL version: vDev-c689cf2
[2022-11-16 14:32:32.932 31626M] FTL commit: c689cf23
[2022-11-16 14:32:32.932 31626M] FTL date: 2022-11-16 04:26:40 +0100
[2022-11-16 14:32:32.932 31626M] FTL user: pihole
[2022-11-16 14:32:32.932 31626M] Compiled for armv7hf (compiled on CI) using arm-linux-gnueabihf-gcc (Debian 8.3.0-2) 8.3.0
[2022-11-16 14:32:32.932 31626M] Starting config file parsing (/etc/pihole/pihole-FTL.conf)
[2022-11-16 14:32:32.933 31626M]    SOCKET_LISTENING: only local
[2022-11-16 14:32:32.933 31626M]    AAAA_QUERY_ANALYSIS: Show AAAA queries
[2022-11-16 14:32:32.933 31626M]    MAXDBDAYS: max age for stored queries is 365 days
[2022-11-16 14:32:32.933 31626M]    RESOLVE_IPV6: Resolve IPv6 addresses
[2022-11-16 14:32:32.933 31626M]    RESOLVE_IPV4: Resolve IPv4 addresses
[2022-11-16 14:32:32.934 31626M]    DBINTERVAL: saving to DB file every minute
[2022-11-16 14:32:32.934 31626M]    DBFILE: Using /etc/pihole/pihole-FTL.db
[2022-11-16 14:32:32.934 31626M]    MAXLOGAGE: Importing up to 24.0 hours of log data
somera commented 1 year ago

@cathalferris with DEBUG_LOCKS=true I see a lot

[2022-11-16 18:10:40.638 281089/F160513] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010d6000 -> 0x7f84010dd000
[2022-11-16 18:10:40.638 281089/F160513] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010dd000 -> 0x7f84010d6000
[2022-11-16 18:10:40.644 281091/F160513] TCP worker terminating (client disconnected)
[2022-11-16 18:10:40.644 281089/F160513] TCP worker terminating (client disconnected)
[2022-11-16 18:10:40.644 281091/F160513] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010d6000 -> 0x7f84010dd000
[2022-11-16 18:10:40.645 281089/F160513] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010d6000 -> 0x7f84010dd000
[2022-11-16 18:10:40.891 160513/T160515] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010dd000 -> 0x7f84010d6000
[2022-11-16 18:10:40.894 160513/T160517] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010d6000 -> 0x7f84010dd000
[2022-11-16 18:10:40.895 160513/T160516] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010dd000 -> 0x7f84010d6000
[2022-11-16 18:10:40.897 160513/T160518] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010d6000 -> 0x7f84010dd000
[2022-11-16 18:10:40.900 160513/T160514] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010dd000 -> 0x7f84010d6000
[2022-11-16 18:10:40.902 160513/T160515] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010d6000 -> 0x7f84010dd000
[2022-11-16 18:10:40.904 160513/T160517] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010dd000 -> 0x7f84010d6000
[2022-11-16 18:10:41.544 160513/T160530] verify_shmem_pid(): updating pointer of shmSettings: 0x7f84010d6000 -> 0x7f84010dd000

lines.

DL6ER commented 1 year ago

@cathalferris

failed to create listening socket for port 53: Address already in use

This is somewhat out of our control, there was an initial check whether another FTL instance is running and since that did not show any error (and failed immediately thereafter), it is rather likely another process had port 53 at that time.

Concerning the hanging, it would be useful to see more lines above what you showed.


@cathalferris Yeah, that's all expected. I'm interested in the last values it reported before a crash happens.

somera commented 1 year ago

@DL6ER after the crashes yesterday I had the same

failed to create listening socket for port 53: Address already in use

error after pihole restartdns. The problem was, that after the crash the pihole-FTP process war running. After I kill the process pihole restartdns worked.

JeremyS132 commented 1 year ago

@DL6ER I didn't think I had logs but I do.

https://tricorder.pi-hole.net/YMA0D063/

This occurred at [2022-11-16 10:07:05.487 5576/T5580]

This was prior to doing the checkout. I have since done that since the crash.

cathalferris commented 1 year ago

Okay - one large (My apologies, but it may be useful ) FTL logfile here:

https://tricorder.pi-hole.net/7Dqy5hjO/

-Cathal

PromoFaux commented 1 year ago

Thank you all for your logs - hopefully we can use them to better understand what is causing this issue.

If anyone wishes to revert to the previous version of FTL, you can run the command pihole checkout ftl v5.18.2 (once a fix is released, you can easily switch back by running pihole checkout ftl master)

decommentator commented 1 year ago

[2022-11-16 18:59:08.657 56992/F2166] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-16 18:59:08.657 56992/F2166] ----------------------------> FTL crashed! <---------------------------- [2022-11-16 18:59:08.657 56992/F2166] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [2022-11-16 18:59:08.657 56992/F2166] Please report a bug at https://github.com/pi-hole/FTL/issues [2022-11-16 18:59:08.657 56992/F2166] and include in your report already the following details: [2022-11-16 18:59:08.657 56992/F2166] FTL has been running for 42269 seconds [2022-11-16 18:59:08.657 56992/F2166] FTL branch: master [2022-11-16 18:59:08.657 56992/F2166] FTL version: v5.19.1 [2022-11-16 18:59:08.657 56992/F2166] FTL commit: b48b3e1f [2022-11-16 18:59:08.657 56992/F2166] FTL date: 2022-11-14 22:01:50 +0000 [2022-11-16 18:59:08.657 56992/F2166] FTL user: started as pihole, ended as pihole [2022-11-16 18:59:08.657 56992/F2166] Compiled for x86_64 (compiled on CI) using gcc (Debian 8.3.0-6) 8.3.0 [2022-11-16 18:59:08.657 56992/F2166] Process details: MID: 2166 [2022-11-16 18:59:08.657 56992/F2166] PID: 56992 [2022-11-16 18:59:08.658 56992/F2166] TID: 56992 [2022-11-16 18:59:08.658 56992/F2166] Name: pihole-FTL [2022-11-16 18:59:08.658 56992/F2166] Received signal: Segmentation fault [2022-11-16 18:59:08.658 56992/F2166] at address: 0x7f3500f94008 [2022-11-16 18:59:08.658 56992/F2166] with code: SEGV_MAPERR (Address not mapped to object)

somera commented 1 year ago

What to do next

Please anyone who can reproduce this and has DEBUG_SHMEM=true enabled (see here: #1478 (comment)), run the command

pihole checkout ftl fix/forked_shmSettings

and furthermore add DEBUG_LOCKS=true in your /etc/pihole/pihole-FTL.conf (either do it before the checkout or restart your pihole-FTL afterwards).

While this will not fix the issue at hand right now, it'll hopefully give me enough context to fix it afterwards. Please be prepared that your log file will grow quickly so you should only be doing this if you have a few gigabytes of free space on your disk/SD card. I'll only need a few lines above the crash, no need to upload mega- or gigabytes of logfile in the end.

Alternative way to upload your logs

Something like

tail -n 200 /var/log/pihole/FTL.log | pihole tricorder

will securely upload the last 200 lines from your /var/log/pihole/FTL.log to a Pi-hole server where only the core developers can access it (but even this only after you providing the displayed token). Just a note, I'll be happy with any kind of shared log files that I can access.

Did you change something else in the fix/forked_shmSettings branch? I run it for ~6h. Without any crash. Than I switched to master and it crashed after ~30 minutes.

DL6ER commented 1 year ago

Absolutely no other changes, see here for yourself:

Screenshot_2022-11-16-20-18-36-49_3aea4af51f236e4932235fdada7d1643~2

https://github.com/pi-hole/FTL/compare/master..fix/forked_shmSettings

(note: this is only accurate as of now, when we add more changes to this branch in the future, this will become different)

somera commented 1 year ago

@DL6ER could the log process slowdown the flow and in this case the error will not occurs?

I will switch back to the fix/forked_shmSettings again.

DL6ER commented 1 year ago

Yes, that's the current working theory. Might be worth trying the new branch without the DEBUG options - and if only to rule out that the master binary has a defect somewhere and the re-compilation in the new branch was the actual solution. (it's unlikely, the build machinery is protecting against this using secure hash algorithms)

I'm currently working on an alternative solution, should be ready in a few.

DL6ER commented 1 year ago

I just pushed another change to the branch you are using here. *Please test it on your Pi-hole without any `DEBUG_` settings and see if this resolved the crash.**

Please run

pihole checkout ftl fix/forked_shmSettings

to update (you should also be able use pihole -up if you are already in this branch).


Technical details for those who are interested: My most recent commit brings two changes: (1) Don't check SHM ownership on every SHM interaction but only in the danger zone (when reopening during resizing) and (2) Don't remap the global setting object but only read the few bytes we need to determine whether the PID has changed (or not).

If you don't see any crashes within a few hours, this will go out as an FTL hotfix release. This change could make it even worse if I screwed something up but I don't think I did.

cathalferris commented 1 year ago

I have a crash under debug conditions. Logs are uploading at the moment. It's 374Mb.

This crash does not include the fix of 10 min ago.

somera commented 1 year ago

I just pushed another change to the branch you are using here. *Please test it on your Pi-hole without any `DEBUG_` settings and see if this resolved the crash.**

Please run

pihole checkout ftl fix/forked_shmSettings

to update (you should also be able use pihole -up if you are already in this branch).

Technical details for those who are interested: My most recent commit brings two changes: (1) Don't check SHM ownership on every SHM interaction but only in the danger zone (when reopening during resizing) and (2) Don't remap the global setting object but only read the few bytes we need to determine whether the PID has changed (or not).

If you don't see any crashes within a few hours, this will go out as an FTL hotfix release. This change could make it even worse if I screwed something up but I don't think I did.

Now running

FTL vDev (fix/forked_shmSettings, vDev-3feb6b8)

with

RATE_LIMIT=0/0
MAXDBDAYS=365
DEBUG_ALL=false
DEBUG_DATABASE=false
DEBUG_LOCKS=false
DEBUG_QUERIES=false
DEBUG_GC=false
PRIVACYLEVEL=0
DEBUG_SHMEM=false
DEBUG_LOCKS=false
somera commented 1 year ago

I just pushed another change to the branch you are using here. *Please test it on your Pi-hole without any `DEBUG_` settings and see if this resolved the crash.** Please run

pihole checkout ftl fix/forked_shmSettings

to update (you should also be able use pihole -up if you are already in this branch). Technical details for those who are interested: My most recent commit brings two changes: (1) Don't check SHM ownership on every SHM interaction but only in the danger zone (when reopening during resizing) and (2) Don't remap the global setting object but only read the few bytes we need to determine whether the PID has changed (or not). If you don't see any crashes within a few hours, this will go out as an FTL hotfix release. This change could make it even worse if I screwed something up but I don't think I did.

Now running

FTL vDev (fix/forked_shmSettings, vDev-3feb6b8)

with

RATE_LIMIT=0/0
MAXDBDAYS=365
DEBUG_ALL=false
DEBUG_DATABASE=false
DEBUG_LOCKS=false
DEBUG_QUERIES=false
DEBUG_GC=false
PRIVACYLEVEL=0
DEBUG_SHMEM=false
DEBUG_LOCKS=false

Something goes wrong. Now I have the last commit. ;)

FTL vDev (fix/forked_shmSettings, vDev-78597dd)

DL6ER commented 1 year ago

We have had two basically identical commits, the difference between 3feb6b8 and 78597dd is merely a minor change in the logged string not affecting anything really.

rdwebdesign commented 1 year ago

@cathalferris

Were you able to upload the log file?

cathalferris commented 1 year ago

I've been trying, I've not had the upload fail or fimish yet - so I am still waiting for the upload to finish. As soon as I have info I'll post here.

dschaper commented 1 year ago

Large files can get munged when you attempt to upload them, that's why DL suggested

tail -n 200 /var/log/pihole/FTL.log | pihole tricorder

That only sends the data that we are interested in.

I do have ~100MB of a log that contains lilypad as a string, is that your log?

rdwebdesign commented 1 year ago

@cathalferris

I'm not sure how many lines your log has, but it must have too many lines in so many megabytes.

We only need lines around the crash. It's impossible to know how many lines will be needed to show the error, but even 2000 lines will be under 1MB.

This would send us a lot of lines and will be uploaded faster than the whole file.

tail -n 2000 /var/log/pihole/FTL.log | pihole tricorder
cathalferris commented 1 year ago

I understand that, and I do apologise..

It's done anyway: https://tricorder.pi-hole.net/yrcbSKVb/

cathalferris commented 1 year ago

@dschaper Yes, you are correct, that is my log.

I now have the current fix-branch in operation, and no debug present in the conf file. I've to head to bed now, and I'll pick up looking at this from the user POV tomorrow.

pralor-bot commented 1 year ago

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/pi-hole-intermittently-diactivates-itself-after-upgrading-to-v5-14-1/59256/8

markdall commented 1 year ago

This may not be valuable at all, but: I'm running 2022.11 on a Pi3 and then in Docker on a Pi4. On the Pi3 I'm encountering this crash. On the Pi4 in a docker container, it's running fine - no crashes.

chrispontiga commented 1 year ago

Both my primary and secondary piholes crashed again at the exact same time, at 12-midnight on the dot. They crashed earlier this morning, both crashed at 7 am. Very odd that it's occurring right on the hours, and at the same moment for both devices.

clwgh commented 1 year ago

Mentioning this in case it's has any relevance; it may not. There was a dhcpcd bug a couple of years ago in which dhcpcd would segfault because some routers would send malformed ARP packets. This was discussed on the Flightaware forums, with a summary here and some tech details a couple posts after that. The whole thread going into the technical weeds. https://discussions.flightaware.com/t/piaware-wont-reconnect-to-wifi-network-if-it-drops-off/59789/60

Given that only some people are seeing this happen, and for some it's at set times, might something similar be happening here? In otherwords, possibly nothing to do with dhcpcd here, but the same kind of behaviour; a user's home network is triggering this FTL segfault bug. Some routers did it, others did not. I'm not seeing this Pi-hole bug yet and I'm using a wired connection.

May be not relevant so mentioning as a data point in case anything useful can be extracted.