Open mnlhfr opened 1 year ago
Thanks for your detailed analysis, you included all relevant aspects. However, I'm not sure if this is really a bug or not. The pivot is the `StartLimitBurst` in `/etc/systemd/system/pihole-FTL.service`.
This was added to prevent endless start-stop cycles in case of an error. Adding a custom DNS record needs a full FTL restart to take effect. However, if you know you're going to add a lot in a row, `$reload` can help to postpone the restart until everything is added. This is what we do when we import a (teleporter) backup (see https://github.com/pi-hole/AdminLTE/pull/2519). I'm not familiar with the tools you use to add the records, but as they access the API endpoints directly, they always trigger the restart immediately.
I have no good idea how to strike a balance between preventing too many restarts and being able to (manually) add a lot of custom DNS records (other than increasing the limit). I think none of us thought about adding >5 records manually within 60 seconds.
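Increasing the limit, as mentioned above, could be done with a systemd drop-in rather than by editing the unit file itself. The following is only a sketch: the drop-in directory, the file name `override.conf`, the function name `write_dropin`, and the value `20` are assumptions chosen for illustration, not something Pi-hole ships.

```shell
#!/usr/bin/env bash
# Sketch: raise the start rate limit of pihole-FTL via a systemd drop-in.
# DROPIN_DIR, the file name and the default value 20 are assumptions.
DROPIN_DIR="${DROPIN_DIR:-/etc/systemd/system/pihole-FTL.service.d}"
BURST="${BURST:-20}"

# write_dropin creates DROPIN_DIR/override.conf with the new limit.
write_dropin() {
    mkdir -p "$DROPIN_DIR" || return 1
    # StartLimitBurst= is a [Unit] setting in current systemd versions.
    printf '[Unit]\nStartLimitBurst=%s\n' "$BURST" > "$DROPIN_DIR/override.conf"
}

# Usage (as root): write_dropin && systemctl daemon-reload
```

This only moves the threshold. A client that adds records faster than the new limit allows would still run into it.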
@DL6ER I guess the same issue will happen in v6?
Thanks for your quick reply!
I initially thought that a `pihole restartdns reload-lists` would be enough, but I seem to have missed that during my testing earlier.
However, as far as I can tell, it is enough to issue a `systemctl reload`, which does not cause any issues with the service, even if triggered many times in quick succession.
# nslookup testing-01.local localhost
Server: localhost
Address: 127.0.0.1#53
** server can't find testing-01.local: NXDOMAIN
2. add the record using cmd utility with reload=false
[✓] Adding custom DNS entry...
3. record still not in effect
Server: localhost
Address: 127.0.0.1#53
** server can't find testing-01.local: NXDOMAIN
4. I initially thought that `restartdns reload-lists` would do the job, but:
[✓] Reloading DNS lists
Server: localhost
Address: 127.0.0.1#53
** server can't find testing-01.local: NXDOMAIN
5. However, if we reload the service using systemctl:
6. the record is active now
Server: localhost
Address: 127.0.0.1#53
Name: testing-01.local
Address: 127.0.0.1
So, just reloading the service with systemctl seems to be enough and doesn't seem to trigger any issues with the service:
1. StartLimitBurst is set to the default value of 5
StartLimitBurst=5
2. reload the service 100 times
4. service still in active state
● pihole-FTL.service - Pi-hole FTL
     Loaded: loaded (/etc/systemd/system/pihole-FTL.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-06-09 21:19:01 UTC; 1min 43s ago
    Process: 7317 ExecStartPre=/opt/pihole/pihole-FTL-prestart.sh (code=exited, status=0/SUCCESS)
    Process: 7450 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
   Main PID: 7330 (pihole-FTL)
      Tasks: 19 (limit: 9401)
     Memory: 10.4M
        CPU: 279ms
     CGroup: /system.slice/pihole-FTL.service
             └─7330 /usr/bin/pihole-FTL -f
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
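The procedure tested above (add records with the reload argument set to `false`, then reload the service once via systemctl) could be wrapped roughly as follows. This is a sketch under the assumptions of this thread, not an official Pi-hole tool: the helper names `run` and `add_records` and the `DRY_RUN` switch are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: add several custom DNS records without restarting FTL each time,
# then reload the service once. Set DRY_RUN=1 to only print the commands.
DRY_RUN="${DRY_RUN:-0}"

# run either prints or executes the given command line (hypothetical helper).
run() {
    if [[ "$DRY_RUN" == "1" ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_records reads "IP DOMAIN" pairs from stdin, one pair per line.
add_records() {
    local ip domain
    while read -r ip domain; do
        [[ -z "$ip" || -z "$domain" ]] && continue
        # The trailing "false" skips the restart for this record;
        # </dev/null keeps the command from consuming the input list.
        run pihole -a addcustomdns "$ip" "$domain" false </dev/null
    done
    # One reload at the end so the new entries take effect.
    run systemctl reload pihole-FTL
}

# Usage: printf '127.0.0.1 testing-01.local\n' | DRY_RUN=1 add_records
```

With `DRY_RUN=1` the function only prints one `pihole -a addcustomdns ... false` line per record followed by a single `systemctl reload pihole-FTL` line, which makes it easy to check the command sequence before running it for real.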
Your analysis is not correct here:
https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L280 effectively results in `pihole -a addcustomdns 127.0.0.1 test1.local` (notice neither true nor false are part of the argument for restart)
When needed, `$reload` will be set to `false` on the lines just above the code you posted:
Resulting in: `pihole -a addcustomdns 127.0.0.1 test1.local false`
You are correct, thanks for pointing that out, and sorry for the confusion!
> However, as far as I can tell it is enough to issue a systemctl reload, which does not cause any issues with the service, even if triggered many times in quick succession.
I think you are correct. The `reload` triggers the read of `custom.list` by `dnsmasq`:
Jun 9 23:35:39 dnsmasq[3578109]: read /etc/hosts - 7 names
Jun 9 23:35:39 dnsmasq[3578109]: read /etc/pihole/custom.list - 22 names
Jun 9 23:35:39 dnsmasq[3578109]: read /etc/pihole/local.list - 0 names
Note: this will only work for the custom DNS records, not custom CNAME records (as files in `/etc/dnsmasq.d/` are not read by reload).
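One way to confirm that a reload actually picked up new entries is to look at the most recent `read ... custom.list - N names` line in the dnsmasq log, as in the excerpt above. The helper below is a small sketch: the function name `names_read` is hypothetical, and the log file location is not stated in this thread, so it has to be passed in.

```shell
#!/usr/bin/env bash
# names_read LOGFILE LISTFILE
# Prints N from the most recent "read LISTFILE - N names" log line.
names_read() {
    local log="$1" list="$2"
    grep -F "read ${list} - " "$log" \
        | tail -n 1 \
        | sed -E 's/.* - ([0-9]+) names.*/\1/'
}

# Usage (log path is an assumption):
#   names_read /var/log/pihole/pihole.log /etc/pihole/custom.list
```

Comparing the number before and after adding a record and reloading shows whether the new entry was read.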
Please check whether `pihole checkout core no_reload` fixes the issue for you.
Not quite, but when I additionally add the reload to `func.php` as well, it seems to be working:
https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L284
thanks!
Good catch.
We had not planned to release any new v5 version and are focusing on v6. However, the changes necessary here are trivial now that the bug has been dissected. We'll discuss internally how to proceed.
> I guess the same issue will happen in v6?
No. v6 tries to be "more clever" than many of the elements we have in v5. This involves restarting FTL as seldom as possible. Custom DNS records are a prime example. When I coded this part of the v6 interface, I very much disliked that the DNS cache is completely flushed by a restart/reload, so I changed two things: Firstly, this file is now in a watched directory (`hostsdir`), so FTL finds out by itself when the file is changed (no need to send a signal at all). Secondly, I submitted a patch upstream to `dnsmasq` (it is already included in the current release of `dnsmasq`) that ensures the DNS cache is not completely flushed; only those elements from the updated list are removed (and then repopulated).
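As a rough illustration of the watched-directory idea: dnsmasq's `hostsdir` option reads hosts files from a directory and picks up new or changed files on its own. Assuming such a directory is configured, adding a record reduces to a file write with no signal or restart afterwards. The function name `add_host`, the directory, and the file name below are hypothetical and not necessarily the layout v6 uses.

```shell
#!/usr/bin/env bash
# add_host DIR IP NAME
# Appends a hosts-style record to DIR/custom.list. With a dnsmasq
# "hostsdir" pointing at DIR, no reload signal is needed afterwards.
add_host() {
    local dir="$1" ip="$2" name="$3"
    mkdir -p "$dir" || return 1
    printf '%s %s\n' "$ip" "$name" >> "$dir/custom.list"
}

# Usage (directory is an assumption):
#   add_host /etc/pihole/hosts 127.0.0.1 testing-01.local
```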
After internal discussion we decided to not release a new v5 version. Reasons are
I have posted a workaround for v5 in this thread https://discourse.pi-hole.net/t/ftl-crashes-repeatedly-when-updating-dns-records-through-external-dns/66867
Versions
Platform
Expected behavior
Adding or deleting custom DNS records through the webadmin GUI, or directly through POST requests to the API endpoint at `/admin/scripts/pi-hole/php/customdns.php`, causes pihole-FTL to just be reloaded.
Actual behavior / bug
Repeatedly adding or deleting DNS records through the web interface (`/admin/scripts/pi-hole/php/customdns.php`) causes pihole-FTL to be restarted (not reloaded) with each and every DNS record added. This leads to systemd hitting the `StartLimitBurst=5` configured in `/etc/systemd/system/pihole-FTL.service`, causing consecutive restarts of the service to fail.
systemctl status pihole-FTL:
Steps to reproduce
Steps to reproduce the behavior:
Shell
killed and fail to start with `Failed with result 'start-limit-hit'`
Webadmin
killed and fail to start with `Failed with result 'start-limit-hit'`
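The `start-limit-hit` result can be illustrated with a simplified model of a start rate limit. The sketch below is not systemd's implementation; it assumes a fixed window that opens at the first start, `BURST=5` as in the service file, and a 60-second interval as mentioned in the discussion. The function name `check_starts` is made up for this example.

```shell
#!/usr/bin/env bash
# Simplified model of a start rate limit (NOT systemd's actual code):
# at most BURST starts are accepted per INTERVAL-second window.
BURST=5
INTERVAL=60   # seconds; assumed interval

# check_starts takes start timestamps in seconds (ascending order) and
# prints, for each one, whether the start is accepted or refused.
check_starts() {
    local begin=-1 num=0 t
    for t in "$@"; do
        if (( begin < 0 || t - begin >= INTERVAL )); then
            # First start, or the previous window has expired
            begin=$t
            num=1
            echo "t=${t}s ok"
        elif (( num < BURST )); then
            num=$((num + 1))
            echo "t=${t}s ok"
        else
            echo "t=${t}s start-limit-hit"
        fi
    done
}
```

For example, `check_starts 0 5 10 15 20 25 70` reports `ok` for the first five starts, `start-limit-hit` for the sixth at t=25s, and `ok` again at t=70s once the window has expired. A restart per added record therefore exhausts the budget after five records in quick succession.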
Debug Token
I don't think this should be necessary here.
Additional context
I am using the pihole provider from kubernetes-sigs/external-dns in combination with borchero/switchboard to automatically add DNS records for services in my cluster when they get exposed. This worked fine while setting everything up.
However, after exposing more services for external-dns to manage DNS for, pihole-FTL started to behave oddly and refused to start. kubernetes-sigs/external-dns adds the DNS records by simply sending POST requests to the same endpoint the webadmin GUI uses (`/admin/scripts/pi-hole/php/customdns.php`). Because this API endpoint offers no `update` functionality and no `TXT` records are possible, this results in somewhat "spammy" behaviour from external-dns. As far as I understand the implementation of the pihole provider in external-dns, there are a couple of workarounds in place due to the API limitations of pihole-FTL. There might also be another bug in the pihole provider for external-dns, but I have not yet spent the time to dig into that side any deeper, and I believe that even if this were the case, it would not change the validity of this bug report.
For a quick and dirty workaround I adjusted `StartLimitBurst` in `/etc/systemd/system/pihole-FTL.service`.
When digging through the source code of pihole I noticed a couple of things, and I am not exactly sure where this should be fixed.
https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L223 empty string as default value for the `reload` argument.
https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L280 effectively results in `pihole -a addcustomdns 127.0.0.1 test1.local` (notice neither `true` nor `false` are part of the argument for restart)
https://github.com/pi-hole/pi-hole/blob/6a45c6a8e027e1ac30d4556a88f31684bc80ccf1/pihole#L579 `pihole` shell script defers to `AddCustomDNSAddress` in `webpage.sh`
https://github.com/pi-hole/pi-hole/blob/6a45c6a8e027e1ac30d4556a88f31684bc80ccf1/advanced/Scripts/webpage.sh#L719-L743 RestartDNS command will be issued due to the missing (empty) reload argument
https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L284 RestartDNS triggered again. Should this be `restartdns reload-lists` instead of just `restartdns` here?
Conclusion
The issue here seems to affect both the AdminLTE and the pihole repositories, so I am not entirely sure what the best or preferred way of fixing this would look like.