Open BhSimon opened 2 years ago
I am seeing this too but only on one system.
I have 3 Pi-holes: 1 master and 1 slave on the same LAN as the master running on identical hardware, and a third built on Google Cloud (firewalled and VPN'd, not open).
Of those, only the one on Google Cloud shows this behaviour. They are running identical versions of Pi-hole / FTL etc., so I can't tell what's going on.
This isn't tied to a specific table. It errors on the first table that had synced changes (which the script drops from the database and recreates).
I'll keep digging.
So I thought I'd found the cause of this. I thought it only occurred when the Pi-hole OS was using Pi-hole/loopback/127.0.0.1 for its DNS, but apparently that was a red herring, so I am no nearer to a cause.
That was based on the main difference in the logs between my working and non-working systems: the non-working one shows this:
[✗] DNS service is not running
[✓] Restarting DNS server
[✓] DNS resolution is now available
I do, however, have a fix. As you mentioned, running pihole -g works, which is what the script already does after syncing the tables. So the 'solution' is to run pihole -g twice.
This is obviously not ideal, as it downloads the adlists twice and I have no idea what the underlying problem is. The foreign key error basically means that the contents of one table are keyed to a field or fields in another table, and it doesn't like the content/format/number of fields it finds there. I cannot get my working slave to fail, and I cannot get the failing slave to work except by running -g again.
So the simple, if not ideal, fix is to edit the script and duplicate the pihole -g line in the --pull section, as sketched below. I'm not sure if this is needed in pull_initialise too, as I haven't tested that.
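For reference, a minimal sketch of that edit (the $SUDO variable and the pihole -g call are the ones the script already uses; the exact surrounding lines will depend on your pihole-cloudsync version):

```sh
# At the end of the script's --pull handling:
$SUDO pihole -g   # existing call the script makes after importing the synced tables
$SUDO pihole -g   # duplicated call; the second run clears the foreign key error
```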
So I've now seen this when Pi-hole does its weekly update check. I'm not sure what the problem is, but it's definitely related to how pihole-cloudsync works, as it drops and recreates tables within gravity.db that future pihole -g runs don't like.
The only fix I have found so far is to run pihole -g again, so I wrote a small script that can be run regularly via cron to catch and fix the problem. It uses built-in Pi-hole functions, so it shouldn't add too much load.
I can't post the code here as the display messes up the syntax, so I have to attach it as a file. It's a very short, simple script, so you can see what it does. Remove the .txt to use it.
[pihole-zero-blocking-fix.sh.txt](https://github.com/stevejenkins/pihole-cloudsync/files/8512075/pihole-zero-blocking-fix.sh.txt)
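The attachment can't be shown inline, but the idea is roughly this (a hypothetical sketch, not the attached script itself; it assumes a default install with gravity.db at /etc/pihole/gravity.db and the sqlite3 CLI available):

```sh
#!/bin/bash
# Sketch: if the gravity table has no domains (the "adlists = 0" symptom),
# rebuild it by running pihole -g again.
GRAVITY_DB=/etc/pihole/gravity.db

# Count the domains currently held in the gravity table.
count=$(sqlite3 "$GRAVITY_DB" "SELECT COUNT(*) FROM gravity;")

# Only rebuild when blocking has actually dropped to zero.
if [ "${count:-0}" -eq 0 ]; then
    pihole -g > /dev/null 2>&1
fi
```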
I just came across this issue and am trying out a different solution.
I am using cron to keep things automated and simply added && sudo pihole -g at the end of the cron jobs posted in the instructions. I'm not 100% sure this is going to work, but I'm giving it a try.
Here are the amended cron jobs that I am using now:
On the Primary:
00 01,07,13,19 * * * sudo /usr/local/bin/pihole-cloudsync/pihole-cloudsync --push && sudo pihole -g > /dev/null 2>&1 #Push Master Pi-hole Lists to remote Git repo
On the Secondary piholes:
05 01,07,13,19 * * * sudo /usr/local/bin/pihole-cloudsync/pihole-cloudsync --pull && sudo pihole -g > /dev/null 2>&1 #Pull Master Pi-hole Lists from remote Git repo
Looking for any feedback or advice that others may have!
If the reliable solution is to run pihole -g twice (which it appears to be), then there's nothing wrong with this, but it won't catch everything and I personally don't prefer it.
The reason I say that is that, with the above, pihole -g is only run when a --pull happens, not when Pi-hole itself does something such as an adlist refresh. I have definitely seen my adlist count drop to 0 unrelated to a --pull: it happened at 4am, and my --pull checks run at 1am, so I'm assuming that was a refresh by Pi-hole itself.
The other thing I don't like is that pihole -g is always run twice on a --pull, even if the first run worked fine. So you will always be retrieving all your adlists twice. That's extra load on the sites serving the adlists, which seems a shame.
I run my --pull checks daily but don't change the master lists much, so the end result is infrequent calls to pihole -g.
The thing I don't like about my solution is that, although I regularly run the check via cron, there is always a potential window where adlists = 0 before I catch and fix it. The only way to improve that is to run my check more often, but you can only go so far with that: the check / pihole -g must have time to complete before the next check, otherwise it will just loop.
Various solutions possible, none of them great.
The best solution by far would be to understand what is going wrong with the initial pihole -g / SQL and avoid the problem, but I think that's buried somewhere in the SQL, and there may not be a solution, as the tables are inter-linked.
edit: I just re-read your cron details and realised (if I'm reading it right) that you are running push four times a day at zero minutes past the hour and pull four times a day at five minutes past the hour, but you are running pihole -g every time, including the pushes. This should definitely not be required, as the underlying issue appears to be the sync script manipulating the SQL tables, which it only does on the 'slaves'. The master Pi-hole should never have this issue.
Also confirming this issue... I added a second line in the script under pull():
$SUDO pihole -g
I haven't had this error yet, but I wanted to post an alternative to running pihole -g inside cron.
You can use a curl command to trigger pihole -g like this:
curl http://pihole/admin/scripts/pi-hole/php/gravity.sh.php -X POST -F "pw=YourLoginPassword"
There might be cases where this is useful, such as when Pi-hole runs inside a Docker container.
cheers,
In my testing I believe I have found the problem. When the DROP TABLE adlist; command is executed, it also removes all foreign key constraints and references to other tables that were set on this table when Pi-hole created it, and it removes the triggers that had been attached to it. When the import command is run, a new table is created based on the .csv file. At this point sqlite3 has no information about the foreign key constraints and references that were present in the previous table, and it does not run the triggers that were attached to the previous table, because they were not re-created by pihole-cloudsync.
When Pi-hole runs the update gravity command, it re-generates the gravity.db tables based on the schema defined in gravity.db.sql. Specifically, the adlist_by_group table has a REFERENCES keyword that points to the adlist and group tables. When Pi-hole attempts to copy tables while swapping gravity databases, it encounters an error trying to delete based on references that pihole-cloudsync removed with the DROP. My best guess is that, because the references and/or triggers got broken, the mapping between the ids is also broken.
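As a generic SQLite illustration of that behaviour (not Pi-hole's actual schema): dropping a table silently discards any triggers defined on it, and a table re-created afterwards starts with none.

```sh
sqlite3 /tmp/demo.db <<'SQL'
CREATE TABLE adlist (id INTEGER PRIMARY KEY, address TEXT);
CREATE TRIGGER tr_adlist_add AFTER INSERT ON adlist
BEGIN
    SELECT 1;
END;

-- The trigger exists while the original table does:
SELECT name FROM sqlite_master WHERE type = 'trigger';

-- Dropping the table also drops its trigger...
DROP TABLE adlist;
CREATE TABLE adlist (id INTEGER PRIMARY KEY, address TEXT);

-- ...and the re-created table has none:
SELECT name FROM sqlite_master WHERE type = 'trigger';
SQL
```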
I have been able to fix this problem by changing the DROP command to a DELETE command. You can find these changes in this commit. (You should probably also do this for domainlist.)
I believe the reason this fix works is that, instead of destroying and re-creating the table and thereby breaking all the references and triggers set up by Pi-hole, it simply removes the rows. So, when pihole-cloudsync imports data from the .csv files, the trigger that updates adlist_by_group still runs.
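In practice the import step becomes something like the following (a sketch only; the actual change is in the linked commit, and the .csv filename and path here are assumptions):

```sh
sqlite3 /etc/pihole/gravity.db <<'SQL'
-- Empty the table in place rather than dropping it, so the foreign key
-- references and triggers defined by Pi-hole's schema stay attached.
DELETE FROM adlist;

-- Re-import the synced rows; the trigger that maintains adlist_by_group
-- now fires for each inserted row as it normally would.
.mode csv
.import adlist.csv adlist
SQL
```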
I have confirmed this on two systems. After the cron job runs, the dashboard in Pi-hole says there are 0 domains on adlists. I ran the cron command manually and noticed this:
More info: if I run pihole -g from the command line, logged in as root, it works fine.