ragibkl closed this 1 year ago
I did the following:
I thought that maybe the script had issues with some fetches, which would panic and cause the container to error out. Not sure if this was the case, but I patched that anyway. Let's monitor for a few days.
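For reference, here is a minimal sketch of the kind of fetch retry described above: retry a few times and return an error instead of panicking. This is not the actual patch. The helper name `fetch_with_retry`, its parameters, and the "Fetch failed" message are hypothetical; only the "Fetch ok: ..., attempt: N" line mirrors the log format seen in this thread. The real HTTP call is passed in as a closure, so the sketch does not assume any particular HTTP client.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `fetch` up to `max_attempts` times (at least once).
/// Returns the first success, or the last error instead of panicking.
/// Hypothetical helper, not the code used in adblock-list-compiler.
fn fetch_with_retry<T, E, F>(
    url: &str,
    max_attempts: u32,
    delay: Duration,
    mut fetch: F,
) -> Result<T, E>
where
    F: FnMut(&str) -> Result<T, E>,
    E: std::fmt::Display,
{
    // Treat 0 as 1 so the caller always gets one real attempt.
    let max_attempts = max_attempts.max(1);
    let mut attempt = 1;
    loop {
        match fetch(url) {
            Ok(body) => {
                println!("Fetch ok: {}, attempt: {}", url, attempt);
                return Ok(body);
            }
            // Attempts remain: log the failure (made-up format), wait, retry.
            Err(err) if attempt < max_attempts => {
                eprintln!("Fetch failed: {}, attempt: {}, error: {}", url, attempt, err);
                attempt += 1;
                sleep(delay);
            }
            // Out of attempts: hand the error back to the caller.
            Err(err) => return Err(err),
        }
    }
}
```

The caller can then decide what a failed source means, for example skipping that list and keeping the previous blacklist, rather than letting the whole container exit.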
Then, I saw some logs:
dns_1 | Fetch ok: https://raw.githubusercontent.com/ragibkl/adblock-dns-server/master/data/overrides.d/ignore-whitelist.zone, attempt: 1
dns_1 | compiling adblock list... done!
dns_1 | writing output file:
dns_1 | output file: /etc/bind/blacklist.zone
dns_1 | output format: zone
dns_1 | writing output file: done!
dns_1 | updating blacklist complete
dns_1 | server reload successful
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
What happens is, the blacklist update runs in the background, but during the server reload the server can go down for a few seconds. Not sure if there's a way to do this with zero downtime.
Hmm, maybe I'm wrong. Looks like this can fail randomly at times:
dnsdist_1 | [logs] emptying log file
dnsdist_1 | [logs] emptying log file complete
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1 | [logs] emptying log file
dnsdist_1 | [logs] emptying log file complete
dnsdist_1 | [logs] emptying log file
dnsdist_1 | [logs] emptying log file complete
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1 | [logs] emptying log file
dnsdist_1 | [logs] emptying log file complete
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1 | [logs] emptying log file
dnsdist_1 | [logs] emptying log file complete
dnsdist_1 | [logs] emptying log file
dnsdist_1 | [logs] emptying log file complete
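To put numbers on how often this happens while monitoring, a small log counter could help. The following is a rough sketch that assumes the dnsdist lines keep exactly the "Marking downstream NAME (ADDR) as 'down'/'up'" shape shown above; the `Flaps` struct and `count_flaps` function are hypothetical names, not part of this project.

```rust
use std::collections::BTreeMap;

/// Number of 'down' and 'up' markings seen for one downstream.
#[derive(Debug, Default, PartialEq)]
struct Flaps {
    down: u32,
    up: u32,
}

/// Counts state changes per downstream name in docker-compose log text.
/// Lines that do not contain "Marking downstream " are ignored.
fn count_flaps(log: &str) -> BTreeMap<String, Flaps> {
    let mut counts: BTreeMap<String, Flaps> = BTreeMap::new();
    for line in log.lines() {
        if let Some((_, rest)) = line.split_once("Marking downstream ") {
            let rest = rest.trim_end();
            // The first token after the marker is the downstream name.
            let name = match rest.split_whitespace().next() {
                Some(name) => name,
                None => continue,
            };
            if rest.ends_with("as 'down'") {
                counts.entry(name.to_string()).or_default().down += 1;
            } else if rest.ends_with("as 'up'") {
                counts.entry(name.to_string()).or_default().up += 1;
            }
        }
    }
    counts
}
```

Feeding it a day of saved `dnsdist` output gives a per-resolver count of down and up markings, which makes it easier to compare before and after a fix.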
I wonder if the dns container is having trouble doing full recursive domain resolution. I don't really want to fall back to forwarders mode, as that would mean introducing DNS leaks again.
At the moment, I'm convinced that the ablc fetch retry will fix this. The current theory is as follows:
Relevant lines: https://github.com/ragibkl/adblock-list-compiler/blob/82768f220c7ce143ca5a75b45fc2be113b869f71/src/cli_run/compile.rs#L35-L38
We'll have to test for a few more days to see.
I made a couple of fixes.
There seems to be something about the host network that causes this healthcheck to fail sometimes. I've disabled the healthcheck, and that seems to help. I'll keep monitoring for a few more days, but I do feel the stats are much better now.
This looks more stable now. I'm closing this.
Seems like I get disconnected once a day, for 2 minutes. I don't know the cause yet, but I should look into it.