ragibkl / adblock-dns-server

Adblock DNS Server powered by Bancuh DNS and dnsdist-acme
https://bancuh.com/
MIT License
63 stars, 14 forks

instability issues #172

Closed: ragibkl closed this issue 1 year ago

ragibkl commented 1 year ago

It seems I get disconnected once a day, for about 2 minutes. I don't know the cause yet, but I should look into it.

ragibkl commented 1 year ago

I did the following:

I thought that maybe the script had an issue with some fetches, which panicked and caused the container to error out. I'm not sure that was the case, but I patched it anyway. Let's monitor for a few days.
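A minimal sketch of the "don't panic on a bad fetch" idea, assuming the compiler loops over source URLs and previously called something like `.unwrap()` on each fetch result. The `compile_sources` function and the `fetch` closure are hypothetical stand-ins, not ablc's actual code: a failed source is logged and skipped so one bad download cannot take the whole container down.

```rust
/// Hypothetical compile step: fetch every source, keep going on failure.
/// `fetch` stands in for the real HTTP request and returns Err on failure.
/// Returns (blocklist entries, URLs that failed).
fn compile_sources<F>(urls: &[&str], mut fetch: F) -> (Vec<String>, Vec<String>)
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut entries = Vec::new();
    let mut failed = Vec::new();
    for &url in urls {
        match fetch(url) {
            Ok(body) => {
                // Keep non-empty, non-comment lines as blocklist entries.
                for line in body.lines() {
                    let line = line.trim();
                    if !line.is_empty() && !line.starts_with('#') {
                        entries.push(line.to_string());
                    }
                }
            }
            Err(err) => {
                // Log and continue instead of panicking.
                eprintln!("Fetch failed: {}, error: {}", url, err);
                failed.push(url.to_string());
            }
        }
    }
    (entries, failed)
}

fn main() {
    let urls = ["https://example.com/good.txt", "https://example.com/bad.txt"];
    let (entries, failed) = compile_sources(&urls, |url| {
        if url.ends_with("good.txt") {
            Ok("# comment\nads.example.com\n\ntracker.example.com\n".to_string())
        } else {
            Err("connection reset".to_string())
        }
    });
    // prints "2 entries, 1 failed"
    println!("{} entries, {} failed", entries.len(), failed.len());
}
```

Whether a partially compiled list should be published, or the previous list kept until every source succeeds, is a policy choice this sketch leaves open.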

Then, I saw some logs:

dns_1           | Fetch ok: https://raw.githubusercontent.com/ragibkl/adblock-dns-server/master/data/overrides.d/ignore-whitelist.zone, attempt: 1
dns_1           | compiling adblock list... done!
dns_1           | writing output file:
dns_1           |     output file: /etc/bind/blacklist.zone
dns_1           |     output format: zone
dns_1           | writing output file: done!
dns_1           | updating blacklist complete
dns_1           | server reload successful
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'up'

What happens is that the blacklist update runs in the background, but during the server reload the server can go down for a few seconds. I'm not sure if there's a way to do this with zero downtime.
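One way to reason about the 'down'/'up' pairs: dnsdist health-checks each downstream and marks it down after failed checks (its `newServer` settings include `maxCheckFailures` and `rise`). The model below is a hypothetical sketch of that kind of hysteresis, not dnsdist's actual code, and assumes those thresholds behave roughly this way. It shows how a single failed check during a short reload flips the state when the failure threshold is 1 but not when it is 3.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Up,
    Down,
}

/// Hypothetical health tracker with separate down/up thresholds.
struct HealthTracker {
    state: State,
    max_check_failures: u32, // consecutive failures before marking down
    rise: u32,               // consecutive successes before marking up
    failures: u32,
    successes: u32,
}

impl HealthTracker {
    fn new(max_check_failures: u32, rise: u32) -> Self {
        HealthTracker { state: State::Up, max_check_failures, rise, failures: 0, successes: 0 }
    }

    /// Feed one health-check result; returns Some(new_state) when the state flips.
    fn record(&mut self, ok: bool) -> Option<State> {
        if ok {
            self.failures = 0;
            self.successes += 1;
            if self.state == State::Down && self.successes >= self.rise {
                self.state = State::Up;
                return Some(State::Up);
            }
        } else {
            self.successes = 0;
            self.failures += 1;
            if self.state == State::Up && self.failures >= self.max_check_failures {
                self.state = State::Down;
                return Some(State::Down);
            }
        }
        None
    }
}

/// Count how many times the state flips over a sequence of check results.
fn count_flips(checks: &[bool], max_check_failures: u32, rise: u32) -> usize {
    let mut tracker = HealthTracker::new(max_check_failures, rise);
    checks.iter().filter_map(|&ok| tracker.record(ok)).count()
}

fn main() {
    // One failed check during a short reload, surrounded by successes.
    let checks = [true, true, false, true, true];
    println!("threshold 1: {} flips", count_flips(&checks, 1, 1)); // 2 flips
    println!("threshold 3: {} flips", count_flips(&checks, 3, 1)); // 0 flips
}
```

A higher failure threshold tolerates brief reloads but also takes longer to notice a genuinely dead resolver, so it trades one kind of outage for another.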

ragibkl commented 1 year ago

Hmm, maybe I'm wrong. It looks like this can also fail randomly at times:

dnsdist_1       | [logs] emptying log file
dnsdist_1       | [logs] emptying log file complete
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1       | [logs] emptying log file
dnsdist_1       | [logs] emptying log file complete
dnsdist_1       | [logs] emptying log file
dnsdist_1       | [logs] emptying log file complete
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1       | [logs] emptying log file
dnsdist_1       | [logs] emptying log file complete
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1       | [logs] emptying log file
dnsdist_1       | [logs] emptying log file complete
dnsdist_1       | [logs] emptying log file
dnsdist_1       | [logs] emptying log file complete
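To put a number on the flapping while monitoring, the log output can be tallied per downstream. This is a minimal sketch that assumes the exact line shape in the excerpt above (a docker-compose `service |` prefix followed by `Marking downstream <name> (...) as 'down'`); the function name is hypothetical.

```rust
use std::collections::HashMap;

/// Count how many times each downstream was marked 'down'.
/// Assumes lines like:
///   "dnsdist_1 | Marking downstream resolver1 (127.0.0.1:1153) as 'down'"
fn count_down_events(log: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for line in log.lines() {
        // Strip the docker-compose "service |" prefix if present.
        let msg = match line.split_once('|') {
            Some((_, rest)) => rest.trim(),
            None => line.trim(),
        };
        if let Some(rest) = msg.strip_prefix("Marking downstream ") {
            if rest.ends_with("as 'down'") {
                // The downstream name is the first whitespace-separated token.
                if let Some(name) = rest.split_whitespace().next() {
                    *counts.entry(name.to_string()).or_insert(0) += 1;
                }
            }
        }
    }
    counts
}

fn main() {
    let log = "\
dnsdist_1       | [logs] emptying log file
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'up'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'down'
dnsdist_1       | Marking downstream resolver1 (127.0.0.1:1153) as 'up'";
    // prints {"resolver1": 2}
    println!("{:?}", count_down_events(log));
}
```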

I wonder if the dns container is having trouble doing full recursive domain resolution. I don't really want to fall back to forwarders mode, as that would mean reintroducing DNS leaks.

ragibkl commented 1 year ago

At the moment, I'm convinced that the fetch retry in ablc will fix this. The current theory is as follows:

Relevant lines: https://github.com/ragibkl/adblock-list-compiler/blob/82768f220c7ce143ca5a75b45fc2be113b869f71/src/cli_run/compile.rs#L35-L38
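For readers who don't follow the link, this is a minimal sketch of a fetch-with-retry loop with exponential backoff. It is not the code at the linked lines; `fetch_with_retry` and the `fetch` closure are hypothetical stand-ins for the real HTTP request, and the success message only mirrors the `Fetch ok: ..., attempt: N` lines seen in the logs above.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry a fallible fetch up to `max_attempts` times with exponential backoff.
/// `fetch` is a hypothetical stand-in for the real HTTP request.
fn fetch_with_retry<F>(
    url: &str,
    max_attempts: u32,
    base_delay: Duration,
    mut fetch: F,
) -> Result<String, String>
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=max_attempts {
        match fetch(url) {
            Ok(body) => {
                println!("Fetch ok: {}, attempt: {}", url, attempt);
                return Ok(body);
            }
            Err(err) => {
                eprintln!("Fetch failed: {}, attempt: {}, error: {}", url, attempt, err);
                last_err = err;
                if attempt < max_attempts {
                    // Wait base, 2*base, 4*base, ... between attempts.
                    sleep(base_delay * 2u32.pow(attempt - 1));
                }
            }
        }
    }
    Err(last_err)
}

fn main() {
    let mut calls = 0;
    // Simulated source that fails twice, then succeeds on the third attempt.
    let result = fetch_with_retry("https://example.com/list.txt", 5, Duration::from_millis(10), |_| {
        calls += 1;
        if calls < 3 {
            Err("timeout".to_string())
        } else {
            Ok("ads.example.com".to_string())
        }
    });
    println!("{:?}", result);
}
```

Returning the last error after the final attempt, rather than panicking, leaves the caller free to skip that source or keep the previous list.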

We'll have to test for a few more days to see.

ragibkl commented 1 year ago

I made a couple of fixes.

I'll keep monitoring for a few more days, but I do feel the stats are much better now.

ragibkl commented 1 year ago

This looks more stable now. I'm closing this.