Closed Tomatoide closed 11 months ago
It has been hit or miss, especially on the fr servers. Also, the logs, especially when opened via the URL, rarely work.
Some tests:
$ curl -H 'accept: application/dns-message' 'https://sg-dns1.bancuh.com/dns-query?dns=rmUBAAABAAAAAAAAB2NhcmVlcnMHb3BlbmRucwNjb20AAAEAAQ' | hexdump -C
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 53 100 53 0 0 306 0 --:--:-- --:--:-- --:--:-- 304
00000000 ae 65 81 80 00 01 00 01 00 00 00 00 07 63 61 72 |.e...........car|
00000010 65 65 72 73 07 6f 70 65 6e 64 6e 73 03 63 6f 6d |eers.opendns.com|
00000020 00 00 01 00 01 c0 0c 00 01 00 01 00 00 02 58 00 |..............X.|
00000030 04 d0 43 da 02 |..C..|
00000035
$ curl -H 'accept: application/dns-message' 'https://fr-dns1.bancuh.com/dns-query?dns=rmUBAAABAAAAAAAAB2NhcmVlcnMHb3BlbmRucwNjb20AAAEAAQ' | hexdump -C
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 21 100 21 0 0 42 0 --:--:-- --:--:-- --:--:-- 42
00000000 64 6e 73 20 71 75 65 72 79 20 6e 6f 74 20 61 6c |dns query not al|
00000010 6c 6f 77 65 64 |lowed|
00000015
$ curl -H 'accept: application/dns-message' 'https://fr-dns2.bancuh.com/dns-query?dns=rmUBAAABAAAAAAAAB2NhcmVlcnMHb3BlbmRucwNjb20AAAEAAQ' | hexdump -C
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 53 100 53 0 0 94 0 --:--:-- --:--:-- --:--:-- 94
00000000 ae 65 81 80 00 01 00 01 00 00 00 00 07 63 61 72 |.e...........car|
00000010 65 65 72 73 07 6f 70 65 6e 64 6e 73 03 63 6f 6d |eers.opendns.com|
00000020 00 00 01 00 01 c0 0c 00 01 00 01 00 00 02 58 00 |..............X.|
00000030 04 d0 43 da 02 |..C..|
00000035
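For anyone who wants to read these dumps without a DNS tool, here is a minimal sketch that decodes the `dns=` parameter and the 53-byte reply shown above. The helper names (`decode_doh_query`, `read_name`, `parse_response`, `is_dns_reply`) are made up for illustration and are not part of bancuh. It only handles A records and assumes a well-formed message.

```python
import base64
import ipaddress
import struct

# The 53-byte reply copied from the hexdump above (sg-dns1 / fr-dns2).
RESPONSE = bytes.fromhex(
    "ae658180000100010000000007636172"
    "65657273076f70656e646e7303636f6d"
    "0000010001c00c000100010000025800"
    "04d043da02"
)

def decode_doh_query(param: str) -> bytes:
    """Decode the base64url `dns=` value, restoring the stripped padding."""
    return base64.urlsafe_b64decode(param + "=" * (-len(param) % 4))

def read_name(msg: bytes, offset: int):
    """Read a (possibly compressed) domain name; return (name, next_offset)."""
    labels, resume_at, hops = [], None, 0
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # compression pointer
            hops += 1
            if hops > 16:
                raise ValueError("too many compression pointers")
            if resume_at is None:
                resume_at = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            continue
        offset += 1
        if length == 0:
            break
        labels.append(msg[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), (resume_at if resume_at is not None else offset)

def parse_response(msg: bytes) -> dict:
    """Return the message ID, the RCODE, and any A records in the answer section."""
    msg_id, flags, qdcount, ancount, _, _ = struct.unpack("!6H", msg[:12])
    offset = 12
    for _ in range(qdcount):
        _, offset = read_name(msg, offset)
        offset += 4  # skip QTYPE and QCLASS
    answers = []
    for _ in range(ancount):
        name, offset = read_name(msg, offset)
        rtype, _rclass, ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        rdata = msg[offset:offset + rdlength]
        offset += rdlength
        if rtype == 1 and rdlength == 4:  # A record
            answers.append((name, ttl, str(ipaddress.IPv4Address(rdata))))
    return {"id": msg_id, "rcode": flags & 0xF, "answers": answers}

def is_dns_reply(body: bytes, query: bytes) -> bool:
    """Rough check: long enough, same ID as the query, and the QR (response) bit set."""
    return len(body) >= 12 and body[:2] == query[:2] and bool(body[2] & 0x80)

if __name__ == "__main__":
    query = decode_doh_query("rmUBAAABAAAAAAAAB2NhcmVlcnMHb3BlbmRucwNjb20AAAEAAQ")
    print(read_name(query, 12)[0])
    print(parse_response(RESPONSE))
    print(is_dns_reply(b"dns query not allowed", query))
```

The same check makes the fr-dns1 failure easy to spot: the 21-byte body is the plain text "dns query not allowed", not a DNS message, so `is_dns_reply` rejects it.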
It's quite inconsistent for fr1 and fr2. sg1 is fine.
I'll have to investigate why we get "dns query not allowed" for some reason.
A simple explanation could be that we have lots of users in the fr region, so some network packets are dropped?
I'll do more investigation.
I've added DoH monitors to https://status.bancuh.com/.
Hopefully I'll get some clues the next time it happens.
OK, I hope you figure it out. By the way, what's the difference between ipv4 and safesearch on the status page?
Also, neither fr1 nor fr2 DoH is working for me yet.
Update: it seems to be working again, but the logs URL 'http://fr-dns1(2).bancuh.com:8080' does not open ("connection has timed out" error).
I think DoH will be intermittent. The logs timeout is very curious.
Now I'm wondering if we just have more users than before! If true, I might have to try upgrading to a different server instead.
Well, let's monitor it for a couple of days and see if the problem occurs again. I'm not sure about the logs timeout.
If I use the IP instead of the URL, it works. I don't know why the URL doesn't.
It's using a lot of memory.
Did you include any large lists in the configuration lately?
Ok hope you figure it out, btw, whats the difference between ipv4 and safesearch on the status page?
@Tomatoide ipv4 is just a DNS lookup for zedo.com. If the response is OK, it means that the DNS server is up. Safesearch looks up google.com, and it should get back the safe search domain override. If we get the expected forced safe search override domain, it means that the configuration lists were loaded correctly.
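Roughly, the two monitors could be sketched like this. This is an illustration only, not the code behind status.bancuh.com: the `resolve` callback, the function names, and the assumption that the override shows up as `forcesafesearch.google.com` in the answers are all hypothetical.

```python
from typing import Callable, List

# Assumption: the safe search override points google.com at this host.
SAFE_SEARCH_TARGET = "forcesafesearch.google.com"

# A resolver takes a domain name and returns the answer values as strings
# (addresses or CNAME targets). It may raise on timeouts or errors.
Resolver = Callable[[str], List[str]]

def check_ipv4(resolve: Resolver) -> bool:
    """The server counts as up if a plain lookup for zedo.com returns any answer."""
    try:
        return len(resolve("zedo.com")) > 0
    except Exception:
        return False

def check_safesearch(resolve: Resolver) -> bool:
    """The lists count as loaded if google.com is overridden to the safe search host."""
    try:
        answers = resolve("google.com")
    except Exception:
        return False
    return any(a.rstrip(".").lower() == SAFE_SEARCH_TARGET for a in answers)

def server_status(resolve: Resolver) -> str:
    """Combine both checks into one label for a status page."""
    if not check_ipv4(resolve):
        return "down"
    if not check_safesearch(resolve):
        return "up, lists not loaded"
    return "up, lists loaded"
```

With this split, a server that answers queries but runs with missing lists shows up as a safesearch failure while ipv4 stays green.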
It looks like we have memory issues on the sg servers as well. It's just that the failure mode is different.
In fr, the DNS process seems to be stuck in an infinite loop, so it can't process any queries.
In sg, the ablc (adblock list compiler) gets killed. The DNS list never gets updated, but the server can still handle requests with the outdated lists.
Can duplicate lists be a cause of the high memory usage? Maybe I could try to remove some lists that are already included in bigger lists and see if this helps.
@Tomatoide, I think duplicate lists will cause the ablc compiler to use slightly more memory, but that is not as serious.
The more serious case, I think, is when the final deduplicated list is still very big. That will cause the DNS process to use more memory.
Both can cause memory issues.
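To make the distinction concrete, here is a small sketch of deduplicating several lists. It is not how ablc works internally; the function names and the sample list names are invented for illustration. It shows that a list fully covered by the others adds work while merging but does not change the size of the final set.

```python
from typing import Dict, Iterable, List, Set

def normalize(domain: str) -> str:
    """Lowercase a domain and drop surrounding whitespace and a trailing dot."""
    return domain.strip().lower().rstrip(".")

def merge_blocklists(lists: Dict[str, Iterable[str]]) -> Set[str]:
    """Union of all lists; this final set is what the DNS process would hold."""
    merged: Set[str] = set()
    for entries in lists.values():
        merged.update(normalize(e) for e in entries if e.strip())
    return merged

def redundant_lists(lists: Dict[str, Iterable[str]]) -> List[str]:
    """Names of lists whose entries are all present in the other lists.

    Note: two identical lists would both be reported, so remove one at a time.
    """
    cleaned = {
        name: {normalize(e) for e in entries if e.strip()}
        for name, entries in lists.items()
    }
    result = []
    for name, entries in cleaned.items():
        others = set().union(*(v for k, v in cleaned.items() if k != name))
        if entries <= others:
            result.append(name)
    return result

if __name__ == "__main__":
    sample = {
        "big": ["ads.example.com", "track.example.net", "spam.example.org"],
        "small": ["ADS.example.com", "track.example.net."],
        "extra": ["popups.example.io"],
    }
    print(len(merge_blocklists(sample)))
    print(redundant_lists(sample))
```

Dropping `small` from the sample leaves the merged set unchanged, which matches the point above: removing duplicates helps the compiler a little, but only shrinking the final set helps the DNS process.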
To be frank though, the reason we have this issue is that I only created servers with 4GB of RAM.
If I upgraded all the servers to 8GB, the problem would go away, but it would cost me double the money every month!
maybe I could try to remove some lists that are already included in bigger lists and see if this could help
@Tomatoide, I think we shouldn't change anything for the moment.
For now, I've enabled a swapfile on all servers. Let's monitor for a few days to see if it's stable. If that doesn't work, we can consider other options.
All green for now. Also it looks like swap is relieving some of the memory issues!
(Per-server screenshots for fr1, fr2, sg1, sg2, and jp1 were attached here and are not preserved.)
It seems to be a bigger issue in sg though, as a lot more is going to swap!
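For comparing servers without screenshots, swap usage can be read from `/proc/meminfo` on Linux. The sketch below is an assumption about how one might do that, not what was used here. The helper names are hypothetical, and the sample numbers are invented, not taken from any bancuh server.

```python
from typing import Dict

def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values

def swap_used_kb(info: Dict[str, int]) -> int:
    """Swap in use, computed as SwapTotal minus SwapFree."""
    return info["SwapTotal"] - info["SwapFree"]

def swap_used_percent(info: Dict[str, int]) -> float:
    """Share of swap in use; 0.0 when no swap is configured."""
    total = info["SwapTotal"]
    return 0.0 if total == 0 else 100.0 * swap_used_kb(info) / total

# Invented sample for illustration only.
SAMPLE = """\
MemTotal:        4028440 kB
MemFree:          120304 kB
SwapTotal:       2097148 kB
SwapFree:        1048572 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(swap_used_kb(info), "kB of swap in use")
    # On a real Linux server, read the live file instead:
    # info = parse_meminfo(open("/proc/meminfo").read())
```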
Nice job! Logs are also working again 👍
Just to be sure, the wildcard (asterisk) domain format is supported by bancuh, right? I changed some lists to this format in the latest commits, as it should be smaller and more efficient.
Yes, it should work. If I remember correctly, the DNS list update runs every 24 hours. So exactly 1 day after I added swap, it will run again.
Hi @ragibkl, it looks like DNS over HTTPS is not working at all. Can you check this out? Thank you.