ragibkl / adblock-dns-server

Adblock DNS Server powered by Bancuh DNS and dnsdist-acme
https://bancuh.com/
MIT License

Unstable servers behaviour #154

Closed: ragibkl closed this issue 2 years ago

ragibkl commented 2 years ago

Ever since I deployed the self-update feature, some of the servers have failed multiple times. It could be that my self-update script is taking too much memory, which prompts the OS to kill the dns process. Of course, it could be unrelated, but the failures started happening around the same time.

jp1 especially crashes a lot. Thanks @CasanierXI for reporting. I also observed crashes on fr1 and fr2.

I'm creating this ticket to track any issue related to performance and crashes.

ragibkl commented 2 years ago

From the server logs, I found that the dns->bind process got killed. I suspect that the server ran out of memory, so the OS killed the dns bind process. However, it seems that the dns->entrypoint script does not shut down properly, so the dns container keeps running without the bind process. This also means that no actual DNS server is running.
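A minimal sketch of the entrypoint behaviour this calls for, written in Python rather than as the project's real entrypoint script (which this thread does not show): start every process, and exit as soon as any one of them exits, stopping the rest. With a container restart policy such as `unless-stopped`, Docker would then restart the whole container instead of leaving it up without `named`. The function name `supervise` is hypothetical. A child killed by a signal is reported by Python as a negative return code (e.g. -9 for SIGKILL, which is what the OOM killer sends), so the sketch maps it to the shell convention of 128 + signal number.

```python
import subprocess
import time


def supervise(commands, poll_interval=0.5):
    """Start each command; return as soon as ANY of them exits.

    Surviving children are terminated (then killed after 10 s) so the
    container does not keep running half-alive, e.g. dnsdist without named.
    """
    procs = []
    try:
        for cmd in commands:
            procs.append(subprocess.Popen(cmd))
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    # Popen reports "killed by signal N" as -N; map it to the
                    # shell convention 128 + N (an OOM kill, SIGKILL, -> 137).
                    return code if code >= 0 else 128 - code
            time.sleep(poll_interval)
    finally:
        # Stop whatever is still running before the entrypoint exits.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
```

Hypothetically, an entrypoint could call `sys.exit(supervise([["named", "-g"], ["dnsdist", "--supervised"]]))`, where `-g` keeps named in the foreground and `--supervised` tells dnsdist it runs under a supervisor; the project's actual command lines may differ. In a bash entrypoint, `wait -n` gives a similar "return when the first background job exits" effect.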

The out-of-memory condition happens after a few hours of running.

For now, I've bumped jp1 to 8 GiB of RAM. I want to see how much RAM it uses in total. :money_with_wings:

TODO:
- [ ] Observe jp1 max memory usage over 1-2 days
- [ ] Debug the entrypoint script and fix it so that the container properly shuts down during a crash
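For the memory-observation item, rather than catching `top` at the right moment, the Linux kernel already records each process's peak resident set size as `VmHWM` in `/proc/<pid>/status`. A minimal Python sketch to read it (Linux only; the function names are hypothetical):

```python
def parse_vm_hwm_kib(status_text):
    """Return VmHWM (peak resident set size, "high water mark") in KiB
    from the text of a Linux /proc/<pid>/status file, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            # The line looks like "VmHWM:\t 2306048 kB".
            return int(line.split()[1])
    return None


def peak_rss_kib(pid):
    """Peak RSS of a running process, in KiB (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        return parse_vm_hwm_kib(status.read())
```

On the host, passing the host PID of `named` from the `top` output (e.g. 8617) would report its peak RSS since it started. The counter resets whenever `named` restarts, so it only covers the current process lifetime.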

ragibkl commented 2 years ago

I can stop it if the problem is only with this server. But I do want to investigate why it's using so much memory.

Ideally, I want to rebuild it at the same location with a different OS, maybe Ubuntu or Debian. Or we can always try a different location.

@CasanierXI, do you have a preference for where to place a server? Where are you located again? I'm familiar with DigitalOcean, Scaleway, Linode, and Vultr.

ragibkl commented 2 years ago

Some memory usage figures:

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7820        2745        4330           8         745        4825
Swap:           511           0         511

[root@li2110-118 ~]# top
top - 06:58:07 up  5:14,  1 user,  load average: 0.20, 0.37, 0.25
Tasks: 128 total,   2 running, 126 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.7 us,  1.7 sy,  0.0 ni, 95.8 id,  0.0 wa,  0.0 hi,  0.3 si,  0.3 st
KiB Mem :  8008596 total,  4432984 free,  2812004 used,   763608 buff/cache
KiB Swap:   524284 total,   524284 free,        0 used.  4940000 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8617 100       20   0 2332376   2.2g   4620 S  11.1 28.9  24:49.12 named
 1290 root      20   0 1407260  81104  30032 S   0.0  1.0   1:47.95 dockerd
 8349 root      20   0  325600  65364  16904 S   0.0  0.8   0:27.12 node
  827 root      20   0 1124740  55576  20332 S   0.0  0.7   0:12.55 containerd
 8647 100       20   0   47408  36532   7172 S   1.4  0.5   2:25.92 dnsdist
 8425 root      20   0  981612  29344   1508 S   1.4  0.4   2:19.64 docker-proxy

It's still not over the 4 GiB limit yet, so let's see.
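The 4 GiB check can be made mechanical by parsing `free -m`. The sketch below (hypothetical helper names) compares the `used` column against a budget; 3789 MiB is the `total` that `free -m` reports on this same host after it is downsized to 4 GiB later in the thread. It is only a rough check, since a smaller machine also changes page cache and other baseline usage.

```python
def parse_free_m(output):
    """Parse `free -m` output into {"Mem": {...}, "Swap": {...}}, values in MiB."""
    lines = output.strip().splitlines()
    headers = lines[0].split()  # total, used, free, shared, buff/cache, available
    rows = {}
    for line in lines[1:]:
        label, *values = line.split()
        # zip() stops early, so the shorter Swap row gets only total/used/free.
        rows[label.rstrip(":")] = dict(zip(headers, map(int, values)))
    return rows


def headroom_mib(output, limit_mib):
    """MiB left before `used` would exceed limit_mib (negative means over)."""
    return limit_mib - parse_free_m(output)["Mem"]["used"]
```

Fed the `free -m` output above, `headroom_mib(output, 3789)` returns 1044, so roughly 1 GiB of headroom remained at that point; the next day's output (3790 MiB used) gives -1.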

ragibkl commented 2 years ago

Today:

[root@li2110-118 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7820        3790        3267           8         762        3779
Swap:           511           0         511

[root@li2110-118 ~]# top

top - 00:30:39 up 22:46,  1 user,  load average: 0.00, 0.07, 0.16
Tasks: 126 total,   1 running, 125 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.4 us,  1.2 sy,  0.0 ni, 96.9 id,  0.0 wa,  0.0 hi,  0.1 si,  0.4 st
KiB Mem :  8008596 total,  3344468 free,  3883452 used,   780676 buff/cache
KiB Swap:   524284 total,   524284 free,        0 used.  3868132 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8617 100       20   0 3409248   3.2g   4628 S  10.3 42.3 124:27.15 named
 1290 root      20   0 1407260  78936  30384 S   1.3  1.0   8:53.48 dockerd
 8349 root      20   0  323536  64444  16924 S   0.0  0.8   0:53.82 node
  827 root      20   0 1124740  54528  20400 S   0.3  0.7   1:31.97 containerd
 8647 100       20   0   47648  36624   7176 S   1.7  0.5  11:27.03 dnsdist
 8425 root      20   0  981612  24520   1536 S   1.7  0.3  11:40.65 docker-proxy

Looks like it went over the memory limit of a 4 GiB server. No wonder the named process got killed!
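A rough way to quantify the growth is to compare the two `top` samples of the same `named` process (PID 8617): RES went from 2.2g at an uptime of 5:14 to 3.2g at 22:46. The Python sketch below (hypothetical helper names) converts top's RES field to MiB and computes a linear rate. top rounds RES to 0.1g, so the estimate is only good to roughly ±10%, and real growth need not be linear.

```python
import re


def top_res_to_mib(field):
    """Convert a top RES/VIRT field to MiB.

    top prints plain numbers in KiB and switches to suffixed units
    (m, g, t) for larger values, e.g. "81104" or "2.2g".
    """
    units = {"": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}
    match = re.fullmatch(r"([\d.]+)([mgt]?)", field.strip())
    if not match:
        raise ValueError(f"unrecognised top field: {field!r}")
    number, suffix = match.groups()
    return float(number) * units[suffix]


def uptime_hours(uptime):
    """Parse the 'H:MM' uptime top shows (only for uptimes under a day)."""
    hours, minutes = uptime.split(":")
    return int(hours) + int(minutes) / 60


def growth_mib_per_hour(sample_a, sample_b):
    """Linear RES growth between two (uptime, RES) samples of one process."""
    (t_a, res_a), (t_b, res_b) = sample_a, sample_b
    delta_mib = top_res_to_mib(res_b) - top_res_to_mib(res_a)
    return delta_mib / (uptime_hours(t_b) - uptime_hours(t_a))
```

For the two samples above, `growth_mib_per_hour(("5:14", "2.2g"), ("22:46", "3.2g"))` comes out at about 58.4 MiB per hour, or roughly 1.4 GiB per day if the growth stayed linear.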

ghost commented 2 years ago

Does this mean you need 8 GiB? I think it will cost you too much money... But why doesn't the same problem happen with SG 🤔?

ragibkl commented 2 years ago

It could be that:
- some people are using jp1 for DDoS via a DNS amplification attack,
- jp users query a more varied set of domains, or
- there are simply more users on jp1.
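One way to tell these explanations apart is to look at the query mix: abuse tends to concentrate on a few source addresses (in an amplification attack, the spoofed victim addresses), while a more varied workload shows up as many distinct query names and a larger resolver cache. A minimal Python sketch, assuming a hypothetical log format where each line is `client-ip qname qtype`; the real bind or dnsdist log format will differ and the parsing would need adapting.

```python
from collections import Counter


def top_clients(log_lines, n=5):
    """The n source IPs that sent the most queries, busiest first.

    Assumes each non-empty line starts with the client IP (hypothetical format).
    """
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return counts.most_common(n)


def distinct_domains(log_lines):
    """Number of distinct query names, a rough measure of workload variety.

    Assumes the query name is the second field (hypothetical format).
    """
    return len({line.split()[1].lower().rstrip(".") for line in log_lines if line.strip()})
```

A handful of IPs dominating `top_clients` would point toward abuse, while a high `distinct_domains` count with evenly spread clients would point toward a more varied or larger user base.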

I have to consider the following.

ragibkl commented 2 years ago

I made this change yesterday. After 1 day, it seems to have limited the RAM usage.

[root@li2110-118 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7820        2437        4296           8        1087        5113
Swap:           511           0         511

[root@li2110-118 ~]# top
top - 07:43:13 up 2 days,  5:59,  1 user,  load average: 0.12, 0.31, 0.29
Tasks: 128 total,   2 running, 126 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.6 us,  0.6 sy,  0.0 ni, 98.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8008596 total,  4399176 free,  2495892 used,  1113528 buff/cache
KiB Swap:   524284 total,   524284 free,        0 used.  5235724 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3447 100       20   0 2062128   1.9g   4640 S   4.9 25.5 171:14.13 named
 1290 root      20   0 1407772  75044  30852 S   1.2  0.9  27:14.91 dockerd
  827 root      20   0 1124740  53860  20536 S   0.0  0.7   5:07.58 containerd
 4039 root      20   0  301700  38300  16916 S   0.0  0.5   0:01.78 node
 3515 100       20   0   47624  36736   7212 S   0.0  0.5   7:57.54 dnsdist

I'll leave it running for 2 more days and see how it progresses.

ragibkl commented 2 years ago

It seems that I'm already using dnsdist's rate limiting for abuse prevention. I'll leave the current settings for now.
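For context on what per-client rate limiting does: dnsdist configures it in Lua with rules such as `MaxQPSIPRule`, and the thread does not show which settings this project uses. The Python sketch below only illustrates the general idea with a per-IP token bucket; the class name and parameters are hypothetical and this is not dnsdist's implementation.

```python
import time


class PerClientRateLimiter:
    """Token bucket per client IP: `rate` queries/second, bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # ip -> (tokens, last_refill_time)

    def allow(self, ip):
        now = self.clock()
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

One design caveat that is relevant to this issue: the `buckets` dict grows with every new client IP, so a real limiter also needs to expire idle entries to keep its own memory bounded.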

ragibkl commented 2 years ago

Today's usage:

[root@li2110-118 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7820        2660        4046          16        1113        4880
Swap:           511           0         511
[root@li2110-118 ~]# top 

top - 23:36:34 up 5 days, 21:52,  1 user,  load average: 0.38, 0.27, 0.28
Tasks: 126 total,   2 running, 124 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.5 us,  1.5 sy,  0.0 ni, 96.5 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
KiB Mem :  8008596 total,  4142948 free,  2725040 used,  1140608 buff/cache
KiB Swap:   524284 total,   524284 free,        0 used.  4997700 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3447 100       20   0 2289108   2.2g   4660 S  13.6 28.3 976:29.38 named
 1290 root      20   0 1407772  75360  31044 S   0.0  0.9  83:03.83 dockerd
  827 root      20   0 1124740  54484  20656 S   1.5  0.7  17:49.37 containerd
 4039 root      20   0  303268  40140  17008 S   0.0  0.5   0:10.42 node
 3515 100       20   0   49192  37840   7292 S   0.0  0.5  43:53.48 dnsdist

It seems quite stable now, so I think I'll downsize the server back to 4 GiB.

ragibkl commented 2 years ago

Done. Let's monitor for a few more days.

ragibkl commented 2 years ago

I'm rolling out another change to deal with this. I hope that will make it more stable.

ragibkl commented 2 years ago

Current usage:

[root@li2110-118 default]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3789        2226         993           4         569        1338
Swap:           511          87         424

[root@li2110-118 default]# top
top - 02:32:46 up 1 day,  2:48,  1 user,  load average: 0.46, 0.32, 0.26
Tasks: 113 total,   2 running, 111 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.2 us,  3.2 sy,  0.0 ni, 92.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  3880132 total,  1016228 free,  2280692 used,   583212 buff/cache
KiB Swap:   524284 total,   434232 free,    90052 used.  1369468 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14315 100       20   0 1926404   1.8g   4628 S  12.2 49.3  10:19.29 named
13695 root      20   0  341144  79468  16988 S   0.0  2.0   0:11.66 node
13891 100       20   0   47136  35932   7136 S   0.0  0.9   0:48.34 dnsdist

ragibkl commented 2 years ago

Today:

[root@li2110-118 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3789        2465        1000          12         323        1090
Swap:           511         171         340

[root@li2110-118 ~]# top
top - 00:40:00 up 6 days, 56 min,  1 user,  load average: 0.24, 0.25, 0.21
Tasks: 112 total,   1 running, 111 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.7 us,  3.8 sy,  0.0 ni, 91.5 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem :  3880132 total,  1019888 free,  2526336 used,   333908 buff/cache
KiB Swap:   524284 total,   348428 free,   175856 used.  1115412 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14315 100       20   0 2239428   2.1g   2180 S  13.0 56.6 696:39.97 named
13695 root      20   0  325796  44468   2248 S   0.0  1.1   2:09.81 node
13891 100       20   0   48580  30796   2248 S   0.3  0.8  32:27.36 dnsdist
 1020 root      20   0 1349984  26648   4824 S   0.3  0.7  42:42.08 dockerd
14205 root      20   0  899172  12952    596 S   0.7  0.3  32:05.91 docker-proxy

ragibkl commented 2 years ago

It's been a week, and I think it's pretty stable now. We can probably close this issue soon.

ragibkl commented 2 years ago

Closing this issue. Please open a separate ticket if issue persists.