gravitl / netclient

Apache License 2.0
67 stars 32 forks source link

`netclient daemon` high CPU and RAM usage. #875

Open Fijxu opened 2 months ago

Fijxu commented 2 months ago

When using netclient daemon or the systemd service, the CPU and RAM usage increases until the process is killed. Sadly I have no clue why this happens. I already tried and checked a few things:

netclient daemon works perfectly on my 5 machines, no CPU usage nor high RAM usage; my laptop is the only machine with this problem.

"verbosity": 3 and "debug": true logs:

[netclient] 2024-09-04 19:19:25 lock try 
[netclient] 2024-09-04 19:19:25 unlock try 
[netclient] 2024-09-04 19:19:25 lockfile exists 
[netclient] 2024-09-04 19:19:25 removed lockfile 
[netclient] 2024-09-04 19:19:25 lock try 
[netclient] 2024-09-04 19:19:25 unlock try 
[netclient] 2024-09-04 19:19:25 lockfile exists 
[netclient] 2024-09-04 19:19:25 removed lockfile 
{"time":"2024-09-04T19:19:25.097478045-04:00","level":"INFO","source":"root.go 284}","msg":"OS is","os":"linux"}
[netclient] 2024-09-04 19:19:25 lock try 
[netclient] 2024-09-04 19:19:25 unlock try 
[netclient] 2024-09-04 19:19:25 lockfile exists 
[netclient] 2024-09-04 19:19:25 removed lockfile 
[netclient] 2024-09-04 19:19:25 lock try 
[netclient] 2024-09-04 19:19:25 unlock try 
[netclient] 2024-09-04 19:19:25 lockfile exists 
[netclient] 2024-09-04 19:19:25 removed lockfile 
daemon called
{"time":"2024-09-04T19:19:25.175086487-04:00","level":"INFO","source":"daemon.go 52}","msg":"starting netclient daemon","version":"v0.25.0"}
[netclient] 2024-09-04 19:19:25 lock try 
[netclient] 2024-09-04 19:19:25 Starting firewall... 
[netclient] 2024-09-04 19:19:25 iptables is supported 
[netclient] 2024-09-04 19:19:25 adding forwarding rule 
[netclient] 2024-09-04 19:19:25 lock try 
[netclient] 2024-09-04 19:19:25 unlock try 
[netclient] 2024-09-04 19:19:25 lockfile exists 
[netclient] 2024-09-04 19:19:25 removed lockfile 
[netclient] 2024-09-04 19:19:25 lock try 
[netclient] 2024-09-04 19:19:25 unlock try 
[netclient] 2024-09-04 19:19:25 lockfile exists 
[netclient] 2024-09-04 19:19:25 removed lockfile 
{"time":"2024-09-04T19:19:27.190699017-04:00","level":"INFO","source":"daemon.go 174}","msg":"wireguard public listen port: ","port":47274}
[netclient] 2024-09-04 19:19:27 failed to dial:  dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable 
[netclient] 2024-09-04 19:19:27 stun transaction failed:  stun1.l.google.com dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable 
{"time":"2024-09-04T19:19:27.201887885-04:00","level":"WARN","source":"stun.go 73}","msg":"callHolePunch udp6 error","!BADKEY":"dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable"}
[netclient] 2024-09-04 19:19:27 failed to dial:  dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable 
[netclient] 2024-09-04 19:19:27 stun transaction failed:  stun2.l.google.com dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable 
{"time":"2024-09-04T19:19:27.204398684-04:00","level":"WARN","source":"stun.go 73}","msg":"callHolePunch udp6 error","!BADKEY":"dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable"}
[netclient] 2024-09-04 19:19:27 failed to dial:  dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable 
[netclient] 2024-09-04 19:19:27 stun transaction failed:  stun3.l.google.com dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable 
{"time":"2024-09-04T19:19:27.214849131-04:00","level":"WARN","source":"stun.go 73}","msg":"callHolePunch udp6 error","!BADKEY":"dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable"}
[netclient] 2024-09-04 19:19:27 failed to dial:  dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable 
[netclient] 2024-09-04 19:19:27 stun transaction failed:  stun4.l.google.com dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable 
{"time":"2024-09-04T19:19:27.217370984-04:00","level":"WARN","source":"stun.go 73}","msg":"callHolePunch udp6 error","!BADKEY":"dial udp :51821->[2001:4860:4864:5:8000::1]:19302: connect: network is unreachable"}
{"time":"2024-09-04T19:19:27.462334267-04:00","level":"WARN","source":"daemon.go 542}","msg":"failed to get publicIP","error":"no IP could be found"}
{"time":"2024-09-04T19:19:27.462489527-04:00","level":"WARN","source":"daemon.go 198}","msg":"GetPublicIPv6 Warn: ","Warn":"no ipv6 found"}
[netclient] 2024-09-04 19:19:27 lock try 
[netclient] 2024-09-04 19:19:27 unlock try 
[netclient] 2024-09-04 19:19:27 lockfile exists 
[netclient] 2024-09-04 19:19:27 removed lockfile 
{"time":"2024-09-04T19:19:27.471992366-04:00","level":"INFO","source":"daemon.go 217}","msg":"configuring netmaker wireguard interface"}
[netclient] 2024-09-04 19:19:28 Logging verbosity updated to 3 
completed pull for server nm.nadeko.net
[netclient] 2024-09-04 19:19:28 adding addresses to netmaker interface 
{"time":"2024-09-04T19:19:28.269554882-04:00","level":"INFO","source":"wireguard_linux.go 145}","msg":"adding address","address":"100.64.0.6","network":"100.64.0.0/24"}
{"time":"2024-09-04T19:19:28.269791443-04:00","level":"INFO","source":"wireguard_linux.go 145}","msg":"adding address","address":"100.0.0.6","network":"100.0.0.0/24"}
{"time":"2024-09-04T19:19:28.293537928-04:00","level":"INFO","source":"daemon.go 283}","msg":"netclient message queue started for server:","server":"nm.nadeko.net"}
[netclient] 2024-09-04 19:19:28 initialized endpoint detection on port 51821 
[netclient] 2024-09-04 19:19:28 initialized endpoint detection on port 51821 
{"time":"2024-09-04T19:19:29.067164746-04:00","level":"INFO","source":"daemon.go 317}","msg":"mqtt connect handler"}
{"time":"2024-09-04T19:19:29.229578314-04:00","level":"INFO","source":"daemon.go 351}","msg":"successfully requested ACK on server","server":"nm.nadeko.net"}
{"time":"2024-09-04T19:19:29.229678794-04:00","level":"INFO","source":"daemon.go 434}","msg":"subscribed to updates for node","node":"fb940ca1-da73-4d55-a963-4cccc294f5ff","network":"everything"}
{"time":"2024-09-04T19:19:29.398934452-04:00","level":"INFO","source":"daemon.go 434}","msg":"subscribed to updates for node","node":"e23a2cfe-b63d-4541-9e79-ae5938bbe23a","network":"everything-real"}
{"time":"2024-09-04T19:19:29.399044933-04:00","level":"INFO","source":"daemon.go 410}","msg":"subscribing to host updates for","host":"f3f2eb74-661c-4f85-bd13-5ac1dc93282b","server":"nm.nadeko.net"}
{"time":"2024-09-04T19:19:29.568749297-04:00","level":"INFO","source":"daemon.go 415}","msg":"subscribing to host updates for","host":"f3f2eb74-661c-4f85-bd13-5ac1dc93282b","server":"nm.nadeko.net"}
{"time":"2024-09-04T19:19:29.625093341-04:00","level":"INFO","source":"mqhandlers.go 125}","msg":"processing peer update for server","server":"nm.nadeko.net"}
[netclient] 2024-09-04 19:19:32 determined new endpoint for peer 27iHJoszk1o6J7ZYI9ffHVQGF00uEaqOaJ7VG8cpjg0= - 192.168.1.2:51821 
[netclient] 2024-09-04 19:19:32 determined new endpoint for peer NcN3FxB+eBgHoWCu2x5dHtni+9IVAiOXGfG81N2ASk8= - 192.168.1.3:51821 
[netclient] 2024-09-04 19:19:33 determined new endpoint for peer NcN3FxB+eBgHoWCu2x5dHtni+9IVAiOXGfG81N2ASk8= - 192.168.1.3:51821 
[netclient] 2024-09-04 19:19:33 determined new endpoint for peer 27iHJoszk1o6J7ZYI9ffHVQGF00uEaqOaJ7VG8cpjg0= - 192.168.1.2:51821 

The high CPU usage starts at [netclient] YYYY-MM-DD HH:MM:SS adding forwarding rule line

image

I can give more debug information if I can get some insight on how to do it. Thanks!

Fijxu commented 1 month ago

After profiling netclient with pprof. The heap usage showed that the func ParseHosts was using a lot of memory. The main reason what that my hosts file has more than 400K entries (blocked domains).

https://github.com/gravitl/txeh/blob/3778c58bd69f81aa845bda0d5149532d6bc5e0bb/txeh.go#L351

$ wc -l /etc/hosts                                       
451343 /etc/hosts

After deleting all entries and leaving only the essential ones, netclient no longer uses a huge amount of CPU and memory. This is still an issue, so if you guys are going to address it, keep this open. Does netclient really needs to parse all the entries on the hosts file?

yabinma commented 1 month ago

Wow 400K entries in hosts file, never thought about it before. hosts file was used in old versions. It was abandoned already. I removed the old code in normal logic and only keep it in migrate. https://github.com/gravitl/netclient/pull/898 Please wait and verify in new release.