shirou / gopsutil

psutil for golang
Other
10.56k stars 1.58k forks source link

`net.Connections("all")` causes high CPU usage on server with 120K connections #784

Open jimis opened 4 years ago

jimis commented 4 years ago

telegraf-1.12.3 with data collection interval 10s, sending data to influxDB OS: Linux x86_64

Opening this issue since gopsutil was found to be the culprit for https://github.com/influxdata/telegraf/issues/6559

jimis commented 4 years ago

I believe that telegraf only does net.Connections("all"), as seen in telegraf's netstat plugin ps.go.

I was expecting some slow parsing of /proc/net/tcp6 (like netstat does), but it seems that gopsutil goes through /proc/*/fd/*, which would explain the slowness, as seen in gopsutil's net_linux.go.

Lomanic commented 4 years ago

This will be fixed in #695.

jimis commented 4 years ago

@Lomanic this is not related to using netlink or not. In fact cat /proc/net/tcp6 | wc -l takes only half second with 120K connections. The problem here is the complicated logic when doing net.Connections("all"), that walks through all PIDs, even though it doesn't need to.

https://github.com/shirou/gopsutil/blob/6a8ab0308ed6ddb7a871c5bb14ed4a5f3d79ac55/net/net_linux.go#L409-L430

If pid == 0 then why do you need to getProcInodesAll() which traverses all of /proc/*/fd/*? Can't you just go through all of /proc/net/tcp* ? I believe you will be missing the pid information then, but is it really worth the overhead?

Lomanic commented 4 years ago

So, this is similar to what's talked about in #771 and not related to #695 indeed.

jimis commented 4 years ago

771 is talking about getting rid of getUids() which parses /proc/*/status. This is good but different. I am talking about avoiding iterating over all processes, but just open one file: /proc/net/tcp (or whatever protocol you want). BTW that file contains UIDs anyway.

Lomanic commented 4 years ago

OK, thanks, sorry for misunderstanding, I'm not very familiar with this part of the code.

danielnelson commented 4 years ago

Since the []ConnectionStat type includes the pid, I believe we do need to walk the pid section of procfs in order to fill this field. This appears to be how netstat and even ss works when using the -p flag.

jimis commented 4 years ago

@danielnelson I agree that the PID can not be traced just with /proc/net/tcp*. So we can't really avoid parsing the whole hierarchy, because that would break the current API. Unless we add a parameter, or even a new API call, that would exclude the PID from the returned data.

florianl commented 4 months ago

In https://github.com/shirou/gopsutil/pull/1660 a proper netlink based implementation for net.Connections() is proposed. I'm happy to get your feedback on this proposed change.