Open jimis opened 4 years ago
I believe that telegraf only does net.Connections("all")
, as seen in telegraf's netstat plugin ps.go.
I was expecting some slow parsing of /proc/net/tcp6
(like netstat does), but it seems that gopsutil goes through /proc/*/fd/*
, which would explain the slowness, as seen in gopsutil's net_linux.go.
This will be fixed in #695.
@Lomanic this is not related to using netlink or not. In fact cat /proc/net/tcp6 | wc -l
takes only half second with 120K connections. The problem here is the complicated logic when doing net.Connections("all")
, that walks through all PIDs, even though it doesn't need to.
If pid == 0
then why do you need to getProcInodesAll()
which traverses all of /proc/*/fd/*
? Can't you just go through all of /proc/net/tcp*
? I believe you will be missing the pid
information then, but is it really worth the overhead?
So, this is similar to what's talked about in #771 and not related to #695 indeed.
getUids()
which parses /proc/*/status
. This is good but different. I am talking about avoiding iterating over all processes, but just open one file: /proc/net/tcp
(or whatever protocol you want). BTW that file contains UIDs anyway.OK, thanks, sorry for misunderstanding, I'm not very familiar with this part of the code.
Since the []ConnectionStat
type includes the pid, I believe we do need to walk the pid section of procfs in order to fill this field. This appears to be how netstat and even ss works when using the -p
flag.
@danielnelson I agree that the PID can not be traced just with /proc/net/tcp*
. So we can't really avoid parsing the whole hierarchy, because that would break the current API. Unless we add a parameter, or even a new API call, that would exclude the PID from the returned data.
In https://github.com/shirou/gopsutil/pull/1660 a proper netlink based implementation for net.Connections()
is proposed. I'm happy to get your feedback on this proposed change.
telegraf-1.12.3 with data collection interval 10s, sending data to influxDB OS: Linux x86_64
Opening this issue since gopsutil was found to be the culprit for https://github.com/influxdata/telegraf/issues/6559