Open florianl opened 1 month ago
To quantify this change is hard, as it is hard to have a controlled environment, where network connections do not open/close all the time. For this reason, I think, the GetConnections
benchmark got removed with https://github.com/shirou/gopsutil/commit/5cd488ff30acd3c2b742a5bb2ee7304fe1567443.
But it is also important to get a feeling for the impact of this proposed change. In a ec2 instance, I did run the following program compiled once on shirou/gopsutil@v4.24.5 and compared it with this proposed change.
In the first picture the program is using shirou/gopsutil@v4.24.5. Here one can see the impact of the repeated call of the read syscall.
The following picture shows the impact of this proposed change. There are less read syscalls and instead the newly introduced netlink syscall to the NETLINK_SOCK_DIAG is noticeable.
Both pictures are generated with continuous on-CPU profiling. So consequently, what is not shown in these pictures is the off-CPU part. As the original implementation is reading and processing files to extract network connection, the waiting on i/o is not visualized in both pictures. As the proposed change is less i/o intensive, garbage collection is also reduced with the proposed change.
@shirou I would be interested on your feedback.
Thank you very much for the excellent PR. Netlink support has been a concern for over 5 years or more. However, when we tried it in #809, the issue seemed to be with obtaining the PID and FD, rather than the change to netlink. But it was so long ago that I don't remember very well... And perhaps various things have changed with the recent versions of Go.
Therefore, I have a few requests:
Could you please provide benchmarks?
[!NOTE] The number of network connections and processes changed during the benchmark. Therefore, the results are not reliable. The benchmark does also not reflect the positive impact of the proposed change on the garbage collector, when running in a real world application or the reduced number of syscalls.
10 iterations of the benchmark with current master did take 39.989s
while 10 iterations of the benchmark of the proposed change took 34.226s
.
benchstat old.txt new.txt
name old time/op new time/op delta
GetConnectionsInet-16 23.5ms ±15% 44.3ms ±18% +88.73% (p=0.000 n=10+10)
GetConnectionsAll-16 26.5ms ± 7% 48.3ms ± 8% +82.23% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
GetConnectionsInet-16 1.91MB ± 1% 1.88MB ± 1% -1.79% (p=0.000 n=10+10)
GetConnectionsAll-16 2.81MB ± 1% 3.35MB ± 1% +19.21% (p=0.000 n=9+10)
name old allocs/op new allocs/op delta
GetConnectionsInet-16 32.5k ± 1% 32.5k ± 1% ~ (p=0.516 n=10+10)
GetConnectionsAll-16 36.9k ± 0% 51.6k ± 0% +40.01% (p=0.000 n=9+10)
As mentioned in https://github.com/shirou/gopsutil/pull/1660#issuecomment-2156011682 providing a reliable benchmark for this change is hard.
You are adding your library https://github.com/florianl/go-diag as a new import, but it seems it has many dependencies. Wouldn't https://github.com/elastic/gosigar/ or https://github.com/vishvananda/netlink/ have fewer dependencies for this PR purpose?
How does the number of dependencies impact performance?
florianl/go-diag focuses on a proper implementation of NETLINK_SOCK_DIAG
. To me, it looks like neither elastic/go-sigar nor vishvananda/netlink provide the required NETLINK_SOCK_DIAG
implementation - both packages try to cover some parts of the netlink API, but looking at their issue trackers there is place for improvement.
I did not implement NETLINK_SOCK_DIAG
within shirou/gopsutil , as I'm also interested in using NETLINK_SOCK_DIAG
in a different project and adding the complete NETLINK_SOCK_DIAG
API didn't sound like the correct approach to me, as there is no non Linux equivalent.
[..] the issue seemed to be with obtaining the PID and FD
As can be seen in the flamegraphs in https://github.com/shirou/gopsutil/pull/1660#issuecomment-2156011682 walking /proc
to fetch PID/FD information is unchanged.
On systems processes and network connections start and stop all the time. Using NETLINK_SOCK_DIAG
avoids the race conditions, that happen while walking /proc
to fetch inet
, inet6
or unix
information by getting a direct snapshot from the kernel at once.
Thank you for the benchmark. However, from this, it appears that the time increased by 88%, memory usage for Inet slightly decreased but increased for All, and allocations increased by 40%. This suggests that performance seems to have worsened. I understand that accurate measurement is difficult, but is this result accurate?
How does the number of dependencies impact performance?
Increasing unnecessary dependencies will increase the final build time and binary size. Moreover, gopsutil is a library, meaning it is used by various other applications. If gopsutil depends on many other libraries, it makes troubleshooting more difficult for application developers. Therefore, I would like to minimize the number of dependencies as much as possible.
To retrieve network information use NETLINK_SOCK_DIAG instead of walking
/proc/<PID>
and parsing files. Consequently this reduces the number of syscalls, as walking/proc/<PID>
and opening, reading and closing files is no longer required. But it also reduces the memory footprint as reading files into memory for processing is no longer required.Related issues:
Supersedes https://github.com/shirou/gopsutil/pull/809