xxxserxxx opened this issue 4 years ago (status: Open)
@xxxserxxx could you also try go-sysinfo? I've been wondering but not much time to try it atm.
Yeah! Thanks; this gives me some impetus to build out proper benchmarks for devices.
Np, please feel free to share the results; it would be very helpful to me (and anyone needing a process data library in Go) as well.
My guess is that `ps` is more efficient than gopsutil because it knows beforehand which information to gather from `/proc/$PID/status`, and gets it in a single file read, while gopsutil first creates a Process variable (it also checks for process existence, since this type can be created for an arbitrary PID number; could you maybe benchmark before this change https://github.com/shirou/gopsutil/pull/821?) and then re-reads the `/proc/$PID/status` file for each new piece of information requested (see how many times `fillFromStatusWithContext` is called).
In fact, we already have the following issues that aim to optimize this: https://github.com/shirou/gopsutil/issues/286, https://github.com/shirou/gopsutil/issues/413.
I wrote a benchmark, and am attaching it here; again, GitHub allows only a restricted set of suffixes, so rename the file to `.go` first.
The attached file is a proper benchmark containing all three implementations: forking, gopsutil, and go-sysinfo. Each function performs similar operations, getting some data about the processes. Compiler optimizing-out is prevented by counting errors in the return values.
```
➜ ps go test -benchtime=20s -bench .
goos: linux
goarch: amd64
pkg: ser1.net/perf
BenchmarkForking-8      2437     9663078 ns/op
--- BENCH: BenchmarkForking-8
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
BenchmarkGoPSUtil-8     1162    21010525 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 2
    perf_test.go:32: GoPsUtil errors: 11
BenchmarkGoSysInfo-8    2671     8590463 ns/op
--- BENCH: BenchmarkGoSysInfo-8
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 1
    perf_test.go:39: GoSysInfo errors: 27
PASS
ok      ser1.net/perf   74.941s
```
Of note is that the forking version is the only one that doesn't generate errors when processing data. From what I saw of the errors, they appear to be related to processes disappearing while the libraries are gathering information about them. go-sysinfo consistently generated more errors than gopsutil, across all runs.
I'm seeing vaguely conflicting results. According to the benchmark, go-sysinfo is, indeed, faster than forking. However, in an application the CPU use of go-sysinfo is double that of forking. I said "vaguely" because this is easily explained: forking is more IO bound (I'm guessing), which takes longer but with less CPU impact than a pure syscall interface. However, given that the application isn't in a tight loop (it loops once per second), I wouldn't expect to notice the difference, and `top` reports higher CPU use for go-sysinfo when both are running side-by-side.
How about I go ahead and attach that benchmark, huh?
I meant to @AtakanColak. Also, @Lomanic: what did you mean, "benchmark before this change"? Do you want me to pin the version at the commit prior to #821 and benchmark that?
The commit immediately prior to the merge was v2.20.1, which made things easy. I pinned the version in `go.mod` to that and re-ran the benchmark. It's not perfect, because running them side-by-side in the same test would be more effort, but here are the results:
##### v2.20.1

```
➜ ps go test -benchtime=20s -bench .
go: downloading github.com/shirou/gopsutil v2.20.1+incompatible
goos: linux
goarch: amd64
pkg: ser1.net/perf
BenchmarkForking-8      2430     9777497 ns/op
--- BENCH: BenchmarkForking-8
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
BenchmarkGoPSUtil-8     1186    21953451 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 1
    perf_test.go:32: GoPsUtil errors: 16
BenchmarkGoSysInfo-8    2703     8751875 ns/op
--- BENCH: BenchmarkGoSysInfo-8
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 28
PASS
ok      ser1.net/perf   77.424s
```
The difference is within an easy margin of error, so I ran it again with just the gopsutil benchmark, switching the `go.mod` version in between:
##### v2.20.1

```
➜ ps go test -benchtime=20s -bench=BenchmarkGoPSUtil .
goos: linux
goarch: amd64
pkg: ser1.net/perf
BenchmarkGoPSUtil-8     1164    20295012 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 7
PASS
ok      ser1.net/perf   25.732s
```

##### v2.20.2

```
➜ ps go test -benchtime=20s -bench=BenchmarkGoPSUtil .
goos: linux
goarch: amd64
pkg: ser1.net/perf
BenchmarkGoPSUtil-8     1131    21108605 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 13
PASS
ok      ser1.net/perf   26.044s
```
Another hiccup with go-sysinfo: it isn't making syscalls; it's looking directly in `/proc`. This means it's unusable for (e.g.) gotop, because unless it's running as root it can't get information about many of the processes running on the system. `ps` can, so forking it and parsing the results can provide a system overview to non-root users that go-sysinfo can't.

Now I'm wondering how gopsutil gets its info 🤔
@xxxserxxx First of all, thank you for your beautiful work. It has also given me a few ideas. However, do let me know if I'm wrong: when you execute an `exec.Cmd`, it doesn't fork in the C sense; it spawns another process. Unless you have been monitoring the CPU/mem usage of that process as well, simply observing the CPU usage of gotop will yield inaccurate results; the work is handed to the `ps` process while gotop only parses the results, and the parsing is extra work on top.
To my understanding, gopsutil also reads from `/proc`. However, I haven't come across a case where I couldn't get information about processes even though I wasn't running as root. Perhaps I simply didn't notice.

EDIT: I have taken a look into how `ps` works, and it supposedly also reads from `/proc`. I don't understand the difference now :(
You're right: I'm not tracking the `ps` process CPU usage at all. It enters and exits far too quickly, and I'm not able to monitor total average CPU usage at a fine enough granularity to notice whether it's higher or lower overall either way.
Yeah, I don't know about `ps`; unless it's running in some sort of limited elevated-privilege mode, I can't think of why it works. It may be because it's pulling limited, permitted data from `/proc` (just general CPU and memory utilization) whereas the libs are trying to gather more restricted information like the environment. The former is not (necessarily) sensitive; accessing the environment variables of running processes is. But I'm totally guessing.
I do know that some errors occur because processes disappear between the time the library pulls the list of procs and the time it reads data on them. The handles are literally disappearing, as if you read a directory glob and something deleted files before you processed each one. Again, I haven't dug into the internals, but this could be mitigated by recognizing that the handle has disappeared and swallowing the error rather than returning it. I believe go-sysinfo is particularly susceptible to this because the list and stat pulls are entirely separate calls made by the client using the API, rather than a single call that pulls and returns everything.
That said, I'm less worried about the disappearing handles; those can be easily recognized and handled in client code. I'm more concerned about the permission errors I see. Also, I only know the permission issue is happening with go-sysinfo; I haven't looked for or observed that error in gopsutil yet.
> what did you mean, "benchmark before this change?" Do you want me to fix the version at the commit prior to #821 and benchmark that?
Exactly; just out of curiosity, I was not expecting a spectacular difference. Sorry for the noise.
> I believe go-sysinfo is particularly susceptible to this because the list and stat pulls are entirely different calls by the client using the API, rather than a single call that pulls and returns everything.
That's also the case for gopsutil, see my previous comment.
ps, go-sysinfo and gopsutil all get their data about processes from procfs on Linux. There are no special syscalls involved, so they should report the same errors. Maybe gopsutil and go-sysinfo don't use the exact same files to get the same data, i.e. files with different permissions.
About the core issue reported (comparatively high CPU usage relative to ps), see my previous message. Determining how go-sysinfo is able to get this data more efficiently than gopsutil would require some profiling to see what the two libraries do differently; this benchmark provides an interesting data point, thanks for it.
I've been reading back through the thread, and I just realized that it doesn't matter whether we're counting the `ps` process or not. The benchmarks measure the number of information-gathering calls that can be made per run; that's an absolute number. Also, a persistent 2% average CPU use, versus a 0.5% CPU use with a possible once-per-second spike from `ps` (hard to see because it's transient and never lives long enough to show up in process lists), is still a net loss.

I don't think we can explain away the performance difference by the off-load onto a separate child process, which (a) isn't persistent, and (b) is irrelevant to the benchmark.
We are seeing a major performance regression on Linux between 2.20.4 and 2.20.5: in a test case with about 280 processes, CPU use jumps from 14% in 2.20.4 to 195% in 2.20.5. It looks like some cache code was removed, which may be the culprit.
> We are seeing a major performance regression on linux between 2.20.4 and 2.20.5. With CPU jumping in a test case from 14% with about 280 processes in 2.20.4, to 195% in 2.20.5. Looks like there was some cache code removed which may be suspect.
We don't cache BootTime anymore, to fix #837. We now retrieve this value on each new process instantiation, as a consequence of the call to CreateTime() (which calls BootTime()), so that we can later compare it in a future call to IsRunning(). #945 may bring some improvements by getting uptime through the sysinfo syscall.
Here's an update from nearly a year later: I updated to v3 to see if anything changed, and it appears performance is about the same:
```
10642» go test -benchtime=20s -bench .
goos: linux
goarch: amd64
pkg: ser1.net/perf
cpu: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
BenchmarkForking-8      1028    23357252 ns/op
--- BENCH: BenchmarkForking-8
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
    perf_test.go:25: forking errors: 0
BenchmarkGoPSUtil-8       40   556564316 ns/op
--- BENCH: BenchmarkGoPSUtil-8
    perf_test.go:32: GoPsUtil errors: 0
    perf_test.go:32: GoPsUtil errors: 0
BenchmarkGoSysInfo-8     964    25360231 ns/op
--- BENCH: BenchmarkGoSysInfo-8
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 0
    perf_test.go:39: GoSysInfo errors: 7
PASS
ok      ser1.net/perf   76.207s
10643» grep shirou go.mod
        github.com/shirou/gopsutil/v3 v3.21.8
```
A significant performance issue has been observed in gopsutil on Linux when comparing versions 2.20.4 and 2.20.5. In version 2.20.4, a test case with approximately 280 processes exhibited a CPU usage of 14%; in version 2.20.5, CPU usage skyrocketed to 195%.

This degradation can be attributed to the removal of BootTime caching, done to address a scenario where a device initially reports an incorrect date (e.g., the epoch, or a more recent date set by a tool like fake_hwclock) and later corrects it via NTP, which made gopsutil report incorrect system uptime. With the cache removed, the OpenTelemetry Collector, which relies on gopsutil, re-reads BootTime for every process and host metric at high frequency, leading to increased resource usage.

To mitigate this, one potential solution is to reintroduce BootTime caching in the collector's code. This change is expected to greatly improve performance, since hostmetrics calls gopsutil continually in an "endless loop."

You can find further details and discussion in https://github.com/shirou/gopsutil/issues/1283#issuecomment-1086268660. It is suggested that gopsutil may need to support caching for such high-demand use cases, and that the synchronization of NTP and BootTime should occur at defined intervals or be handled differently to address this scenario effectively.
@cforce For now, please use the latest version, which is v3.23.12; v2 is no longer maintained.

Also, #1579 introduces the `host.EnableBootTimeCache()` function to cache boot time. As you commented, caching BootTime can cause problems, so this cache is disabled by default, and it is up to the user to turn it on. This feature will be released at the end of this month.
**Describe the bug**

`Processes()` is significantly more resource intensive than even forking `ps` with `exec.Command` and parsing the results.

**To Reproduce**

Download the attached Go source code (and rename it to `perf.go`, because GitHub is dumb about attachment suffixes) and run it on the command line. The argument is the number of iterations; you may need to start with fewer if it runs too long.

**Expected behavior**

I'd expect the native code to be more resource efficient than spawning a child `ps` process. On my machine, I consistently see between 2x and 4x the CPU use, and much slower results.

**Environment (please complete the following information):**

[`/etc/os-release` and the result of `uname -a`] (trimmed some attributes out)

**Edit**

I noticed this while doing some development work on gotop; running both branches for about an hour, it didn't stray off this: