Closed eric1234 closed 4 months ago
As I understand it, your argument is as follows.
The ProcsTotal
of MiscStat
returned by load.Misc()
is supposed to return the number of Processes that do not include Threads
by name, but actually returns a value that does include Threads
. This is because /proc/loadavg
is read in load.Misc()
, and the value of this loadavg contains threads.
After reading man proc
, I agree with you. However, it seems to me that the only way to get a pure process count that does not include threads is to count /proc
as you write, or call the ps
command. Is there any other way?
I will try to find a better way, but if you know of any, I would appreciate it if you could let me know.
Thank you.
However, it seems to me that the only way to get a pure process count that does not include threads is to count
/proc
as you write, or call the ps command. Is there any other way?
I don't think so. I also searched through the man page as well as did some general web searches and the general consensus is to count /proc
as gopsutil is already doing with readPidsFromDir
.
OpenTelemetry expressed concern about this strategy being accurate due to changing processes while the count is going on. This initially made me think this wasn't a viable strategy.
But it's also trying to collect status information on the pids as it's going. If we are just trying to get a process count then it's really just the one call to Readdirnames
which if we dig down into go seems to be just a single syscall also so I think in this context it would be accurate.
I would be open to putting together a PR that refactors load.Misc
to instead using the existing readPidsFromDir
instead of the loadavg file. This should mean gopsutil provides the same interface externally but will report an accurate total process count. Let me know if that sounds good and I'll work on it.
Great to hear that! There does not seem to be a way to get the exact value in the proc file system, using readPidsFromDir
seems like a good way to get number of processes.
I think it would be better to move readPidsFromDir()
to internal/common/common_linux.go
and call it from the process
and load
packages. If you could create a PR, it would be greatly appreciated. Thank you!
Set down to implement this and it looks like there is already a different method in common_linux.go
that counts the number of processes by reading the directories entries that look like a number. NumberProcWithContext
is very similar to the readPidsFromDir
. I think the only main difference is the former is based around a context
while the later is based around a path
.
I'm going to work on making getProcsTotal
read from the existing NumberProcWithContext
already in common. That will fix my issue. Then as a separate bit of work if we wanted to DRY up NumberProcWithContext
and readPidsFromDir
that could be done.
Describe the bug
PR #682 added the ability to get the total number of processes via the
MiscStats
struct. It gets this information by reading the/proc/loadavg
file which looks like:The proc man page describes that file as:
The bolded text is the key part. In my example above, it's showing 1095 as the total number of kernel scheduled entities. But note it uses that term instead of "processes". It even indicates in parentheses that "kernel scheduled entities" is both processes and threads. But by the name
ProcTotal
and the fact that all other numbers inMiscStat
are processes only (not threads) this is implying I have 1095 processes when I do not. Thread count varies greatly. If I run it just a few minutes later it's showing 1127:Also if we actually count the processes in my system we find we only have a bit over 300:
Expected behavior
I would expect
ProcsTotal
to return the total number of processes not processes and threads. I don't know if there is something else that can be read in/proc
that gives just the total number of processes (I couldn't find anything) or if the attribute needs to be deprecated and removed for being in-accurate.I came across this due to OpenTelemtry using this struct to help measure the number of processes in the system and getting wildly high numbers. It actually calculates this number itself by reading all directories under
/proc
that are number. But then it only uses that as a starting place and offsets it with what this struct is providing saying:Based on that I don't think the OpenTelemetry strategy will be sufficient to get an accurate process count.
Environment (please complete the following information):
/etc/os-release
and the result ofuname -a
]