Open flenoir opened 1 year ago
Pinging code owners for receiver/hostmetrics: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Relates to/duplicates #18923 #18232
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
There are two additional options for the process scraper in v0.75.0:
mute_process_exe_error: <true|false>
mute_process_io_error: <true|false>
You can use them to mute these errors. This version also allows scraping all processes without dropping the ones whose exe it could not read.
So I think this can be closed, @dmitryax.
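As a rough sketch of where these two options would sit, assuming a standard hostmetrics receiver block (the surrounding structure and the collection_interval value are illustrative and not taken from this thread):

```yaml
receivers:
  hostmetrics:
    collection_interval: 60s   # assumed value, for illustration only
    scrapers:
      process:
        # The two options named above, available from v0.75.0
        mute_process_exe_error: true
        mute_process_io_error: true
```

Going by their names, these flags cover only the exe and I/O read errors. As reported further down in this thread, other process scraper errors are still logged.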
@jskiba These two options are not working for me. I am using go.opentelemetry.io/collector/receiver@v0.81.0/.
Please let me know if you need any input from my end. I see lots of errors like the one below:
Error scraping metrics {"kind": "receiver", "name": "hostmetrics/linux/localhost", "data_type": "metrics", "error": "error reading open file descriptor count for process \"systemd\" (pid 1): open /proc/1/fd: permission denied; error reading pending signals for process \"systemd\" (pid 1): open /proc/1/fd: permission denied; error reading open file descriptor count for process \"kthreadd\" (pid 2): open /proc/2/fd: permission denied; error reading pending signals for process \"kthreadd\" (pid 2): open /proc/2/fd: permission denied; error reading open file descriptor count for process \"kworker/0:0H\" (pid 4): open /proc/4/fd: permission denied; error reading pending signals for process \"kworker/0:0H\" (pid 4): open /proc/4/fd: permission denied; error reading open file descriptor count for process \"ksoftirqd/0\" (pid 6): open /proc/6/fd: permission denied; error reading pending signals for process \"ksoftirqd/0\"
@OmprakashPaliwal is correct. When you run the collector as non-root and enable one of the optional metrics process.open_file_descriptors or process.signals_pending, you get a permission error from the collector process trying to read /proc/[pid]/fd files for processes that are not owned by the user running the collector. As a result, those two metrics are only generated for processes owned by that user.
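For context, the two optional metrics are typically switched on under the process scraper's metrics key. A minimal sketch, assuming an otherwise default hostmetrics receiver (no other settings here are taken from this thread):

```yaml
receivers:
  hostmetrics:
    scrapers:
      process:
        metrics:
          # Optional metrics; both read /proc/[pid]/fd according to the errors above
          process.open_file_descriptors:
            enabled: true
          process.signals_pending:
            enabled: true
```

With a non-root collector, enabling either of these is what triggers the permission denied errors for processes owned by other users.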
The solution is to give the collector process read access to files in /proc/[pid]/fd directories. Unfortunately, regular Linux file permission settings don't seem to work on files in the /proc directory.
The only way I was able to fix it (other than running the collector as root, which also fixes this issue) is to add the CAP_DAC_READ_SEARCH Linux capability to the collector binary with:
sudo setcap 'cap_dac_read_search=ep' /path/to/the/collector/binary
⚠️ Warning: This capability gives the collector binary the ability to read any file on the filesystem. For examples of how this can be exploited, see: https://book.hacktricks.xyz/linux-hardening/privilege-escalation/linux-capabilities#cap_dac_read_search
Thanks @mx-psi, I closed this accidentally by merging that PR.
To close this issue, I believe we need to make it possible to mute the errors that occur when scraping the process.open_file_descriptors or process.signals_pending metric.
One way to do this is to add another mute_... configuration property to the scraper. There are already three available, and I'm not sure if adding a fourth is a good idea. Also, I'm not sure how it should be named. Should we have separate options for each metric name: mute_open_file_descriptors_error and mute_signals_pending_error?
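To make the naming question concrete, here is what the per-metric variant might look like in a config. This is purely a sketch of the proposal: both keys are hypothetical names floated in this comment, not options the scraper is known to support.

```yaml
receivers:
  hostmetrics:
    scrapers:
      process:
        # HYPOTHETICAL: names proposed in this discussion only,
        # not existing configuration options.
        mute_open_file_descriptors_error: true
        mute_signals_pending_error: true
```

One option per metric keeps each flag narrowly scoped, at the cost of growing the list of mute_... settings with every new metric that can fail this way.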
On Kubernetes, you'd add the DAC_READ_SEARCH capability flag in the security_context.capabilities. I haven't verified that it resolves the issue, though, as the workload permissions in my environment don't permit that capability flag (for good reasons).
I've tried CAP_DAC_OVERRIDE and it doesn't seem sufficient.
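For reference, the Kubernetes suggestion above would look roughly like this in a pod manifest, where the field is spelled securityContext and the capability is written without the CAP_ prefix. The container name is a placeholder, and as noted above this has not been verified to resolve the errors:

```yaml
# Fragment of a pod spec; the container name is hypothetical.
spec:
  containers:
    - name: otel-collector
      securityContext:
        capabilities:
          add:
            - DAC_READ_SEARCH   # lets the process bypass file read permission checks
```

The same security warning as for setcap applies: the capability allows the container process to read any file it can reach.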
But this is also something the collector should tolerate gracefully. There's no point flooding the log with repeated, predictable errors. The current mute error flags are insufficient, if it's to be simply muted.
There's no point flooding the log with repeated, predictable errors.
This is a problem. journalctl -f -u otelcol-contrib is flooded with these errors. Every 5 seconds, 182 lines are written, all of the form: error reading disk usage for process "<process-name>" (pid <id>): open /proc/<id>/io: permission denied;
Describe the bug
I want to get process metrics from a Linux station, so I'm using a collector as an agent with "hostmetrics". When launching the service, I get errors on "process" scraping. The message reports a permission denied error for all PIDs.
Steps to reproduce
1. Be root on the Ubuntu system.
2. Download v0.74.0 of the contrib collector deb file (otel-contrib-collector_0.74.0_amd64.deb).
3. Install the contrib collector: dpkg --install otel-contrib-collector_0.74.0_amd64.deb
4. Configure it to collect host metrics (specifically, process data) via the hostmetrics receiver and process scraper.
What did you expect to see? No errors
What did you see instead? Every minute, an error message is generated complaining about error reading process name ... permission denied for seemingly every PID on the machine:
error reading process name for pid 1165232: readlink /proc/1165232/exe: permission denied; error reading process name for pid 1165265: readlink /proc/1165265/exe: permission denied; error reading process name for pid 1166088: readlink /proc/1166088/exe: permission denied; error reading process name for pid 1166634: readlink /proc/1166634/exe: permission denied; error reading process name for pid 1166826: readlink /proc/1166826/exe: permission denied; error reading process name for pid 1166827: readlink /proc/1166827/exe: permission denied; error reading process name for pid 1166874: readlink /proc/1166874/exe: permission denied; error reading process name for pid 1168213: readlink /proc/1168213/exe: permission denied; error reading process name for pid 1168214: readlink /proc/1168214/exe: permission denied; error reading process name for pid 1168221: readlink /proc/1168221/exe: permission denied; error reading process name for pid 1168222: readlink /proc/1168222/exe: permission denied", "scraper": "process"}
What version did you use? v0.74.0 of the contrib collector (https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.74.0/otelcol-contrib_0.74.0_linux_amd64.deb)
What config did you use? config.yaml
service file
If I add "sudo" to ExecStart, or if I change User to "root", the error changes to:
1163933: readlink /proc/1163933/exe: no such file or directory; error reading process name for pid 1163935: readlink /proc/1163935/exe: no such file or directory; error reading username for process \"gjs\" (pid 1163938): user: unknown userid 1472934163; error reading process name for pid 1164151: readlink /proc/1164151/exe: no such file or directory; error reading process name for pid 1164366: readlink /proc/1164366/exe: no such file or directory; error reading username for process \"brave\" (pid 1165232): user: unknown userid 1472934163; error reading process name for pid 1165263: readlink /proc/1165263/exe: no such file or directory; error reading process name for pid 1165265: readlink /proc/1165265/exe: no such file or directory; error reading username for process \"sudo\" (pid 1166027): user: unknown userid 1472934163; error reading username for process \"grep\" (pid 1166028): user: unknown userid 1472934163; error reading username for process \"sudo\" (pid 1166035): user: unknown userid 1472934163; error reading process name for pid 1166088: readlink /proc/1166088/exe: no such file or directory", "scraper": "process"}
Environment OS: Ubuntu 22.04
Additional context N/A
I should also mention that I found a similar closed issue, but it didn't help me resolve the problem.