I tried reproducing with the most recent Helm chart, your values.yaml, and configuration, but I got no error messages, and when I port-forwarded the pod and hit /api/v0/component/prometheus.exporter.unix/metrics I could see node_exporter metrics normally.
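For reference, roughly how I checked it (the pod name is a placeholder and 12345 assumes the Flow HTTP server's default listen port; adjust both to your setup):

kubectl port-forward pod/<grafana-agent-flow-pod> 12345:12345
# in a second terminal, fetch the component's metrics and filter for node_exporter series
curl -s http://localhost:12345/api/v0/component/prometheus.exporter.unix/metrics | grep '^node_'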
When you kubectl exec into the pods, can you see /host/{sys,proc,root} mounted correctly? Is it possible that you're e.g. running Kubernetes on Windows nodes?
@tpaschalis thank you for checking this. It is a k8s cluster running on Ubuntu.
The mounts look fine:
root@grafana-agent-flow-bcnvb:/# ls /host/
proc root sys
When I port-forward, I do see some metrics, but not the ones from the underlying host (Ubuntu). In fact, focusing on ethtool, they are empty:
...
# TYPE node_scrape_collector_duration_seconds gauge
node_scrape_collector_duration_seconds{collector="cpu"} 0.000476708
node_scrape_collector_duration_seconds{collector="diskstats"} 0.000235677
node_scrape_collector_duration_seconds{collector="ethtool"} 0.005354028
node_scrape_collector_duration_seconds{collector="mountstats"} 0.001216449
node_scrape_collector_duration_seconds{collector="systemd"} 8.7709e-05
# HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="cpu"} 1
node_scrape_collector_success{collector="diskstats"} 1
node_scrape_collector_success{collector="ethtool"} 1
node_scrape_collector_success{collector="mountstats"} 1
node_scrape_collector_success{collector="systemd"} 0
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
As an example with ethtool, it seems that the agent detects which interfaces it should scrape, but when it does, it ends up with:
collector=ethtool msg="ethtool stats error" err="no such device" device=ens192 errno=19
Can you please confirm that you can see metrics from the underlying OS on which the k8s node is running? For example node_ethtool_ucast_bytes_transmitted:
# HELP node_ethtool_ucast_bytes_transmitted Network interface ucast bytes tx
# TYPE node_ethtool_ucast_bytes_transmitted untyped
node_ethtool_ucast_bytes_transmitted{device="ens192"} 3.3931913632e+10
ts=2023-04-06T08:12:50.628942484Z level=error component=prometheus.exporter.unix msg="collector failed" name=systemd duration_seconds=0.000108774 err="couldn't get dbus connection: dial unix /run/systemd/private: connect: no such file or directory"
Based on the error message here, it looks like /run/systemd needs to be mounted from the host into the container. That would be a requirement we weren't aware of, since the documentation for node_exporter doesn't mention it either 🤔
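If that turns out to be the case, a possible workaround would be to add the mount to the agent's DaemonSet; roughly something like this pod spec fragment (the volume name and readOnly flag are my own choices, and I haven't verified that /run/systemd is the only path the systemd collector needs):

# pod spec fragment: expose the host's /run/systemd inside the agent container
volumes:
  - name: run-systemd
    hostPath:
      path: /run/systemd
containers:
  - name: grafana-agent
    volumeMounts:
      - name: run-systemd
        mountPath: /run/systemd
        readOnly: true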
I have checked this further, and it could be related to the following node_exporter setting, which in Flow mode is started with its default value:
--path.udev.data="/run/udev/data" udev data path.
I think we should be starting it with the following: --path.udev.data=/host/root/run/udev/data.
Can we add a new argument udev_path so that we can override the default?
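For illustration, the component could then be configured along these lines (udev_path is only the name I'm proposing and doesn't exist today; the existing *_path arguments are written from memory, and the /host/root prefix matches the Helm chart's host mounts):

prometheus.exporter.unix {
  // existing overrides pointing the exporter at the host mounts
  procfs_path = "/host/proc"
  sysfs_path  = "/host/sys"
  rootfs_path = "/host/root"

  // proposed new argument to override the udev data path
  udev_path = "/host/root/run/udev/data"
}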
Ah, thanks for investigating.
Can we add a new argument udev_path so that we can override the default?
Yeah, this sounds like a reasonable addition to work around the issue here 👍
Discussed in https://github.com/grafana/agent/discussions/3470
I went back and tested it again on the latest agent/Helm chart versions, and the problem is still there. When using grafana-agent in Flow mode, the unix exporter is unable to scrape host metrics. At the same time, a standalone node_exporter running on the same k8s cluster scrapes the metrics just fine.