prometheus / node_exporter

Exporter for machine metrics
https://prometheus.io/
Apache License 2.0

thermal_zone collector stuck on Jetson Orin Nano #3071

Open gouthamve opened 4 months ago

gouthamve commented 4 months ago

Host operating system: output of uname -a

Linux jetson 5.15.136-tegra #1 SMP PREEMPT Wed Apr 24 19:36:48 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.8.1 (branch: HEAD, revision: 400c3979931613db930ea035f39ce7b377cdbb5b)
  build user:       root@7afbff271a3f
  build date:       20240521-18:36:53
  go version:       go1.22.3
  platform:         linux/arm64
  tags:             unknown

node_exporter command line flags

./node_exporter

node_exporter log output

```
ts=2024-07-07T12:37:02.195Z caller=node_exporter.go:193 level=info msg="Starting node_exporter" version="(version=1.8.1, branch=HEAD, revision=400c3979931613db930ea035f39ce7b377cdbb5b)"
ts=2024-07-07T12:37:02.195Z caller=node_exporter.go:194 level=info msg="Build context" build_context="(go=go1.22.3, platform=linux/arm64, user=root@7afbff271a3f, date=20240521-18:36:53, tags=unknown)"
ts=2024-07-07T12:37:02.196Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+|var/lib/containers/storage/.+)($|/)
ts=2024-07-07T12:37:02.197Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2024-07-07T12:37:02.197Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:111 level=info msg="Enabled collectors"
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=arp
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=bcache
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=bonding
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=btrfs
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=conntrack
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=cpu
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=cpufreq
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=diskstats
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=dmi
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=edac
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=entropy
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=fibrechannel
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=filefd
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=filesystem
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=hwmon
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=infiniband
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=ipvs
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=loadavg
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=mdadm
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=meminfo
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=netclass
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=netdev
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=netstat
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=nfs
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=nfsd
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=nvme
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=os
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=powersupplyclass
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=pressure
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=rapl
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=schedstat
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=selinux
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=sockstat
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=softnet
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=stat
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=tapestats
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=textfile
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=thermal_zone
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=time
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=timex
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=udp_queues
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=uname
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=vmstat
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=watchdog
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=xfs
ts=2024-07-07T12:37:02.198Z caller=node_exporter.go:118 level=info collector=zfs
ts=2024-07-07T12:37:02.199Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9100
ts=2024-07-07T12:37:02.199Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9100
```

Are you running node_exporter in Docker?

No

What did you do that produced an error?

I scraped it, and the scrape hung. Even after a couple of minutes it had not completed. I narrowed it down to the thermal_zone collector.

Running ./node_exporter --no-collector.thermal_zone makes the scrapes work again.

How can I debug this further?

SuperQ commented 4 months ago

thermal_zone comes from prometheus/procfs. It walks files in /sys/class/thermal.

erik-fauna commented 2 weeks ago

The same issue appears when using the Docker image. Curling the metrics endpoint hangs unless you add the --no-collector.thermal_zone flag to the command.

Using go version go1.22.5
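
To confirm the hang is isolated to one collector without restarting the exporter, node_exporter's `collect[]` query parameter can restrict a scrape to a single collector. Below is a rough sketch under a few assumptions: the exporter listens on localhost:9100 as in the log above, `cpu` serves as a known-good comparison, and `collectURL` and `scrape` are hypothetical helper names with an arbitrary 10-second timeout.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// collectURL (hypothetical helper) builds a /metrics URL limited to one
// collector through the collect[] query parameter.
func collectURL(base, collector string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/metrics"
	u.RawQuery = url.Values{"collect[]": {collector}}.Encode()
	return u.String(), nil
}

// scrape fetches target and returns the number of body bytes read, or an
// error if the request does not finish within timeout.
func scrape(target string, timeout time.Duration) (int64, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, target, nil)
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	n, err := io.Copy(io.Discard, resp.Body)
	if err != nil {
		return n, err
	}
	if resp.StatusCode != http.StatusOK {
		return n, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return n, nil
}

func main() {
	// Compare a collector expected to work against the suspect one.
	for _, c := range []string{"cpu", "thermal_zone"} {
		target, err := collectURL("http://localhost:9100", c)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		start := time.Now()
		n, err := scrape(target, 10*time.Second)
		elapsed := time.Since(start).Round(time.Millisecond)
		if err != nil {
			fmt.Printf("%s: failed after %s: %v\n", c, elapsed, err)
			continue
		}
		fmt.Printf("%s: %d bytes in %s\n", c, n, elapsed)
	}
}
```

If only the `thermal_zone` request runs into the deadline, that points at the collector itself rather than the HTTP layer or the container setup.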