Open antifuchs opened 4 months ago
@antifuchs, the problem may be less mysterious if you show the debug log
Running:
smartctl_exporter \
--log.level=debug \
--smartctl.path=/nix/store/whfmc5r1irm9j3n9glzxc77cl50241y2-smartmontools-7.4/bin/smartctl \
--smartctl.interval=10m \
--web.listen-address=127.0.0.1:9633 2>&1 | tee ~mess/debug-log
yields this (which doesn't look particularly enlightening tbh):
ts=2024-05-11T19:34:38.019Z caller=main.go:167 level=info msg="Starting smartctl_exporter" version="(version=, branch=, revision=unknown)"
ts=2024-05-11T19:34:38.019Z caller=main.go:168 level=info msg="Build context" build_context="(go=go1.22.2, platform=linux/amd64, user=, date=, tags=unknown)"
ts=2024-05-11T19:34:38.020Z caller=readjson.go:79 level=debug msg="Scanning for devices"
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sda
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdb
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdc
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdd
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sde
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdf
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdg
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdh
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdi
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdj
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdk
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=sdl
ts=2024-05-11T19:34:38.046Z caller=main.go:128 level=info msg="Found device" name=nvme0
ts=2024-05-11T19:34:38.046Z caller=main.go:172 level=info msg="Number of devices found" count=13
ts=2024-05-11T19:34:38.046Z caller=main.go:185 level=info msg="Start background scan process"
ts=2024-05-11T19:34:38.047Z caller=main.go:186 level=info msg="Rescanning for devices every" rescanInterval=10m0s
ts=2024-05-11T19:34:38.069Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sda duration=21.995655ms
ts=2024-05-11T19:34:38.069Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sda family=unknown model=unknown
ts=2024-05-11T19:34:38.094Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdb duration=24.664627ms
ts=2024-05-11T19:34:38.094Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdb family=unknown model=unknown
ts=2024-05-11T19:34:38.129Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdc duration=34.2836ms
ts=2024-05-11T19:34:38.130Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdc family=unknown model=unknown
ts=2024-05-11T19:34:38.157Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdd duration=26.83563ms
ts=2024-05-11T19:34:38.157Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdd family=unknown model=unknown
ts=2024-05-11T19:34:38.183Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sde duration=25.518334ms
ts=2024-05-11T19:34:38.184Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sde family=unknown model=unknown
ts=2024-05-11T19:34:38.212Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdf duration=27.646302ms
ts=2024-05-11T19:34:38.212Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdf family=unknown model=unknown
ts=2024-05-11T19:34:38.247Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdg duration=34.147328ms
ts=2024-05-11T19:34:38.247Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdg family=unknown model=unknown
ts=2024-05-11T19:34:38.275Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdh duration=27.762252ms
ts=2024-05-11T19:34:38.275Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdh family=unknown model=unknown
ts=2024-05-11T19:34:38.309Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdi duration=33.025595ms
ts=2024-05-11T19:34:38.309Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdi family=unknown model=unknown
ts=2024-05-11T19:34:38.333Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdj duration=23.642763ms
ts=2024-05-11T19:34:38.333Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdj family=unknown model=unknown
ts=2024-05-11T19:34:38.354Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdk duration=20.821869ms
ts=2024-05-11T19:34:38.355Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdk family=unknown model=unknown
ts=2024-05-11T19:34:38.388Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdl duration=32.682864ms
ts=2024-05-11T19:34:38.388Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdl family=unknown model=unknown
ts=2024-05-11T19:34:38.415Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=nvme0 duration=25.873639ms
ts=2024-05-11T19:34:38.415Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=nvme0 family=unknown model="Samsung SSD 980 PRO 2TB"
ts=2024-05-11T19:34:38.417Z caller=tls_config.go:313 level=info msg="Listening on" address=127.0.0.1:9633
ts=2024-05-11T19:34:38.417Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=127.0.0.1:9633
ts=2024-05-11T19:34:41.304Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sda family=unknown model=unknown
ts=2024-05-11T19:34:41.304Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdb family=unknown model=unknown
ts=2024-05-11T19:34:41.305Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdc family=unknown model=unknown
ts=2024-05-11T19:34:41.305Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdd family=unknown model=unknown
ts=2024-05-11T19:34:41.305Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sde family=unknown model=unknown
ts=2024-05-11T19:34:41.306Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdf family=unknown model=unknown
ts=2024-05-11T19:34:41.306Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdg family=unknown model=unknown
ts=2024-05-11T19:34:41.307Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdh family=unknown model=unknown
ts=2024-05-11T19:34:41.307Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdi family=unknown model=unknown
ts=2024-05-11T19:34:41.308Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdj family=unknown model=unknown
ts=2024-05-11T19:34:41.308Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdk family=unknown model=unknown
ts=2024-05-11T19:34:41.308Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdl family=unknown model=unknown
ts=2024-05-11T19:34:41.308Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=nvme0 family=unknown model="Samsung SSD 980 PRO 2TB"
It seems your system is also affected by #205, because only your NVMe device's metrics were read correctly. Are you using packages from a distro? It would be better if the distro used the release tarball instead of the development repo.
yeah, I have been building from source - that worked while the repo was semi-maintained (and I had pull reqs outstanding), but doesn't anymore. I will reconsider.
I've just upgraded to 2cc2249821d6417fcfff8ef8d302205d7b37b44c from 0768a400a1378872eb940b45c5e0cedf0c213402, and something is wrong in the reporting of SMART status of SATA-connected SSDs. It reports smartctl_device_smart_status=0 on these, but I believe the value should be 1, according to what smartctl's JSON output reports for the drives:
Best to show an example:
I'm not sure what's going on there, but something is wrong and it's making my disk badness monitoring go off spuriously /:
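For anyone who wants to cross-check the exporter's value against smartctl directly, here is a minimal sketch that reads the `smart_status.passed` field from `smartctl --json` output and maps it to the 0/1 value one would expect in `smartctl_device_smart_status`. This is not the exporter's code: the function name `smart_status_gauge` and the trimmed sample JSON are made up for illustration, and only the `smart_status.passed` field comes from smartctl's JSON format. It returns `None` when the field is absent, so a missing field can be told apart from a genuinely failed check.

```python
import json
from typing import Optional


def smart_status_gauge(smartctl_json: str) -> Optional[int]:
    """Map smartctl's `smart_status.passed` to a 0/1 gauge value.

    Returns 1 if the drive passed, 0 if it failed, and None if the
    field is not present in the JSON at all.
    """
    data = json.loads(smartctl_json)
    status = data.get("smart_status")
    if not isinstance(status, dict) or "passed" not in status:
        return None  # field missing: neither "passed" nor "failed"
    return 1 if status["passed"] is True else 0


# Hypothetical, heavily trimmed stand-in for the output of
# `smartctl --json --all /dev/sda`; real output has many more fields.
sample = '{"device": {"name": "/dev/sda"}, "smart_status": {"passed": true}}'

print(smart_status_gauge(sample))
```

If this gives 1 for a drive while the exporter reports 0, the difference is on the exporter side rather than in what smartctl returns.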