prometheus-community / smartctl_exporter

Export smartctl statistics to prometheus
Apache License 2.0

Device scanning not disabled on machines with many disks #230

Closed: tacerus closed this 5 months ago

tacerus commented 5 months ago

Hi,

By my reading of the README and `--help`, specifying one or more `--smartctl.device=` arguments should disable device scanning in favor of a static list.

On some of my systems, this does not seem to be the case. Launching the exporter with

```
smartctl_exporter --log.level=debug --smartctl.device=sda --smartctl.device=sdb --smartctl.interval=10m
```

on them yields

Heavily truncated output:

```
ts=2024-05-23T00:44:06.408Z caller=readjson.go:79 level=debug msg="Scanning for devices"
ts=2024-05-23T00:44:06.486Z caller=main.go:128 level=info msg="Found device" name=sda
ts=2024-05-23T00:44:06.487Z caller=main.go:128 level=info msg="Found device" name=sdb
ts=2024-05-23T00:44:06.487Z caller=main.go:128 level=info msg="Found device" name=sdc
ts=2024-05-23T00:44:06.487Z caller=main.go:128 level=info msg="Found device" name=sdd
ts=2024-05-23T00:44:06.487Z caller=main.go:128 level=info msg="Found device" name=sde
ts=2024-05-23T00:44:06.487Z caller=main.go:128 level=info msg="Found device" name=sdf
ts=2024-05-23T00:44:06.487Z caller=main.go:128 level=info msg="Found device" name=sdg
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdh
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdi
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdj
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdk
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdl
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdm
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdn
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdo
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdp
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdq
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdr
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sds
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdt
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdu
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdv
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdw
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdx
ts=2024-05-23T00:44:06.488Z caller=main.go:128 level=info msg="Found device" name=sdy
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdz
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdaa
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdab
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdac
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdad
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdae
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdaf
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdag
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdah
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdai
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdaj
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdak
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdal
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdam
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdan
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdao
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdap
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdaq
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdar
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdas
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdat
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdau
ts=2024-05-23T00:44:06.489Z caller=main.go:128 level=info msg="Found device" name=sdav
< ... many many more ... >
ts=2024-05-23T00:44:06.501Z caller=main.go:128 level=info msg="Found device" name=sdix
ts=2024-05-23T00:44:06.501Z caller=main.go:172 level=info msg="Number of devices found" count=258
ts=2024-05-23T00:44:06.502Z caller=main.go:174 level=info msg="Devices specified" devices="sda, sdb"
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sda filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdb filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdb filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdc filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdc filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdd filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdd filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sde filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sde filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdf filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdf filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdg filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdg filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdh filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdh filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdi filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdi filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdj filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdj filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdk filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdk filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdl filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdl filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdm filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdm filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdn filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdn filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdo filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdo filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdp filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdp filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdq filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdq filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdr filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdr filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sds filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sds filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdt filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdt filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdu filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdu filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdv filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdv filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdw filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdw filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdx filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdx filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdy filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdy filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdz filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdz filter=sdb
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdaa filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdab filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdac filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdad filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdae filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdaf filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdag filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdah filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdai filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdaj filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdak filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdal filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdam filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdan filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdao filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdap filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdaq filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdar filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdas filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdat filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdau filter=sda
ts=2024-05-23T00:44:06.502Z caller=main.go:144 level=debug msg=filterDevices device=sdav filter=sda
< ... many many more ... >
ts=2024-05-23T00:44:06.508Z caller=main.go:144 level=debug msg=filterDevices device=sdix filter=sdb
ts=2024-05-23T00:44:06.508Z caller=main.go:176 level=info msg="Devices filtered" count=54
ts=2024-05-23T00:44:06.508Z caller=main.go:185 level=info msg="Start background scan process"
ts=2024-05-23T00:44:06.508Z caller=main.go:186 level=info msg="Rescanning for devices every" rescanInterval=10m0s
ts=2024-05-23T00:44:06.532Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sda duration=24.045578ms
ts=2024-05-23T00:44:06.533Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sda family=unknown model=unknown
ts=2024-05-23T00:44:06.554Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdb duration=20.961056ms
ts=2024-05-23T00:44:06.554Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdb family=unknown model=unknown
ts=2024-05-23T00:44:06.573Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdaa duration=18.480028ms
ts=2024-05-23T00:44:06.573Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdaa family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.596Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdab duration=22.504507ms
ts=2024-05-23T00:44:06.596Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdab family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.619Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdac duration=22.735266ms
ts=2024-05-23T00:44:06.619Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdac family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.640Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdad duration=20.827641ms
ts=2024-05-23T00:44:06.641Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdad family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.661Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdae duration=19.888174ms
ts=2024-05-23T00:44:06.661Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdae family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.677Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdaf duration=15.363015ms
ts=2024-05-23T00:44:06.677Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdaf family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.697Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdag duration=19.691726ms
ts=2024-05-23T00:44:06.697Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdag family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.723Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdah duration=23.441834ms
ts=2024-05-23T00:44:06.723Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdah family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.747Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdai duration=23.386201ms
ts=2024-05-23T00:44:06.747Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdai family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.762Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdaj duration=15.20372ms
ts=2024-05-23T00:44:06.763Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdaj family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.778Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdak duration=15.214289ms
ts=2024-05-23T00:44:06.778Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdak family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.794Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdal duration=15.278972ms
ts=2024-05-23T00:44:06.794Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdal family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.816Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdam duration=21.334482ms
ts=2024-05-23T00:44:06.816Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdam family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.837Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdan duration=20.312241ms
ts=2024-05-23T00:44:06.837Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdan family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.857Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdao duration=20.127514ms
ts=2024-05-23T00:44:06.857Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdao family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.883Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdap duration=25.081858ms
ts=2024-05-23T00:44:06.883Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdap family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.907Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdaq duration=23.280918ms
ts=2024-05-23T00:44:06.907Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdaq family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.930Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdar duration=23.146872ms
ts=2024-05-23T00:44:06.930Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdar family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.946Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdas duration=15.344065ms
ts=2024-05-23T00:44:06.946Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdas family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.961Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdat duration=14.958689ms
ts=2024-05-23T00:44:06.962Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdat family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.977Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdau duration=15.624806ms
ts=2024-05-23T00:44:06.978Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdau family=unknown model="NETAPP LUN C-Mode"
ts=2024-05-23T00:44:06.999Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdav duration=21.541429ms
ts=2024-05-23T00:44:07.000Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdav family=unknown model="NETAPP LUN C-Mode"
< ... many many more ... >
ts=2024-05-23T00:44:07.631Z caller=readjson.go:74 level=debug msg="Collected S.M.A.R.T. json data" device=sdbz duration=22.504767ms
ts=2024-05-23T00:44:07.631Z caller=smartctl.go:100 level=debug msg="Collecting metrics from" device=sdbz family=unknown model="NETAPP LUN C-Mode"
```

whereas I expected only /dev/sda and /dev/sdb to show up.

Similarly, the exported metrics contain all the other devices.

It works as expected on other systems (ones with fewer than 10 disks) with the same command line and exporter version (0.12.0); the issue only occurs on the systems with a large number of disks.
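The counts in the debug log are worth a closer look. The scan finds 258 devices (sda through sdix), and `Devices filtered count=54` is exactly the number of names that start with `sda` or `sdb`: sda plus sdaa to sdaz, and sdb plus sdba to sdbz. That is consistent with S.M.A.R.T. data then being collected from sdaa, sdab and so on up to sdbz. The Go sketch below reproduces this arithmetic under an assumption inferred only from the log, not from the exporter's source: that each `--smartctl.device` value is compared as a prefix of the scanned device name. `scannedNames`, `prefixMatch` and `exactMatch` are hypothetical helpers, and `filterDevices` merely borrows its name from the log's `msg=filterDevices`.

```go
package main

import (
	"fmt"
	"strings"
)

// scannedNames rebuilds the first n device names in the order seen in the
// log: sda..sdz, then sdaa, sdab, ... (n=258 ends at sdix). Hypothetical helper.
func scannedNames(n int) []string {
	letters := "abcdefghijklmnopqrstuvwxyz"
	var names []string
	for _, c := range letters {
		if len(names) == n {
			return names
		}
		names = append(names, "sd"+string(c))
	}
	for _, c1 := range letters {
		for _, c2 := range letters {
			if len(names) == n {
				return names
			}
			names = append(names, "sd"+string(c1)+string(c2))
		}
	}
	return names
}

// filterDevices keeps every name that matches at least one filter. Stopping
// at the first matching filter is consistent with the debug log, where sda
// and sdaa are only compared against filter=sda. The matching rule is an
// assumption passed in as match.
func filterDevices(names, filters []string, match func(name, filter string) bool) []string {
	var kept []string
	for _, name := range names {
		for _, f := range filters {
			if match(name, f) {
				kept = append(kept, name)
				break
			}
		}
	}
	return kept
}

// prefixMatch is the assumed behavior: "sda" also matches "sdaa".."sdaz".
func prefixMatch(name, filter string) bool { return strings.HasPrefix(name, filter) }

// exactMatch is the behavior the issue expects from a static device list.
func exactMatch(name, filter string) bool { return name == filter }

func main() {
	filters := []string{"sda", "sdb"}
	for _, n := range []int{258, 9} {
		names := scannedNames(n)
		fmt.Printf("%d scanned: prefix match keeps %d, exact match keeps %d\n",
			n,
			len(filterDevices(names, filters, prefixMatch)),
			len(filterDevices(names, filters, exactMatch)))
	}
}
```

Running it prints `258 scanned: prefix match keeps 54, exact match keeps 2` and `9 scanned: prefix match keeps 2, exact match keeps 2`. Under this assumption, a host with fewer than 10 disks (sda to sdi) has no names like sdaa that merely start with `sda` or `sdb`, which would fit the observation that the same command line behaves correctly there.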

smartd/smartctl handle this correctly on their own: with the two devices listed in `/etc/smartd.conf`, they handle only those two.

Any ideas what is different on the problematic systems (other than the large number of disk devices)? Most of the additional disks come from SAN storage, but shouldn't the `--smartctl.device` option work the same for any `/dev/sdX` device?

tacerus commented 5 months ago

This is my bad: I accidentally built from master. The issue comes from the patch in https://github.com/prometheus-community/smartctl_exporter/pull/205, which disables the `--smartctl.device` logic (interestingly, it keeps the argument even though it no longer does anything) but is not present in the tagged release.

Sorry for the noise!