jahkeup / prometheus-moto-exporter

Motorola Modem Prometheus Exporter (mb8600)
GNU Lesser General Public License v2.1

Duplicated Statistics after days of uptime #20

Open VxJasonxV opened 2 years ago

VxJasonxV commented 2 years ago

Hey there. I'm running this in Docker (see also #14, it saved me a ton of work and a chunk of hair) on a Raspberry Pi 4B against an MB8600. After some amount of uptime, the exposed metrics start to deviate and become invalid in various ways. I'm not sure whether this is the fault of the MB8600 (it probably is). I'm sure we all know how reliable the data from the SOAP/whatever-the-heck HNAP and web interface of these devices is.

As of this writing I have 3 moto_device_hardware_info lines:

# HELP moto_device_hardware_info channel locked status
# TYPE moto_device_hardware_info gauge
moto_device_hardware_info{boot_file="",customer_version="Prod_19.3_d31",hardware_version="",serial="",software_version="8600-19.3.18",spec_version="DOCSIS 1.0"} 1
moto_device_hardware_info{boot_file="",customer_version="Prod_19.3_d31",hardware_version="",serial="0020-MB8600-0005",software_version="8600-19.3.18",spec_version="DOCSIS 3.1"} 1
moto_device_hardware_info{boot_file="",customer_version="Prod_19.3_d31",hardware_version="V1.0",serial="0020-MB8600-0005",software_version="8600-19.3.18",spec_version="DOCSIS 3.1"} 1

Missing hardware version, missing serial, and a random DOCSIS 1.0.

I'm still learning Grafana and visualization, so once one of these duplicated lines starts to exist, a panel configured to display "Last *" (last non-null value) winds up showing multiple blocks of data.

jahkeup commented 2 years ago

Whoa, yeah that is definitely not how that data should translate into the lines we see here.

I suspect there's a window between the modem bringing its channels up and probing its own hardware during which these values are what's being reported.

No matter what it is, I think we can improve this behavior by caching the data, reporting it only once it is complete and stable, and then holding it for the entire lifetime of the process. It really doesn't seem to me like folks are changing devices often while the exporter is running (and if you do this and find this comment, just restart this exporter! The worst you'll get is metrics suggesting that the uptime belonged to your old hardware).
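A minimal sketch of that caching idea, written in Python for illustration only and not taken from the exporter's code. The class name HardwareInfoCache, the stable_after parameter, and the choice of which labels must be non-empty are all assumptions; boot_file is left out of the required set because it is empty even in the most complete sample line above.

```python
from typing import Dict, Optional

# Assumption: labels expected to be non-empty on a healthy modem.
# boot_file is excluded because it is empty even in the complete sample.
REQUIRED_LABELS = (
    "customer_version",
    "hardware_version",
    "serial",
    "software_version",
    "spec_version",
)


class HardwareInfoCache:
    """Hold the first complete, stable hardware info for the process lifetime."""

    def __init__(self, stable_after: int = 2) -> None:
        # Number of consecutive identical, complete reads needed before locking.
        self._stable_after = stable_after
        self._candidate: Optional[Dict[str, str]] = None
        self._seen = 0
        self._locked: Optional[Dict[str, str]] = None

    def observe(self, labels: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Feed one scrape result; return the labels to export, or None."""
        if self._locked is not None:
            return self._locked  # never changes once locked
        if any(not labels.get(name) for name in REQUIRED_LABELS):
            # Incomplete read: discard any candidate and start over.
            self._candidate, self._seen = None, 0
            return None
        if labels == self._candidate:
            self._seen += 1
        else:
            self._candidate, self._seen = dict(labels), 1
        if self._seen >= self._stable_after:
            self._locked = self._candidate
        return self._locked
```

With this shape the exporter would emit no moto_device_hardware_info series at all until the same complete label set has been read stable_after times in a row, and exactly one series afterwards. The trade-off is the one described above: a swapped modem is only picked up after a restart.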

jahkeup commented 2 years ago

Do you see empty labels and duplicated label combinations on other metrics as well?
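One way to check this is to save the exporter's metrics output to a file and summarize it per metric. The sketch below is a simplified, hypothetical helper (summarize is not part of the exporter) that only understands plain `name{labels} value` lines, not the full exposition format. It counts series per metric name and collects labels that appear with an empty value.

```python
import re
from collections import defaultdict

# Simplified patterns: a labelled sample line, and a single label="value" pair.
SERIES = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{(.*)\}\s+\S+')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def summarize(text):
    """Return {metric: {"series": count, "empty_labels": set of label names}}."""
    report = defaultdict(lambda: {"series": 0, "empty_labels": set()})
    for line in text.splitlines():
        match = SERIES.match(line.strip())
        if match is None:
            continue  # skips # HELP / # TYPE lines and unlabelled samples
        name, body = match.groups()
        entry = report[name]
        entry["series"] += 1
        for label, value in LABEL.findall(body):
            if value == "":
                entry["empty_labels"].add(label)
    return dict(report)
```

A metric that should describe a single device but reports more than one series, or any metric with entries in empty_labels, would be a candidate for the same problem.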