prometheus / snmp_exporter

SNMP Exporter for Prometheus
Apache License 2.0

[Discussion] Transmitting wrong data vs failing/filtering within the exporter, and how to make this bearable for the user #280

Open · RichiH opened this issue 6 years ago

RichiH commented 6 years ago

We are having recurring issues with devices sending garbage data that violates the SNMP specs. We should also offer users a way to work around such issues, as it's often not realistic to have vendors fix their implementations; https://github.com/prometheus/snmp_exporter/issues/186 is the most likely correct approach for this.

In https://github.com/prometheus/snmp_exporter/issues/279#issuecomment-379199243 @SuperQ raises another recurring point: Corrupt data is exposed to Prometheus, leaving the user to figure out what the specific mistake is, more often than not confusing them for some time.

@SuperQ's suggestion is to fail at the exporter, but that means the user no longer has any data to work with. The upside, and the downside, of the Prometheus exposition format is that live data and debug data are the same thing.

A different approach would be to add a prefix and a suffix marker (the suffix helps when cURL output scrolls a lot, etc.) stating that this data is invalid, while still exposing the data to ease debugging. This would even allow highlighting the incorrect data and inlining other debugging hints, since the output is no longer valid Prometheus exposition format anyway. It also lets Prometheus stop parsing immediately, without processing any data from the scrape at all.

Yes, this might be a discussion outside the scope of snmp_exporter. And yes, I suspect that if we decide to do this, it will have implications for OpenMetrics.

brian-brazil commented 6 years ago

suggestion is to fail at the exporter

That's what we do today, client_golang has a check for this.

A different approach would be to prefix and suffix (in case of cURL scrolling a lot, etc) a marker that this data is invalid, but still exposing the data to ease debugging.

This is all done in client_golang. If you want to propose better error messages, that is the place to request them. I don't think exposing data in a format we know is invalid is a good idea, as it would cause confusion.
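For contrast, a minimal sketch of the fail-the-scrape behaviour, written in plain Go rather than with client_golang. The `sample` type and `renderStrict` function are hypothetical, and checking label values for valid UTF-8 is only one example of the kind of validation involved; this is not client_golang's actual code. The point it illustrates is that one bad value from the device means the user gets an error and no exposition text at all.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sample is a simplified, hypothetical stand-in for one value read from a device.
type sample struct {
	name       string
	labelName  string
	labelValue string // e.g. an SNMP DisplayString such as ifDescr
	value      float64
}

// escaper applies the text format's label-value escaping: backslash,
// double quote and line feed.
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// renderStrict fails the whole scrape on the first label value that is not
// valid UTF-8, returning an error and no exposition text.
func renderStrict(samples []sample) (string, error) {
	var b strings.Builder
	for _, s := range samples {
		if !utf8.ValidString(s.labelValue) {
			return "", fmt.Errorf("metric %s: label %s value %q is not valid UTF-8",
				s.name, s.labelName, s.labelValue)
		}
		fmt.Fprintf(&b, "%s{%s=\"%s\"} %g\n", s.name, s.labelName, escaper.Replace(s.labelValue), s.value)
	}
	return b.String(), nil
}

func main() {
	_, err := renderStrict([]sample{{"ifOperStatus", "ifDescr", "eth\xff0", 1}})
	fmt.Println(err)
}
```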

RichiH commented 6 years ago

The current state causes confusion as well.

brian-brazil commented 6 years ago

We no longer send bad data to Prometheus as of 0.10.0, failing the scrape instead when the device violates the SNMP spec.

RichiH commented 3 years ago

@SuperQ, just revisiting this; I have no strong opinion as of right now.

xkilian commented 3 years ago

I would say: fail with a meaningful error reason and, if possible, filter out the bad data and raise an error counter exposed to Prometheus.