New Histogram Collection Creates Errors on Each Scrape With Older Redis Versions

Calebjh commented 1 year ago

Describe the problem When running redis 6.x, the new histogram metrics cause Redis error counter to increase on every scrape, even though the error message logged is limited to one.

time="2023-04-05T18:32:08Z" level=info msg="Redis Metrics Exporter v1.49.0    build date:     sha1: b8555085b299a28fd67642302072c2cf77afbce8    Go: go1.20.2    GOOS: linux    GOARCH: amd64"
time="2023-04-05T18:32:08Z" level=info msg="Providing metrics at :9121/metrics"
time="2023-04-05T18:32:32Z" level=error msg="WARNING, LOGGED ONCE ONLY: cmd LATENCY HISTOGRAM, err: ERR Unknown subcommand or wrong number of arguments for 'HISTOGRAM'. Try LATENCY HELP."

It would be convenient to have a way to disable the new histogram via ENV or automatically if the command is failing.

What version of redis_exporter are you running? Please run redis_exporter --version if you're not sure what version you're running. [ ] 0.3x.x [x] 1.48+

Running the exporter What's the full command you're using to run the exporter? (please remove passwords and other sensitive data)

Expected behavior Testing an upgrade from an older version to the newest versions. Expected things to continue working, with new metrics available.

Screenshots Periods where error counter is increasing are running v1.49, and when it is stable are v1.47

Additional context

oliver006 commented 1 year ago

Thank for opening this issue. This indeed looks wrong. I think there are multiple ways to handle this

disable checking for the latency histogram after thr first error. tricky cause now we have to keep state in the exporter across scrapes and the way the /scrape endpoint works won't work any longer
start making the exporter version-aware of the redis instance it's scraping and only take certain actions if we know they're supported. rather clean but adds complexity and possibly a lot of "if version > x.y" code
add a flag to disable accessing the latency histogram - easiest to implement but user needs to take action to configure the exporter right

I'm leaning towards 3. - what do you think?

Calebjh commented 1 year ago

I would be inclined toward the 3rd option, as well. I'm sure there would be advantages toward the 2nd, but it sounds like a lot of unnecessary work, and the 1st sounds finicky, since you might hit some transient error and disable it accidentally. From a backwards compatibility standpoint, it might make more sense in my opinion for the default state to be disabled with an option to enable. Otherwise I might not be the only one to try upgrading to the latest version and start seeing unexpected errors.

oliver006 commented 1 year ago

Makes sense. I'm a little short on time the next few weeks but if you're interested in submitting a PR I can review and merge it.

gzivdo commented 1 year ago

2 option is right way. redis_version present in "info all" The fast fix

func (e *Exporter) extractLatencyHistogramMetrics(outChan chan<- prometheus.Metric, infoAll string, redisConn redis.Conn) {
+       if !(strings.Contains(infoAll, "redis_version:7.")) {
+               return
+       }

tdewolff commented 1 year ago

@gzivdo that looks like the best fix, is there a possibility to get this merged please?

oliver006 commented 1 year ago

@tdewolff - if you submit a PR I can have a look and review it to get it merged.

grafanalf commented 1 year ago

ping?

grafanalf commented 1 year ago

https://github.com/oliver006/redis_exporter/pull/847

azhurbilo commented 11 months ago

@oliver006 I see your comment https://github.com/oliver006/redis_exporter/pull/847#issuecomment-1793788847

The log-line should show up only once - is it that big of an issue?

for us for example it's an issue as it generate Errors (redis_errors_total metric) for each exporter every time (not only in the beginning) and it prevents us to make normal alerts

oliver006 commented 11 months ago

Hmm, interesting, I agree that's not optimal (it shouldn't generate an error the first time it happens). Let me see how ot improve it, maybe bump the redis_errors_total metric only after the first error?

azhurbilo commented 11 months ago

I think option 3 is much better than generate 1 error at start

add a flag to disable accessing the latency histogram - easiest to implement but user needs to take action to configure the exporter right

oliver006 / redis_exporter

New Histogram Collection Creates Errors on Each Scrape With Older Redis Versions #784