sciencehistory / scihist_digicoll

Science History Institute Digital Collections

Reduce over-alerting from HoneyBadger for memcached down #2772

Open jrochkind opened 3 weeks ago

jrochkind commented 3 weeks ago

We use a memcached server (via a Heroku add-on) only to keep track of request velocity for rate limiting (with rack-attack).

Sometimes the memcached server goes down for a couple of minutes. We aren't sure why, or how to fix it. It's annoying, but going without rate-limit metering for a couple of minutes is not actually disastrous.

Normally HoneyBadger is pretty good at "collapsing" multiple instances of the same error into one listed error. For these, though, I guess there are enough different stack traces that HB isn't able to collapse them, so it reports the error to us over and over again, even though to us it's just "right, memcached is still down 20 seconds later."

This annoys us and can lead to error fatigue.

We should figure out how to get HoneyBadger to properly collapse these errors, or consider having it ignore them entirely, since we don't really care about them. On the other hand, if memcached went down and STAYED down forever we'd want to know, so maybe we do want some alerting. Have to think about it.
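One possible middle ground between "collapse" and "ignore", sketched below under some loud assumptions: let one report through, then suppress repeats for a fixed window, so a memcached that stays down still produces a report every so often. `ReportThrottle`, `MEMCACHED_THROTTLE`, and the 600-second window are hypothetical names and values for illustration. Matching on a `Dalli::` error-class prefix is also an assumption, since we haven't confirmed which client or error classes are actually involved. The throttle state is per process, so each dyno or worker would report once per window.

```ruby
# Sketch only: allow one report per time window and suppress the rest.
# ReportThrottle is a hypothetical helper, not part of any gem.
class ReportThrottle
  def initialize(window_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @window_seconds = window_seconds
    @clock = clock            # injectable so the logic can be tested without sleeping
    @last_reported_at = nil
    @mutex = Mutex.new        # before_notify may run on several threads
  end

  # Returns true if a report should go out now, false if one already
  # went out within the current window.
  def allow?
    @mutex.synchronize do
      now = @clock.call
      if @last_reported_at.nil? || (now - @last_reported_at) >= @window_seconds
        @last_reported_at = now
        true
      else
        false
      end
    end
  end
end

# Hypothetical window: at most one memcached report per 10 minutes per process.
MEMCACHED_THROTTLE = ReportThrottle.new(window_seconds: 600)

# Only wire up the hook when the Honeybadger gem is loaded.
if defined?(Honeybadger)
  Honeybadger.configure do |config|
    config.before_notify do |notice|
      # ASSUMPTION: the memcached failures surface as Dalli::* errors.
      next unless notice.error_class.to_s.start_with?("Dalli::")

      # halt! stops this notice from being sent to Honeybadger.
      notice.halt! unless MEMCACHED_THROTTLE.allow?
    end
  end
end
```

With this shape, a two-minute blip would produce one report per process instead of one every few seconds, and an outage that never ends would keep producing one per window.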

On customizing HoneyBadger fingerprinting for grouping/collapsing errors: https://docs.honeybadger.io/lib/ruby/getting-started/customizing-error-grouping/

I guess HB is fingerprinting them all separately because they come from different Rails controller actions. It's a low-level side thing that goes wrong, so each one gets a different fingerprint?
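If that guess is right, a custom fingerprint could force them into one group. A minimal sketch, assuming the failures are raised as Dalli errors (we haven't verified which error classes actually show up in HB, so `MEMCACHED_ERROR_CLASSES` is a placeholder list to adjust). The helper `memcached_fingerprint` and the fingerprint string `"memcached-unavailable"` are hypothetical names, and this would presumably live in an initializer.

```ruby
# ASSUMPTION: placeholder list of error class names; replace with whatever
# classes actually appear in Honeybadger for the memcached outages.
MEMCACHED_ERROR_CLASSES = ["Dalli::RingError", "Dalli::NetworkError"].freeze

# Hypothetical helper: returns a single shared fingerprint for memcached
# errors, or nil to leave Honeybadger's default grouping alone.
def memcached_fingerprint(error_class_name)
  MEMCACHED_ERROR_CLASSES.include?(error_class_name.to_s) ? "memcached-unavailable" : nil
end

# Only wire up the hook when the Honeybadger gem is loaded.
if defined?(Honeybadger)
  Honeybadger.configure do |config|
    config.before_notify do |notice|
      fingerprint = memcached_fingerprint(notice.error_class)
      # Same fingerprint regardless of controller action or stack trace,
      # so all of these should land in one Honeybadger error.
      notice.fingerprint = fingerprint if fingerprint
    end
  end
end
```

Everything else keeps its default grouping, because the hook only sets a fingerprint when the error class matches.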