thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Parse "out of memory storing object" error and report it with a dedicated "reason" #5445

Open zenador opened 2 years ago

zenador commented 2 years ago

Is your proposal related to a problem?

When memcached memory is full and it can't reclaim it, storing an entry to the cache fails with this error:

level=debug ts=2022-06-23T10:28:27.220696365Z caller=memcached_client.go:406 name=frontend-cache msg="failed to store item to memcached" key=1@b2ae91c4319dafc4 sizeBytes=86848 server=10.70.1.208:11211 err="memcache: unexpected response line from \"set\": \"SERVER_ERROR out of memory storing object\\r\\n\""

The above is logged from memcached_client.go:406, as shown in the caller field.

Because it is logged at debug level, this error is often hidden unless it triggers an alert. We would like to get better visibility when it happens.

Describe the solution you'd like

For this particular error in https://github.com/thanos-io/thanos/blob/a0f41812f01e782dd9e5e09755f0348da6fd8e88/pkg/cacheutil/memcached_client.go#L406-L413, report it with the reason "memory-full" so that it shows up as its own category in metrics instead of being counted under the reason "other".

Additional context

More details about this error here.

pracucci commented 2 years ago

There is a memcached bug

It's not really a bug, but storing an entry to the cache could just fail if memcached memory is full and it can't reclaim it.

We would like to get better visibility when that happens.

pracucci commented 2 years ago

report it with the reason "oom-store-obj"

I would suggest memory-full to better clarify what's going on.
