90% of stats are memcached_slab_*

jdmarshall commented 1 year ago

Looking for ways to economize on total stats production, I found that our project is producing around 11000 stats/interval in total for 2 memcached clusters. Of those, 9600 stats/interval have the prefix memcached_slab.

I don't even know how someone is supposed to interpret that telemetry. I certainly don't think users want to pay 90% of their stats budget for slab allocator data. I'd like a way to turn that off.

In a related ticket that was closed by its author, it was suggested to drop stats at the collector. But a collector config can be shared by any number of services running on the same box, and getting communal access to that configuration is a whole other layer of logistics to manage, possibly two. With swarm or Kubernetes managing them, you are asking one person to drop stats collection on behalf of several teams.

jdmarshall commented 1 year ago

> swarm or Kubernetes

In fact for us it's worse than this. We run a collector on a common host image, so it's every docker container we deploy, not just related ones.

matthiasr commented 1 year ago

I believe the issue you are referring to is #118? Given

That said, if you could talk about what metrics you use/don't. It might be worth adding some flags here.

I think the slab metrics are a really good candidate given the high cardinality. I would like the flag naming to be in line with the node and mysql exporter (which have extensive on/off toggles for classes of data), so something like --(no-)collector.slab.
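To make the proposed toggle concrete, here is a minimal Go sketch of an on/off switch for the slab metric class. It is not the exporter's actual code. It uses the standard library `flag` package, so the flag is turned off with `-collector.slab=false`; the `--no-collector.slab` spelling comes from the kingpin flag library used by the node exporter and is not reproduced here. The helper names `parseSlabEnabled` and `filterMetricNames` are hypothetical, and the metric names are only illustrative.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// slabPrefix is the metric-name prefix discussed in this issue.
const slabPrefix = "memcached_slab_"

// parseSlabEnabled parses a hypothetical -collector.slab flag.
// The flag defaults to true so existing behavior is unchanged.
func parseSlabEnabled(args []string) (bool, error) {
	fs := flag.NewFlagSet("exporter-sketch", flag.ContinueOnError)
	enabled := fs.Bool("collector.slab", true, "Collect per-slab metrics (hypothetical flag).")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enabled, nil
}

// filterMetricNames returns names unchanged when slab collection is
// enabled, and otherwise drops every name that starts with slabPrefix.
func filterMetricNames(names []string, slabEnabled bool) []string {
	if slabEnabled {
		return names
	}
	kept := make([]string, 0, len(names))
	for _, n := range names {
		if !strings.HasPrefix(n, slabPrefix) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	enabled, err := parseSlabEnabled(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// Illustrative metric names only; a real exporter would skip
	// the slab collection step itself rather than filter names.
	names := []string{
		"memcached_up",
		"memcached_slab_current_items",
		"memcached_slab_chunk_size_bytes",
		"memcached_current_connections",
	}
	for _, n := range filterMetricNames(names, enabled) {
		fmt.Println(n)
	}
}
```

Run with no arguments, the sketch prints all four names. Run with `-collector.slab=false`, it prints only the two non-slab names. Defaulting to enabled keeps the change backward compatible for anyone who relies on the slab data.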

matthiasr commented 1 year ago

@SuperQ shout if you disagree, otherwise I would encourage @jdmarshall to send a PR 😄

jdmarshall commented 1 year ago

I looked at the code for the slab metadata before filing the issue. It is... not well-contained. It's going to take someone with experience in this code base to do it without breaking other things.

prometheus / memcached_exporter

90% of stats are memcached_slab_* #178