jdmarshall opened this issue 1 year ago
On the "swarm or Kubernetes" point: in fact, for us it's worse than that. We run a collector on a common host image, so it affects every Docker container we deploy, not just related ones.
I believe the issue you are referring to is #118? Given
That said, could you tell us which metrics you use and which you don't? It might be worth adding some flags here.
I think the slab metrics are a really good candidate, given their high cardinality. I would like the flag naming to be in line with the node and mysql exporters (which have extensive on/off toggles for classes of data), so something like --(no-)collector.slab.
@SuperQ, shout if you disagree; otherwise I would encourage @jdmarshall to send a PR 😄
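A rough sketch of the proposed toggle follows. The real exporter is written in Go and this does not reproduce its flag handling; it only illustrates the behavior under stated assumptions. Python's `argparse.BooleanOptionalAction` (Python 3.9+) generates a `--collector.slab` / `--no-collector.slab` pair matching the naming suggested above, and `parse_flags` and `collect` are hypothetical helpers that drop `memcached_slab`-prefixed metric names when the flag is off. The metric names used are illustrative.

```python
import argparse

# Prefix shared by the per-slab metrics discussed in this issue.
SLAB_PREFIX = "memcached_slab"


def parse_flags(argv):
    """Hypothetical flag parser: slab metrics are on by default."""
    parser = argparse.ArgumentParser(prog="exporter-sketch")
    parser.add_argument(
        "--collector.slab",
        dest="collector_slab",
        # Adds a matching --no-collector.slab option automatically.
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Collect per-slab metrics (memcached_slab_*).",
    )
    return parser.parse_args(argv)


def collect(metric_names, slab_enabled):
    """Hypothetical collector: skip slab metrics when the toggle is off."""
    if slab_enabled:
        return list(metric_names)
    return [n for n in metric_names if not n.startswith(SLAB_PREFIX)]


if __name__ == "__main__":
    args = parse_flags(["--no-collector.slab"])
    names = ["memcached_up", "memcached_slab_current_items"]
    print(collect(names, args.collector_slab))
```

Gating at the exporter means each deployment can opt out on its own, without anyone editing a collector config shared by other services on the same host.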
I looked at the code for the slab metadata before filing the issue. It is... not well-contained. It's going to take someone with experience in this code base to do it without breaking other things.
Looking for ways to economize on total stats production, I found that our project produces around 11,000 stats per interval in total for 2 memcached clusters, and 9,600 of those have the prefix memcached_slab.
I don't even know how someone is supposed to interpret that telemetry. I certainly don't think users want to spend nearly 90% of their stats budget on slab allocator data. I'd like a way to turn that off.
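To check the slab share on your own deployment, one option is to count sample lines in a single scrape of the exporter's Prometheus text output. The sketch below assumes that format (one sample per non-comment line, metric name ending at the first `{` or space); `series_share` is a hypothetical helper, not part of the exporter, and the sample metrics are illustrative. Applied to the figures above, 9,600 of 11,000 works out to about 87.3%.

```python
def series_share(exposition: str, prefix: str) -> float:
    """Fraction of sample lines whose metric name starts with `prefix`."""
    total = 0
    matched = 0
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        total += 1
        if name.startswith(prefix):
            matched += 1
    return matched / total if total else 0.0


# Illustrative scrape output, not real exporter data.
SAMPLE = """# TYPE memcached_up gauge
memcached_up 1
memcached_slab_current_items{slab="1"} 10
memcached_slab_current_items{slab="2"} 4
memcached_current_connections 7
"""

if __name__ == "__main__":
    print(f"{series_share(SAMPLE, 'memcached_slab'):.1%}")
    # Share reported in this issue: 9,600 of 11,000 stats per interval.
    print(f"{9600 / 11000:.1%}")
```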
In a related ticket that was closed by its author, it was suggested to drop stats at the collector. But a collector config can be shared by any number of services running on the same box, and getting communal access to that configuration is a whole other layer of logistics to manage, possibly two. With Swarm or Kubernetes managing those services, you are asking one person to handle dropping stats for several teams.