twitter / rezolus

Systems performance telemetry
Apache License 2.0

Support multiple endpoints for memcache sampler #237

Open eaddingtonwhite opened 3 years ago

eaddingtonwhite commented 3 years ago

Currently, the memcache sampler in Rezolus only supports a single Memcache endpoint to monitor and sample: https://github.com/twitter/rezolus/blob/master/src/config/samplers.rs#L93 https://github.com/twitter/rezolus/blob/fadf6f8eba44002dc51911ca738f63b432f620b0/src/samplers/memcache/mod.rs#L43-L57

It would be nice if we could pass a comma-separated list of host:port memcache endpoints to the Rezolus memcache sampler. This would allow a single Rezolus agent to monitor multiple Memcache processes or containers on a single host. The config might look something like this:

# An example config that produces percentile metrics for specific Memcached stats
# while preserving the original metric names.

[general]
listen = "0.0.0.0:4242"
fault_tolerant = false
reading_suffix = ""

[samplers]
[samplers.memcache]
enabled = true
endpoint = "localhost:11211,localhost:11212,localhost:11213,localhost:11214"

An alternative would be to run one Rezolus agent per memcache process (1:1), but this creates additional overhead for the end user, who has to configure and run every agent on the host.
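
As a rough illustration of the proposed `endpoint` value, the sketch below splits a comma-separated `host:port` list into individual endpoints. The `Endpoint` struct and `parse_endpoints` function are hypothetical names made up for this example; they are not part of the Rezolus codebase, and the real sampler config may want a different shape (for instance a TOML array instead of a single string).

```rust
/// Hypothetical parsed endpoint; not a type from Rezolus.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Endpoint {
    pub host: String,
    pub port: u16,
}

/// Splits a comma-separated `host:port` list into individual endpoints.
/// Whitespace around entries is ignored and empty entries are skipped.
pub fn parse_endpoints(raw: &str) -> Result<Vec<Endpoint>, String> {
    let mut endpoints = Vec::new();
    for entry in raw.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        // Split on the last ':' so the port is always the final component.
        let (host, port) = entry
            .rsplit_once(':')
            .ok_or_else(|| format!("missing port in endpoint '{}'", entry))?;
        if host.is_empty() {
            return Err(format!("missing host in endpoint '{}'", entry));
        }
        let port = port
            .parse::<u16>()
            .map_err(|e| format!("invalid port in endpoint '{}': {}", entry, e))?;
        endpoints.push(Endpoint {
            host: host.to_string(),
            port,
        });
    }
    // An empty list is treated as a configuration error in this sketch.
    if endpoints.is_empty() {
        return Err("no endpoints configured".to_string());
    }
    Ok(endpoints)
}
```

A single-endpoint value such as `"localhost:11211"` still parses to a one-element list, so existing configs would keep working under this approach.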

brayniac commented 3 years ago

Interesting idea. Definitely possible to support something like this, but it also raises the question of how we'd expose these on a per-instance basis. This is a somewhat larger question in terms of how we want to support "scoped" metrics. https://github.com/twitter/rezolus/issues/109

Similar considerations apply to per-disk or per-core telemetry.

That said, it should be possible to do something for this sampler without solving the entire problem. I'd welcome a PR to add this functionality to the memcache sampler - I don't currently have bandwidth to work on the implementation or a real use-case for this. But it does seem useful.

thinkingfish commented 3 years ago

I think multiple instances of the same sampler for different endpoints can be declared the same way different samplers are declared. This avoids creating a hierarchy and keeps the syntax consistent.

To go with this, an alternative to declaring multiple endpoints in one sampler config is to introduce the concept of a dimension to the sampler abstraction. We can then make the endpoint/source a dimension. E.g., different devices or endpoints become part of the tuple that uniquely identifies a sampler.
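
One way to picture the dimension idea is a key type that pairs the sampler kind with a dimension value, so that two memcache samplers pointed at different endpoints are distinct entries. This is only a sketch: `SamplerKey` and `Registry` are hypothetical names for illustration and do not exist in Rezolus.

```rust
use std::collections::HashMap;

/// Hypothetical identity for a sampler instance: the sampler kind plus an
/// optional dimension value such as an endpoint or a device name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SamplerKey {
    pub kind: String,
    pub dimension: Option<String>,
}

/// Hypothetical registry holding the latest reading per metric for each
/// sampler instance.
#[derive(Default)]
pub struct Registry {
    readings: HashMap<SamplerKey, HashMap<String, u64>>,
}

impl Registry {
    /// Stores the latest value of `metric` for the given sampler instance.
    pub fn record(&mut self, key: &SamplerKey, metric: &str, value: u64) {
        self.readings
            .entry(key.clone())
            .or_default()
            .insert(metric.to_string(), value);
    }

    /// Returns the latest value of `metric` for the given sampler instance.
    pub fn get(&self, key: &SamplerKey, metric: &str) -> Option<u64> {
        self.readings.get(key)?.get(metric).copied()
    }

    /// Number of distinct sampler instances seen so far.
    pub fn instances(&self) -> usize {
        self.readings.len()
    }
}
```

Samplers without a natural dimension would use `dimension: None` and behave as they do today.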

On reporting, I am imagining a per-sampler policy, with choices like “sum across”, “export as dimension” (Prometheus style), “export as namespace” (Twitter style) to accommodate different collectors. If we want to be fancy, we can even add a config language that allows customizing the format of the output metric name.
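
To make the three reporting choices concrete, here is a minimal sketch of a per-sampler export policy. Everything in it is an assumption for illustration: the `ExportPolicy` enum, the `export` function, the `endpoint` label name, and especially the namespace layout (`<endpoint>/<metric>`), which is a guess at what "Twitter style" could look like and is not taken from Rezolus.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-sampler reporting policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExportPolicy {
    /// Emit one line with the sum over all endpoints.
    SumAcross,
    /// Emit one line per endpoint, with the endpoint as a label.
    AsDimension,
    /// Emit one line per endpoint, with the endpoint as a name prefix.
    AsNamespace,
}

/// Renders the readings of one metric, keyed by endpoint, as text lines.
/// A BTreeMap is used so the output order is deterministic.
pub fn export(
    metric: &str,
    per_endpoint: &BTreeMap<String, u64>,
    policy: ExportPolicy,
) -> Vec<String> {
    match policy {
        ExportPolicy::SumAcross => {
            let total: u64 = per_endpoint.values().sum();
            vec![format!("{} {}", metric, total)]
        }
        ExportPolicy::AsDimension => per_endpoint
            .iter()
            .map(|(ep, v)| format!("{}{{endpoint=\"{}\"}} {}", metric, ep, v))
            .collect(),
        ExportPolicy::AsNamespace => per_endpoint
            .iter()
            // Assumed layout: ':' is replaced so the endpoint is a safe prefix.
            .map(|(ep, v)| format!("{}/{} {}", ep.replace(':', "_"), metric, v))
            .collect(),
    }
}
```

Summing only makes sense for counters and gauges that add up across instances; percentile metrics would need a different aggregation, which this sketch does not attempt.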