twitter / rezolus

Systems performance telemetry
Apache License 2.0

Cgroup support? #19

Open sargun opened 5 years ago

sargun commented 5 years ago

Is there any intent to support gathering metrics per cgroup?

brayniac commented 5 years ago

Hi @sargun - what kind of metrics would you like to see on a per-cgroup basis?

sargun commented 5 years ago

Some of the perf metrics would be very valuable to see per cgroup. For example, it'd be useful to be able to say that all cgroups under directory X should have the perf events cycles and instructions monitored.
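To make the "all cgroups under a directory" idea concrete, here is a minimal sketch that relies only on the fact that each cgroup appears as a directory in the cgroup filesystem. The function name `cgroups_under` is hypothetical and is not part of Rezolus; the root path is whatever the caller passes in.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: collect every directory below `root`, treating each
/// one as a cgroup. Regular files (such as cgroup.procs) are skipped.
fn cgroups_under(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];

    // Iterative depth-first walk, so nested cgroups are included as well.
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            if entry.file_type()?.is_dir() {
                let path = entry.path();
                found.push(path.clone());
                pending.push(path);
            }
        }
    }

    // Sort so the result does not depend on directory iteration order.
    found.sort();
    Ok(found)
}
```

A caller would pass the directory of interest, for example a subtree of the mounted cgroup hierarchy, and then decide which of the returned cgroups to attach counters to. This only covers discovery; it says nothing about the cost of instrumenting what it finds.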

brayniac commented 5 years ago

Per-cgroup performance counters are quite expensive in practice. While it's possible to collect at this level, some careful thought would need to go into strategies to mitigate the performance overhead. I do feel this would fit into the project, but I don't have any plans to work on it soon.

sargun commented 5 years ago

Do you have an opinion on how they should be exported, or how the API / config should look?

brayniac commented 5 years ago

I think for regular exposition, it'd be best to gather the metrics under paths like cgroup/[name]/[sampler]/[metric]. For instance, if we had a cgroup named 'foo', the instructions performance counter would be cgroup/foo/perf/cpu/instructions.
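As a rough sketch of that naming scheme (the function name `cgroup_metric_path` is made up for illustration and does not exist in Rezolus):

```rust
/// Hypothetical helper: build a metric path of the form
/// cgroup/[name]/[sampler]/[metric].
fn cgroup_metric_path(cgroup: &str, sampler: &str, metric: &str) -> String {
    format!("cgroup/{}/{}/{}", cgroup, sampler, metric)
}
```

Calling `cgroup_metric_path("foo", "perf", "cpu/instructions")` yields `cgroup/foo/perf/cpu/instructions`, matching the example above. The sketch assumes flat cgroup names; nested cgroups, whose names contain slashes, would need a decision about how they map into the path.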

Ideally, for Prometheus exposition, we would probably use cgroup/perf/cpu/instructions with a label containing the cgroup name. Today, this isn't directly supported in the metrics library. I don't believe this is necessarily a blocker for this feature though.
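For illustration only, a labeled sample in the Prometheus text format could be rendered roughly like this. The function `prometheus_line` is hypothetical and is not how the metrics library works today. It assumes slashes in the path are replaced with underscores, since Prometheus metric names may only contain letters, digits, underscores, and colons.

```rust
/// Hypothetical helper: render one sample in the Prometheus text exposition
/// format, carrying the cgroup name as a label instead of a path segment.
fn prometheus_line(path: &str, cgroup: &str, value: u64) -> String {
    // Replace any character that is not valid in a metric name with '_'.
    let name: String = path
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '_' || c == ':' {
                c
            } else {
                '_'
            }
        })
        .collect();

    // Label values must escape backslash, double quote, and newline.
    let mut label = String::new();
    for c in cgroup.chars() {
        match c {
            '\\' => label.push_str("\\\\"),
            '"' => label.push_str("\\\""),
            '\n' => label.push_str("\\n"),
            other => label.push(other),
        }
    }

    format!("{}{{cgroup=\"{}\"}} {}", name, label, value)
}
```

With this sketch, `prometheus_line("cgroup/perf/cpu/instructions", "foo", 42)` produces `cgroup_perf_cpu_instructions{cgroup="foo"} 42`. Keeping the cgroup in a label means the metric name stays the same across cgroups, which makes aggregation across them straightforward on the Prometheus side.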

Configuration gets a little tricky: we need an optional list of cgroups to instrument, and we need to indicate which samplers should collect per-cgroup telemetry. For instance, both perf and cpu would be things we could sample per-cgroup. Users may wish to collect per-cgroup telemetry for one, both, or neither. Additionally, users may wish to collect for named cgroups or for all cgroups. The real trick is going to be making this feel clean and natural. The per-cgroup collection should be opt-in.
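One possible shape for this, purely as a sketch: every type and field name below is hypothetical and does not reflect Rezolus's actual configuration. The default disables per-cgroup collection, which keeps it opt-in.

```rust
/// Hypothetical: which cgroups to instrument.
#[derive(Debug, Clone, PartialEq)]
enum CgroupSelection {
    Disabled,
    All,
    Named(Vec<String>),
}

impl CgroupSelection {
    fn includes(&self, cgroup: &str) -> bool {
        match self {
            CgroupSelection::Disabled => false,
            CgroupSelection::All => true,
            CgroupSelection::Named(names) => names.iter().any(|n| n == cgroup),
        }
    }
}

/// Hypothetical: per-sampler switches, so perf and cpu can opt in separately.
#[derive(Debug, Clone, PartialEq)]
struct SamplerConfig {
    enabled: bool,
    per_cgroup: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct Config {
    cgroups: CgroupSelection,
    perf: SamplerConfig,
    cpu: SamplerConfig,
}

impl Default for Config {
    // Assumed defaults: samplers run system-wide, nothing is per-cgroup.
    fn default() -> Self {
        Config {
            cgroups: CgroupSelection::Disabled,
            perf: SamplerConfig { enabled: true, per_cgroup: false },
            cpu: SamplerConfig { enabled: true, per_cgroup: false },
        }
    }
}

/// A sampler collects for a cgroup only if it is enabled, has opted in to
/// per-cgroup telemetry, and the cgroup is selected.
fn samples_cgroup(config: &Config, sampler: &SamplerConfig, cgroup: &str) -> bool {
    sampler.enabled && sampler.per_cgroup && config.cgroups.includes(cgroup)
}
```

Splitting the cgroup selection from the per-sampler flag covers the cases listed above: one sampler, both, or neither, and named cgroups versus all of them. Whether a single global cgroup list is enough, or each sampler needs its own, is left open here.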

More importantly, we need to consider the performance impact. Even having a single hardware event, such as CPU instructions, instrumented with perf on a per-cgroup basis can impose a significant performance penalty on the running services. This becomes apparent when multiple cgroups are used for isolation. Ideally, Rezolus should not cause measurable impact to running services. We want to be sure we're keeping the overhead low both within the project and in terms of impact across the system. A strategy needs to be developed to help mitigate the performance penalty. Based on prior experiments with this, I don't believe this work is trivial. This might not make a good first issue.

sargun commented 5 years ago

How do you feel about supporting this for non-hardware events? For example, context switches and syscalls?

brayniac commented 5 years ago

Same. There's a measurable performance impact on sensitive workloads when gathering even SW events per-cgroup when there are several cgroups on the system. I'll give some thought to this and see if there's some way to enable this work to happen, or if it'll be possible for me to take this on myself.