google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Hardware PMU - performance counters using Linux perf and /resctrl interface #2388

Open ss7pro opened 4 years ago

ss7pro commented 4 years ago

Hey,

At Intel we're working on a solution that provides online workload characterization from the hardware PMU through the Linux perf interface. In addition, we use RDT through the /resctrl interface to extract memory bandwidth and cache usage. The full list of metrics that we collect is documented here:

https://github.com/intel/workload-collocation-agent/blob/master/docs/metrics.rst

We would like to migrate all telemetry collection and querying to cAdvisor. Do you think it makes sense to add these kinds of metrics to cAdvisor, and if so, do you anticipate any major issues with that approach?
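To make the perf part concrete, here is a minimal sketch of how raw PMU counts can be turned into a derived metric such as instructions per cycle. It assumes each counter was read together with the `time_enabled` and `time_running` fields that perf can report, so that a multiplexed count can be scaled up. The names `CounterReading`, `scaled_value` and `ipc` are hypothetical and are not taken from the Workload Collocation Agent or cAdvisor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterReading:
    """One perf counter read that includes the enabled/running times."""
    raw: int           # raw event count reported by the kernel
    time_enabled: int  # nanoseconds the event was enabled
    time_running: int  # nanoseconds the event was actually counting on the PMU


def scaled_value(reading: CounterReading) -> float:
    """Estimate the full count when the event shared the PMU with others."""
    if reading.time_running == 0:
        # The event never ran, so there is nothing to extrapolate from.
        return 0.0
    return reading.raw * reading.time_enabled / reading.time_running


def ipc(instructions: CounterReading, cycles: CounterReading) -> float:
    """Instructions per cycle computed from two scaled counter readings."""
    scaled_cycles = scaled_value(cycles)
    if scaled_cycles == 0:
        return 0.0
    return scaled_value(instructions) / scaled_cycles
```

The scaling step matters because the kernel time-slices events when more are requested than the PMU has hardware counters, so a raw count alone can understate the real activity.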

dashpole commented 4 years ago

cAdvisor exposes metrics associated with cgroups. Looking at your metric list, most metrics appear to be associated with tasks, although some of the other docs suggest that "task" can at least sometimes mean cgroup. Are all "task" metrics associated with cgroups?

iwankgb commented 4 years ago

@dashpole some of the task metrics come from the resctrl filesystem (they are related to Intel RDT: cache and memory bandwidth monitoring). The task concept is similar to a pod in Kubernetes: it is always a cgroup, it might include other cgroups, and it might rely on other kernel interfaces to collect metrics too.
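As an illustration of what reading those metrics from resctrl looks like, here is a minimal sketch that sums the monitoring files of one resctrl group across its L3 domains. It relies on the layout described in the kernel documentation (`mon_data/mon_L3_*/` containing `llc_occupancy`, `mbm_total_bytes` and `mbm_local_bytes`). The function name `read_resctrl_monitoring` is hypothetical, and which files exist depends on what the CPU and kernel support.

```python
from pathlib import Path
from typing import Dict

# Monitoring files the kernel may expose per L3 domain.
MON_FILES = ("llc_occupancy", "mbm_total_bytes", "mbm_local_bytes")


def read_resctrl_monitoring(group_dir: str) -> Dict[str, int]:
    """Sum each monitoring file over all L3 domains of one resctrl group."""
    totals = {name: 0 for name in MON_FILES}
    mon_data = Path(group_dir) / "mon_data"
    for domain in sorted(mon_data.glob("mon_L3_*")):
        for name in MON_FILES:
            try:
                text = (domain / name).read_text().strip()
            except OSError:
                # File missing or unreadable: the feature may be unsupported.
                continue
            # The kernel can report "Unavailable" or "Error" instead of a number.
            if text.isdigit():
                totals[name] += int(text)
    return totals
```

On a host with resctrl mounted, `read_resctrl_monitoring("/sys/fs/resctrl")` would read the default group. The bandwidth files are cumulative byte counters, so a rate requires two samples and the time between them.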

dashpole commented 4 years ago

Given a particular cgroup, is it possible to get the associated task?

Are the kernel interfaces it relies on documented anywhere?

iwankgb commented 4 years ago

Resctrl is documented in the kernel documentation.

Regarding identifying a task by cgroup - I'm not sure I understand your question; are you asking about the implementation of a task in the Workload Collocation Agent?

iwankgb commented 4 years ago

BTW, as far as I can tell RDT and resctrl are already being handled using runc's IntelRdtManager, so we will be able to utilize existing functionality, I hope.
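For context, a cgroup and a resctrl group can be matched through the process IDs they both list. The following is a minimal sketch of that idea, not runc's or cAdvisor's code: it reads `cgroup.procs` from a cgroup directory and looks for resctrl control groups whose `tasks` file contains any of those IDs. The helper names `read_ids` and `resctrl_groups_for_cgroup` are hypothetical, and monitoring-only groups under `mon_groups` are ignored for brevity.

```python
from pathlib import Path
from typing import List, Set


def read_ids(path: Path) -> Set[int]:
    """Parse a file with one numeric ID per line, e.g. cgroup.procs or tasks."""
    return {int(token) for token in path.read_text().split()}


def resctrl_groups_for_cgroup(
    cgroup_dir: str, resctrl_root: str = "/sys/fs/resctrl"
) -> List[str]:
    """Return resctrl control groups that hold at least one PID of the cgroup."""
    pids = read_ids(Path(cgroup_dir) / "cgroup.procs")
    root = Path(resctrl_root)
    # The root is itself the default group; its subdirectories are either
    # control groups or bookkeeping directories such as "info".
    candidates = [root] + sorted(p for p in root.iterdir() if p.is_dir())
    matches = []
    for group in candidates:
        tasks_file = group / "tasks"
        # Only real groups have a "tasks" file; skip everything else.
        if tasks_file.is_file() and pids & read_ids(tasks_file):
            matches.append(str(group))
    return matches
```

The `tasks` file lists thread IDs while `cgroup.procs` lists process IDs, but a process's main thread shares its PID, so the intersection still finds the group.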

dashpole commented 4 years ago

Oh, great. I see it is already integrated into libcontainer. In that case, it will be easy to integrate.

iwankgb commented 4 years ago

cAdvisor uses libcontainer to collect all of its metrics from cgroups, am I right? If so, we will proceed with some PRs in libcontainer and get back to you once we are ready to integrate them into cAdvisor.

dashpole commented 4 years ago

Yes. cAdvisor uses libcontainer to collect metrics from cgroups.

iwankgb commented 4 years ago

@dashpole I think you can see the full list of TODOs above.