Open SamSaffron opened 5 years ago
I did realize something... should we not be de-duplicating this string?
@SamSaffron thanks a bunch for this. It is quite helpful. I agree that we should reduce the set of metrics provided by the kubelet. You can see an overview of the roadmap here: https://github.com/kubernetes/kubernetes/issues/68522.
Does it make sense to ship a "minimal" build of cadvisor with "docker only", so people only using it to monitor containers don't need to load up mesos/rkt/systemd/crio/aws and so on?
We should make use of https://github.com/google/cadvisor/pull/1926 in kubernetes to ignore mesos, systemd, aws, and similar containers. We should also introduce an option to only collect raw cgroups (no docker, CRI-O, etc.) for runtimes that provide container metrics via CRI.
Do we want a flag for -disable_metrics diskio? We have one for disk now.
I would be happy to review a PR that adds diskIO as a metric that can be ignored.
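A flag like the one discussed here could be handled roughly as follows. This is only a sketch of the idea in Python, not cadvisor's actual flag handling: the metric names in KNOWN_METRICS and the helpers parse_disabled_metrics and should_collect are hypothetical and chosen for illustration.

```python
# Hypothetical list of metric kinds; the real set accepted by cadvisor may differ.
KNOWN_METRICS = frozenset({"cpu", "memory", "disk", "diskio", "network", "tcp", "udp"})


def parse_disabled_metrics(value: str) -> frozenset:
    """Parse a comma-separated flag value such as "disk,diskio" into a set."""
    names = {part.strip() for part in value.split(",") if part.strip()}
    unknown = names - KNOWN_METRICS
    if unknown:
        raise ValueError(f"unsupported metric(s): {', '.join(sorted(unknown))}")
    return frozenset(names)


def should_collect(metric: str, disabled: frozenset) -> bool:
    """A collector would check this before gathering (and retaining) a metric kind."""
    return metric not in disabled


disabled = parse_disabled_metrics("disk, diskio")
```

With diskio in the disabled set, the collector would skip the per-container disk IO stats entirely, which is where the memory in the original report was going.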
The rough plan is for the kubelet to disable all metrics other than CPU/Memory/Disk starting in 1.15 to allow for a deprecation window.
If you're just using cadvisor for prometheus exporting, consider a storage duration of ~2x the scrape interval.
Helped me save a few megs
nvm, restart placebo
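For reference, the "~2x the scrape interval" arithmetic can be sketched like this. The helper name suggested_storage_duration is made up for illustration, and the result is formatted as a Go-style duration string on the assumption that it would be passed to cadvisor's -storage_duration flag. Note the saving reported above was retracted, so this only shows the calculation, not a measured benefit.

```python
def suggested_storage_duration(scrape_interval_s: float, factor: float = 2.0) -> str:
    """Return a Go-style duration string of roughly `factor` x the scrape interval."""
    total = int(round(scrape_interval_s * factor))
    minutes, seconds = divmod(total, 60)
    # Go's duration parser accepts forms such as "30s" and "2m0s".
    return f"{minutes}m{seconds}s" if minutes else f"{seconds}s"


# Example: a Prometheus scrape interval of 15 seconds.
flag = f"-storage_duration={suggested_storage_duration(15)}"
```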
We have noticed that resource usage (both CPU and memory) is less than ideal on cadvisor.
In particular, on a machine with 40 or so containers we see cadvisor ramp up to 100MB RSS fairly quickly. CPU is also highish.
From memory dumps I determined that about 50% of memory usage is due to container disk IO stats:
By commenting this out I can get memory down to around 50% of what it was and reduce CPU by more than 50%.
After this is commented out I am still left with:
Which is not too unreasonable. However, RSS for the same process dumped here is 50MB, so my guess is that the majority of the memory is just loaded libraries rather than actual retained data.
This makes me wonder a few things:
Do we want a flag for -disable_metrics diskio? We have one for disk now.
Do we want a reduced set of metrics for diskio that is not as memory intensive as the current one? diskio-total, maybe something like that.
Does it make sense to ship a "minimal" build of cadvisor with "docker only", so people only using it to monitor containers don't need to load up mesos/rkt/systemd/crio/aws and so on?
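To make the diskio-total idea a bit more concrete, here is a rough Python sketch of collapsing per-device counters into one total per container. The input layout (device id mapped to per-operation counts) and the function name total_diskio are assumptions for illustration; the thread does not show the actual stats structure.

```python
from collections import Counter


def total_diskio(per_device: dict) -> dict:
    """Sum per-operation counters across all devices.

    per_device is assumed to look like {"8:0": {"Read": 10, "Write": 5}, ...}.
    """
    total = Counter()
    for ops in per_device.values():
        total.update(ops)  # adds each operation's count to the running total
    return dict(total)


sample = {
    "8:0": {"Read": 10, "Write": 5},
    "8:16": {"Read": 1, "Write": 2},
}
summary = total_diskio(sample)
```

Retaining only a summary like this per sample, instead of one entry per device, is the kind of reduction that might cut memory on hosts with many containers and devices, at the cost of losing the per-device breakdown.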