Visually Document Container Memory Metrics and their Relationships

kubernetes / website

Kubernetes website and documentation repo:

https://kubernetes.io

Creative Commons Attribution 4.0 International

Visually Document Container Memory Metrics and their Relationships #25388

Open stevekuznetsov opened 3 years ago

stevekuznetsov commented 3 years ago

This is a Feature Request

What would you like to be added: A document that explains what all the different container memory metrics mean and how they are interrelated.

Why is this needed: Today, the following metrics exist for container memory:

container_memory_cache
container_memory_mapped_file
container_memory_max_usage_bytes
container_memory_rss
container_memory_swap
container_memory_usage_bytes
container_memory_working_set_bytes

I would like to see a document that explains what they are, how they are different or similar to each other, how they nest, what container="" and container="POD" mean, which metric(s) are used by the kubelet to evict, why usage_bytes and max_usage_bytes might differ, the effects of quantized sampling, etc.

Comments: A visual description would be amazing here, as there are hierarchical relationships that would benefit from such a view.

stevekuznetsov commented 3 years ago

/sig instrumentation

stevekuznetsov commented 3 years ago

/cc @ehashman

sftim commented 3 years ago

This would be a great addition to the reference docs - especially with a visual.

/triage accepted
/priority backlog
/kind feature
/language en

ehashman commented 3 years ago

/help

k8s-ci-robot commented 3 years ago

@ehashman: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes/website/issues/25388):

>/help

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

sftim commented 3 years ago

@brennerm as you've been doing a great job drafting diagrams, I thought you might like to know about this feature request too

ehashman commented 3 years ago

/cc @bobbypage

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

ehashman commented 3 years ago

/remove-lifecycle stale

stevekuznetsov commented 3 years ago

xref https://github.com/google/cadvisor/issues/2138

The above issue from cAdvisor seems to document how all of this works ...

stevekuznetsov commented 3 years ago

OK, so from what I can tell, the following things are true:

container_memory_working_set_bytes = container_memory_usage_bytes - <inactive_memory>

We can see this calculation here, and <inactive_memory> is a kernel concept
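A minimal sketch of that subtraction, assuming `<inactive_memory>` corresponds to the cgroup's inactive file-backed pages (for example `total_inactive_file` in a cgroup v1 `memory.stat`) and that the result is clamped at zero. The function and parameter names are hypothetical and are not cAdvisor's API:

```python
def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Estimate the working set as usage minus inactive file-backed memory.

    Assumption: inactive_file_bytes is the inactive file page total read
    from the container cgroup's memory.stat. The result is clamped at
    zero so an inconsistent pair of readings never yields a negative value.
    """
    return max(usage_bytes - inactive_file_bytes, 0)


# Illustrative numbers only, not measurements from a real container.
MIB = 1024 ** 2
print(working_set_bytes(usage_bytes=500 * MIB, inactive_file_bytes=120 * MIB))
```

Because inactive file pages can be reclaimed by the kernel without killing anything, the working set is smaller than (or equal to) usage and is the more conservative signal of memory that cannot easily be freed.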

Furthermore:

container_memory_usage_bytes == container_memory_rss + container_memory_cache + container_memory_swap + <kernel memory>

Where <kernel memory> is memory allocated within the kernel, not yet exposed from cAdvisor (and therefore not exposed to Prometheus) as of https://github.com/google/cadvisor/issues/2138
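Taking the relationship above at face value (it is this thread's working hypothesis, not a confirmed specification), a rough sketch can solve it for the term that is not exposed. It assumes the four inputs are samples for the same container taken at the same instant; `unaccounted_bytes` is a hypothetical helper name and the numbers are made up:

```python
def unaccounted_bytes(usage: int, rss: int, cache: int, swap: int) -> int:
    """Return usage - (rss + cache + swap).

    Under the identity stated above, the remainder approximates
    <kernel memory>. Samples scraped at slightly different instants
    can make the result noisy or even negative.
    """
    return usage - (rss + cache + swap)


# Made-up sample values in bytes, for illustration only.
sample = {"usage": 1000, "rss": 600, "cache": 300, "swap": 0}
print(unaccounted_bytes(**sample))  # 100
```

If the remainder for a real container is consistently far from zero, that is a hint that either kernel memory is significant for that workload or the identity needs refining.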

stevekuznetsov commented 3 years ago

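As far as I can tell, the kubelet uses container_memory_working_set_bytes when deciding evictions, and it is also the closest metric to watch for OOMKills: the kernel enforces the cgroup memory limit, but it can reclaim inactive file-backed pages before it has to kill anything.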

stevekuznetsov commented 3 years ago

@ehashman @bobbypage @sftim @derekwaynecarr it doesn't look like the subject matter experts on this are too keen on documenting this, so I'll try to do it, I guess. Who will review my work, and if they understand this well could they perhaps jot down more thoughts in response to what I've written?

stevekuznetsov commented 3 years ago

Perhaps @mrunalp or @rphillips know?

ehashman commented 3 years ago

@stevekuznetsov I'll make sure we get a reviewer, I might pull in @dashpole and I'll take a look as well.

ehashman commented 3 years ago

/lifecycle frozen

derekwaynecarr commented 2 years ago

@stevekuznetsov I am happy to help review.

sftim commented 2 years ago

I wonder if we could sketch out (and for now only sketch out) what we want, saving the detailed work for a docs sprint at the next KubeCon.

stevekuznetsov commented 2 years ago

Totally! I think the questions I always ended up coming to were:

what is the relationship between all of the metrics I am able to see?
which metric, specifically, is being used by e.g. the scheduler/descheduler, the kubelet (for evictions) and the kernel (for hard-OOM)?

jai commented 1 year ago

/assign

vaibhav2107 commented 10 months ago

@jai Are you still working on this issue?

vaibhav2107 commented 4 months ago

Unassigning @jai since we haven't received any updates. Please reassign yourself if you get back to this; anyone else is also welcome to take this issue.

/unassign @jai