microsoft / Windows-Containers

Container Memory Calculation #180

Open zhiweiv opened 2 years ago

zhiweiv commented 2 years ago

I deployed a pod with the following YAML:

apiVersion: v1
kind: Pod
metadata:
  name: memtest
spec:
  containers:
  - name: test
    image: mcr.microsoft.com/windows/servercore:ltsc2019
    command: ["PowerShell", "while($true) { sleep 999 }"]

kubectl top pod shows that pod memtest uses 101Mi of memory, but the container's total memory usage is higher than that. It seems only the main process's memory (powershell.exe, 99,196 K) is counted as container memory. The other system processes in the container (mainly svchost.exe) together use more memory than the main process, but they are not counted.

kubectl exec memtest -- tasklist

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
System Idle Process              0                            0          8 K
System                           4                            0        112 K
smss.exe                      6880                            0      1,232 K
csrss.exe                    10400 Services                  15      5,096 K
wininit.exe                  11796 Services                  15      7,052 K
services.exe                  2184 Services                  15      6,544 K
lsass.exe                     7676 Services                  15     26,676 K
fontdrvhost.exe               7608 Services                  15      3,320 K
svchost.exe                  10496 Services                  15     18,564 K
svchost.exe                   7888 Services                  15     20,892 K
svchost.exe                   1320 Services                  15     69,356 K
svchost.exe                   9992 Services                  15     18,768 K
CExecSvc.exe                 10220 Services                  15      5,164 K
svchost.exe                   6424 Services                  15     24,352 K
svchost.exe                   8772 Services                  15     46,928 K
svchost.exe                  10076 Services                  15     13,928 K
svchost.exe                  11684 Services                  15     53,504 K
svchost.exe                   4208 Services                  15     15,836 K
powershell.exe               10080 Services                  15     99,196 K -- main process
msdtc.exe                     3236 Services                  15      9,948 K
tasklist.exe                 11104 Services                  15      8,184 K
WmiPrvSE.exe                 10860 Services                  15      8,428 K
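
To make the gap concrete, here is a minimal sketch that totals the Mem Usage column of a tasklist listing per image name. The function name `memory_by_image` and the `SAMPLE` excerpt are illustrative assumptions, not part of any tool mentioned in this thread. Applied to the full listing above, the rows add up to 463,088 K, of which powershell.exe accounts for 99,196 K. Note that per-process figures can double count pages shared between processes, so the sum is only a rough upper bound.

```python
import re
from collections import defaultdict

# Matches one tasklist row: image name, PID, then the trailing "<n> K" value.
# Header and separator rows contain no PID/"K" pair and are skipped.
ROW = re.compile(r"^(?P<image>.+?)\s+(?P<pid>\d+)\s.*?(?P<mem>[\d,]+) K\b")


def memory_by_image(tasklist_output: str) -> dict[str, int]:
    """Return the summed 'Mem Usage' value (in K) per image name."""
    totals: dict[str, int] = defaultdict(int)
    for line in tasklist_output.splitlines():
        match = ROW.match(line)
        if match:
            totals[match.group("image")] += int(match.group("mem").replace(",", ""))
    return dict(totals)


# Hypothetical excerpt: a few rows copied from the listing above.
SAMPLE = """\
Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
System Idle Process              0                            0          8 K
svchost.exe                  10496 Services                  15     18,564 K
svchost.exe                   7888 Services                  15     20,892 K
powershell.exe               10080 Services                  15     99,196 K -- main process
"""

if __name__ == "__main__":
    totals = memory_by_image(SAMPLE)
    main_k = totals["powershell.exe"]
    all_k = sum(totals.values())
    print(f"main process: {main_k} K, all processes: {all_k} K")
```

Piping the output of kubectl exec memtest -- tasklist into such a function would give a per-image breakdown to compare against the figure from kubectl top pod.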
zhiweiv commented 2 years ago

I get the same memory usage when I query the container stats directly on the host with crictl, so this should not be a Kubernetes issue. I think the value is retrieved from hcsshim via the CRI interface.

zhiweiv commented 2 years ago

This leads to a new issue: it seems node memory usage is based on the memory usage of the pods on the node. kubectl top node shows 70% memory usage for a Windows node, but the actual usage in Windows Task Manager is 90%.

ghost commented 2 years ago

This issue has been open for 30 days with no updates. @brasmith-ms, please provide an update or close this issue.

brasmith-ms commented 2 years ago

Hi @zhiweiv, thanks for the information here. We're aware of this issue and are working to fix the stats output from containers soon. There are a lot of things in the pipeline for performance and monitoring of Windows containers so I can assure you this will be fixed but I can't guarantee a timeline at the moment.

fady-azmy-msft commented 1 year ago

@Howard-Haiyang-Hao are you familiar with this issue?

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been open for 30 days with no updates. @Howard-Haiyang-Hao, @brasmith-ms, please provide an update or close this issue.

Howard-Haiyang-Hao commented 8 months ago

@zhiweiv, I've reached out to the feature team for an update on this issue. I'll inform you as soon as I receive any updates.

Howard-Haiyang-Hao commented 8 months ago

@zhiweiv Please follow the progress of this issue through the PR https://github.com/kubernetes/kubernetes/pull/122999. You'll find updates there on when it will be addressed. Thanks to marosset for providing the information.

zhiweiv commented 8 months ago

Thanks for the update, but https://github.com/kubernetes/kubernetes/pull/122999 seems to be related to the CPU usage of Windows containers, whereas this issue is about memory usage.

jwilsonCX commented 7 months ago

Quoting @brasmith-ms: "Hi @zhiweiv, thanks for the information here. We're aware of this issue and are working to fix the stats output from containers soon. There are a lot of things in the pipeline for performance and monitoring of Windows containers so I can assure you this will be fixed but I can't guarantee a timeline at the moment."

Hi Brandon, your tantalizing account of exciting things in the pipeline "soon" was issued over 1.5 years ago! How long must we wait before we can obtain accurate observability metrics for our Windows container workloads?

Also, please mister/miss Microsoft bot, kindly don't close this thread.

ntrappe-msft commented 6 months ago

Hi @jwilsonCX, Brandon is no longer with our team so let's see if @fady-azmy-msft knows of these ✨exciting things✨.

connexallcloud commented 6 months ago

Thanks for the update on Brandon's whereabouts, Nicole. Perhaps that explains the relative state of suspended animation of the ticket? Fingers crossed that @fady-azmy-msft has somethin' cookin' in this regard!

ntrappe-msft commented 6 months ago

If we don't have an update to share, then we'll triage this request in our next meeting and get some new eyes on it.

microsoft-github-policy-service[bot] commented 5 months ago

This issue has been open for 30 days with no updates. @Howard-Haiyang-Hao, please provide an update or close this issue.

Howard-Haiyang-Hao commented 5 months ago

This issue is being addressed in https://github.com/kubernetes/kubernetes/pull/122999. Thank you for reporting it. Please let us know if the fix resolves your issue.

TBBle commented 4 months ago

@Howard-Haiyang-Hao As noted above, that PR doesn't seem to be related to this issue: it fixes a CPU usage accounting issue introduced in Kubernetes in 2023, while this ticket is about memory usage accounting and was reported in 2021.

zylxjtu commented 4 months ago

On Windows there are two stats primarily used for memory consumption, because on Windows memory is always backed by virtual memory: commit memory and working set memory. See https://stackoverflow.com/questions/7954781/whats-the-difference-between-working-set-and-commit-size

Kubelet and containerd only report the working set memory.

If someone really needs to differentiate between commit bytes and working set memory for containers, they should use something like Prometheus node-exporter to get more detailed stats.

TBBle commented 4 months ago

@zylxjtu I think that might be a different issue. I haven't seen anything discussed here that suggests that the original diagnosis was wrong, i.e. that HCS (presumably), when asked for the memory usage (working set) of the container, is returning only the main process's working set (and maybe smss.exe?), and is not accounting for the memory usage of the other processes in the container.

In the original bug report, there's 200MB-300MB of unaccounted usage, and https://github.com/microsoft/Windows-Containers/issues/180#issuecomment-994258849 notes that this gives incorrect statistics to Kubernetes, which may lead to unintentional overallocation. (Although this may be an unrelated issue: node memory usage/availability should not be able to be fooled by memory used by things other than pods on the host...)

zhiweiv commented 4 months ago

Thanks @TBBle for the clarification. I reported this because of https://github.com/microsoft/Windows-Containers/issues/176: svchost.exe has a memory leak (8 to 10 svchost.exe processes are created along with each container). In the last comment there you can see that, for a pod without any activity, svchost takes 438MB in total after 23 days, and it keeps increasing over time. As a result, top node shows that node memory is high, while top pod shows that pod memory is low.
