rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

Memory leak/zombie processes on windows #6451

Open jpambrun opened 7 months ago

jpambrun commented 7 months ago

Actual Behavior

My computer's 32 GB of memory is exhausted in ~24 hours when running docker-desktop. The per-process memory doesn't add up to the total in use: Task Manager shows 98% used, but the sum is below 10 GB. Meanwhile, the computer is unusable. Using rammap.exe, I can see tens of thousands of what appear to be zombie docker.exe (and also conhost.exe) process images.

Running pskill yields "Unable to kill process 53764: A process being terminated has no threads to terminate." For comparison, trying to kill a non-existent PID yields "Unable to kill process 345345: Process does not exist." Reinstalling or rebooting didn't help.

Steps to Reproduce

In my case, just have docker-desktop running. No running container image or docker build is needed to reproduce.

Result

Computer unusable due to RAM exhaustion.

Expected Behavior

Not that.

Additional Information

No response

Rancher Desktop Version

1.12.3

Rancher Desktop K8s Version

none

Which container engine are you using?

moby (docker cli)

What operating system are you using?

Windows

Operating System / Build Version

Windows 11

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

sentinel, twingate

mook-as commented 7 months ago

Hi!

Are you able to see those zombies in Process Explorer? If yes, can you share their command line (if available)? Otherwise, we may need to try using procmon.exe (filter only for process name docker.exe, and show only Process & Thread Activity) to see what the command line is.

Just to double check, do you have any extensions installed? (In case those are spawning docker.exe; it's also quite possible that it's something Rancher Desktop is doing internally.)

jpambrun commented 7 months ago

Hi, thanks for getting back to me. I uninstalled and reinstalled, and it doesn't occur anymore. It didn't show in Process Explorer. The only suspicious thing I could find across Task Manager, Process Explorer, and RAMMap was in RAMMap. Otherwise there was no trace of where the missing memory went (and even the long list in RAMMap isn't really solid evidence).

I didn't know about procmon. If this ever occurs again, I will try to get some insight into why docker.exe was launched so many times. I am closing this issue in the meantime.

jpambrun-vida commented 6 months ago

@mook-as and anyone else who stumbles on this issue: using procmon, I could determine that it's VS Code invoking docker.exe context ls about every second. Disabling the "Dev containers" extension makes it stop.

I am still not sure whether something is wrong with the docker version supplied in rancher-desktop or with the extension. Running while true; do docker.exe context ls --format json ; done didn't seem to reproduce it (not before I lost patience, anyway). I don't use this VS Code extension, so I have pretty much exhausted my motivation to chase this.
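
A bounded variant of that loop, closer to the once-per-second cadence seen in procmon, could be sketched in Python as follows. This is only an illustration: the function name invoke_repeatedly and its parameters are made up here, docker.exe is assumed to be on PATH, and nothing in this thread shows that such a loop reproduces the leak.

```python
import subprocess
import time
from typing import List, Sequence

# Assumption: docker.exe is on PATH. This mirrors the command that procmon
# showed being invoked; it is not taken from the extension's source code.
DEFAULT_CMD = ["docker.exe", "context", "ls", "--format", "json"]


def invoke_repeatedly(cmd: Sequence[str] = DEFAULT_CMD,
                      count: int = 60,
                      interval: float = 1.0) -> List[int]:
    """Run `cmd` `count` times, sleeping `interval` seconds between runs.

    Returns the exit code of every run. subprocess.run waits for each child
    and closes its pipes, so this script should not itself hold on to
    handles of finished processes.
    """
    exit_codes = []
    for i in range(count):
        completed = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        exit_codes.append(completed.returncode)
        if i < count - 1:
            time.sleep(interval)
    return exit_codes


# Hypothetical usage: roughly one hour at one call per second.
# invoke_repeatedly(count=3600)
```

Because the script waits on every child, a clean run while RAMMap's process list stays flat would point away from docker.exe itself and toward how the caller handles the child process, which is only a hypothesis at this point in the thread.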

jpambrun-vida commented 6 months ago

BTW, neither the Docker daemon nor Rancher Desktop has to be running for this issue to manifest.

mook-as commented 6 months ago

That sounds interesting. It sounds like VS Code is not cleaning up properly after invoking docker.exe context ls. I had expected the issue to be us holding on to (the process handle for) docker.exe on exit. Maybe VS Code is doing that instead, but in that case there wouldn't be anything we could fix from our end.

jandubois commented 6 months ago

@mook-as Can you check if this might be something specific to our docker.exe version? I can't think of anything off-hand, but there are other open issues with docker.exe and stdout/stderr handling, so maybe that is somehow related?

joerohde commented 2 months ago

Just a note that I've been seeing this. These processes only show in RAMMap and in no other utility. They seem to be resources that were somehow abandoned. They may well be zombies, but not in any traditional sense (even for Windows).

It does happen when vscode is running. My work is almost exclusively over ssh to Macs. No containers are ever run locally (although the engine is up and running).

Overnight it will consume 48 GB of 'Page Table' (out of 64 GB of physical RAM).

This started happening around the time I enabled WSL2 backend. However, I can't attribute causation - as I change a lot of things frequently.

Uninstalling docker from Windows has fixed it. Sounds heavy-handed, but again, I never actually run my containers on Windows, just the occasional image management and such.

It may well be VS Code, and that's worth investigating. That is somewhat difficult, since the processes (and thus the parent tree) are gone from traditional views.

I might reinstall and start messing with FindZombieHandles to see what shows. Note: it might be easier to use ProcExp: use 'Find Handle or DLL' in the Find menu and search for non-existent

lystor commented 1 month ago

> @mook-as and anyone else who stumbles on this issue: using procmon, I could determine that it's VS Code invoking docker.exe context ls about every second. Disabling the "Dev containers" extension makes it stop.

I have the same problem on 1.14.2. Disabling the "Dev containers" extension in VSCode fixes it.

jandubois commented 1 month ago

Issue seems to happen with Docker Desktop as well: https://github.com/docker/for-win/issues/13929

So it does seem to be a problem with the dev containers extension.

lystor commented 1 month ago

> Issue seems to happen with Docker Desktop as well: https://github.com/docker/for-win/issues/13929

Yes, but users reported there that the problem was fixed in Docker for Windows 4.30: https://github.com/docker/for-win/issues/13929#issuecomment-2108495175

jandubois commented 1 month ago

It is one user, who also added:

> I've moved my Docker / Dev containers to a virtual machine.

So it is still not clear what fixed the issue for them.

There is also a lot of talk about AMD drivers; do people who experience this issue have AMD drivers installed, or are they on Intel machines?

jandubois commented 1 month ago

A user also reported:

> Did you try downgrading AMD Adrenalin to v23? Worked for me.

Another user:

> Just to report that the issue has gone away since I disabled the integrated graphics card 2 days ago, even with the latest AMD Adrenalin driver, v24.4.1.

Yet another user:

> I found a cause. It is a VS Code extension with the ID ms-vscode-remote.remote-containers. After I disabled the extension and rebooted, there was finally no more memory leak.

🤷