microsoft / Windows-Containers

Welcome to our Windows Containers GitHub community! Ask questions, report bugs, and suggest features -- let's work together.

Can't start any two docker-compose environments after update to Windows 20H2 #140

Open stephen-turner opened 3 years ago

stephen-turner commented 3 years ago

This was originally filed by user @marcusschroeder as docker/for-win#9999, but as it can be reliably reproduced on Windows 20H2 (maybe 2004?) but not 1903, and as it only happens with Windows containers, we believe that it is a bug in the Windows container code.

Original report:

Expected behavior

On my computer I use a couple of docker-compose environments with windows containers in parallel. Before the windows update to 2004 or 20H2 it was no problem to start several environments with docker-compose up either manually or programmatically. It didn't matter if it was the same env with a different name or a completely different one.

Actual behavior

Now, when starting any two docker compose environments, the second one gets stuck in start up until the first is stopped.

Information

Is it reproducible? Yes, even on another computer.
Is the problem new? Yes, it appeared after windows update 1903 to 20H2
Did the problem appear with an update? Yes
Windows Version: Windows 10 Pro 20H2
Docker Desktop Version: 2.5.0.1 / 2.4.0.0 / 2.3.0.4 / 2.5.1.0 (experimental) / 3.0.0.0

The problem appeared after a Windows update from 1903 to 2004 or 20H2 respectively.

I have tried:

different Docker Desktop versions (see above) to no avail.
WSL2 and LCOW

Steps to reproduce the behavior

Use the following docker-compose.yml:


version: "2.4"

services:
  service_a:
    image: mcr.microsoft.com/windows/servercore/iis
    ports:

My colleague @StefanScherer has reproduced it without docker compose, but with a second nat network as follows:

$ docker network create -d nat first
10851710aef0c9645393c78f5480cc9d8c2309b079e4d8d82f70c9c6f1ee064f
$ docker run -d --network first -p 8004:80 mcr.microsoft.com/windows/servercore/iis
e89d12002a50b22b3628ddaca1ac06e4b34728ea803c7d5250d84721ddf993bf
$ docker run -d --network first -p 8005:80 mcr.microsoft.com/windows/servercore/iis
93adddb1f6213d21d0b8138b3fd47784938c238fa75256c31f7291c8713c57aa

$ docker network create -d nat second
e30d9c556d386408137db45222dc47d989e4c0d49a7f3a051f56ee93fa18c912
$ docker run -d --network second -p 8006:80 mcr.microsoft.com/windows/servercore/iis
f05233268a2e826f79f323659bf1e3a15333cd5d440117e053ccf04ceaf1a2c8
$ docker run -d --network second -p 8007:80 mcr.microsoft.com/windows/servercore/iis
3ea1546125d05b655e89913b5dd6677e3aa58d2e2d6b4f0fe93b1ff091ebb08e

The Docker CLI call for the last container does not return to the shell prompt, and in Docker Dashboard the fourth container is in the CREATED state. When I kill one of the first containers (e89d12), the Docker CLI shows this error message:

docker: Error response from daemon: failed to create endpoint serene_bhabha on network second: failed during hnsCallRawResponse: hnsCall failed in Win32: The specified port already exists. (0x803b0013).

Port 8007 was not in use before.

sikhness commented 3 months ago

I would love to see this issue resolved as well. I've spent countless hours trying to debug the application logic and the docker setup on my Windows Server 2022 machine only to find that the problem is in fact the above.

For those looking for a temporary workaround (and I appreciate that this won't apply to more complex setups!), simplifying the networking stack in the compose file resolved all my issues with the containers sporadically hanging during startup, having to restart the docker service and in some cases the entire system etc. Specifically, I removed the custom networks and customised the default one instead to prevent having multiple networks (default network was being created automatically). I'm only mentioning this here to aid those who also struggled with the same problem in a hope it gets more visibility online.

Hey @marceliwac, would you be able to provide an example of the workaround that you created? In my compose file, I currently use the nat network as an external network, so that the containers attach to the default nat network created by Windows Containers. The problem with that is that when you restart the machine, Windows seems to recreate the default nat network each time, so its internal ID changes. Because of that, any compose files (and thus their already deployed containers) pointing to the ID from before the restart fail to start. So I currently have to run a script on each startup that brings down all of my compose configs and then recreates them (to pick up the new network ID on each boot).
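For readers unfamiliar with this kind of setup, attaching a compose project to the pre-existing nat network might look roughly like the sketch below. The service name and image are borrowed from the original report. Declaring the default network as `external: true` with `name: nat` is an assumption about the configuration, not something confirmed in this thread.

```yaml
version: "2.4"

services:
  service_a:
    # Image taken from the original report; any Windows container image would do.
    image: mcr.microsoft.com/windows/servercore/iis

networks:
  default:
    # Assumption: reuse the nat network that already exists on the host
    # instead of letting compose create a new network for this project.
    external: true
    name: nat
```

With a layout like this, compose does not create or remove the network itself, which is why the containers depend on the nat network that exists at the time they are created.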

marceliwac commented 3 months ago

@sikhness I'm afraid the workaround I use wouldn't be of help in this case. All I do is change the network block of docker-compose.yml to customise the default network, rather than create a new one.
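A minimal sketch of what customising the default network can look like, assuming that setting the `nat` driver is the only change (the exact settings used are not stated in this thread). The service definition is borrowed from the original report and is purely illustrative.

```yaml
version: "2.4"

services:
  service_a:
    # Image taken from the original report; no per-service "networks:" key,
    # so the service joins the project's default network.
    image: mcr.microsoft.com/windows/servercore/iis

# No custom networks are declared. Only the automatically created
# default network is customised, so the project ends up with one network.
networks:
  default:
    # Assumption: the nat driver is the setting being customised.
    driver: nat
```

The point of this layout is that the project does not end up with both a custom network and the automatically created default one.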

If I recall correctly, when looking for the solution to this issue I stumbled upon a few threads which mentioned the same (or adjacent) issues you are facing. The solution involved setting a static MAC address for the Hyper-V NAT Virtual Adapter. I'm sorry I cannot give you anything more concrete, but here are a few links that might be worth looking at. They are not exactly answers to your question but might give you some ideas:

https://superuser.com/questions/1701567/how-to-add-a-static-ip-adress-to-a-virtual-machine-in-hyper-v-to-stop-changing-t
https://superuser.com/questions/1815670/change-mac-adress-virtual-switch-windows-server-with-hyper-v

microsoft-github-policy-service[bot] commented 2 months ago

This issue has been open for 30 days with no updates. @grcusanz, @adrianm-msft, please provide an update or close this issue.

Jens-G commented 2 months ago

please provide an update

That would indeed be awesome. It seems this breaks a lot of stuff for quite a few people.

adrianm-msft commented 1 month ago

Hi, we'll be able to provide updates in a few days.

sikhness commented 1 month ago

Hi @adrianm-msft, we're eagerly awaiting good news! Also hoping whatever changes are being made to fix this issue would be present in Windows Server 2022 and the upcoming Windows Server 2025.

adrianm-msft commented 1 month ago

A fix for the issue is planned for the November release of Windows Server 2025. However, the fix for Windows Server 2022 will take a bit longer.

microsoft-github-policy-service[bot] commented 2 days ago

This issue has been open for 30 days with no updates. @grcusanz, @adrianm-msft, please provide an update or close this issue.