microsoft / hcsshim

Windows - Host Compute Service Shim
MIT License

Issue with stop/start container on WS2k19 #1822

Open dardelean opened 1 year ago

dardelean commented 1 year ago

The issue is that containers (process or Hyper-V isolation) fail to start after being stopped, or fail to restart. This happens on WS2k19. It is easy to reproduce on a standard WS2k19 deployment with nerdctl and containerd (v1.7.0-339-g87dbdd2ca). This is the latest version of containerd as of today (07.06.2023), but the issue reproduces on older versions as well.

The specific error is: errors: failed to create shim task: hcs::CreateComputeSystem 7741aa979c8a1ef17659b625d73418b28421be780e848e12d82edd5c6b76312e: The requested operation for attach namespace failed.: unknown
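For anyone triaging logs for this failure, here is a minimal sketch of how the error text above could be recognized and the compute system ID pulled out of it. The function name and the pattern are illustrative assumptions based only on the single message quoted in this issue; they are not part of containerd, nerdctl, or hcsshim, and other versions may word the message differently.

```python
import re

# Assumption: the message has the same shape as the one quoted in this issue.
# The hex string after hcs::CreateComputeSystem is the compute system ID.
_ATTACH_NS_RE = re.compile(
    r"hcs::CreateComputeSystem (?P<id>[0-9a-f]+): "
    r"The requested operation for attach namespace failed\."
)


def parse_attach_namespace_failure(message):
    """Return the compute system ID if `message` is this failure, else None."""
    match = _ATTACH_NS_RE.search(message)
    return match.group("id") if match else None
```

A `None` result only means the line does not match this one known wording, not that the start succeeded.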

This is how the Cirrus CI uses WS2k19: https://github.com/containerd/nerdctl/blob/main/.cirrus.yml#L26

It uses an image built on top of "windows-2019-core-for-containers": https://github.com/cirruslabs/vm-images/blob/master/googlecompute/windows_images.json#L8

And this is how the image is configured: https://github.com/containerd/nerdctl/blob/main/hack/configure-windows-ci.ps1

We saw that if we remove the endpoint while the container is stopped, the container starts successfully, but it then has no network endpoint. We suspect that the issue is there. containerd and the shim send correct information to HCS: while debugging, we compared the Go structures with those from a WS2k22 deployment, which works. One thing we did not understand was the endpoint states, for example state 4 (seen after the container failed to start).
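The comparison between the WS2k19 and WS2k22 structures can be repeated mechanically once both are dumped to JSON and loaded as dictionaries. The following is only a sketch under that assumption: `diff_fields` is a hypothetical helper, and the field names in the sample data are made up for illustration, not the actual HCS or HNS schema.

```python
def diff_fields(working, failing, prefix=""):
    """Return {field_path: (working_value, failing_value)} for differing fields."""
    differences = {}
    for key in sorted(set(working) | set(failing)):
        path = f"{prefix}{key}"
        left = working.get(key)    # None if the field is absent on this side
        right = failing.get(key)
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse into nested objects and report dotted paths.
            differences.update(diff_fields(left, right, prefix=f"{path}."))
        elif left != right:
            differences[path] = (left, right)
    return differences


# Hypothetical sample data; real dumps would come from the two hosts.
ws2k22_dump = {"State": 1, "Network": {"Id": "net-a"}}
ws2k19_dump = {"State": 4, "Network": {"Id": "net-a"}}
for field, (good, bad) in diff_fields(ws2k22_dump, ws2k19_dump).items():
    print(f"{field}: WS2k22={good!r} WS2k19={bad!r}")
```

Fields that exist on only one side are reported with `None` for the missing value, so an absent field and an explicit null are not distinguished.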

acobaugh commented 1 year ago

I'm seeing the exact same thing with:

I did not see this on dockerd and EKS 1.23.

Every once in a while a container will start up just fine.

Other containers start fine on these hosts. It just seems to be this Datadog agent image that consistently fails to start with this error.

jterry75 commented 1 year ago

AttachNamespace is a networking failure. @kevpar - Could you add the right people for that? I don't remember if networking issues should be filed here or on WinContainers.