microsoft / hcsshim

Windows - Host Compute Service Shim
MIT License

Intermittent error when creating a new Azure Container Instance #1086

Open AltarBeastiful opened 4 years ago

AltarBeastiful commented 4 years ago

For the past few weeks, we have been seeing an intermittent error when re-creating an Azure Container Instance that previously worked. The image is: https://hub.docker.com/layers/primekey/ejbca-ce/6.15.2.3/images/sha256-9a5119dc6b95c177490ceafc9049bbf286f8dcddaaaee709963aa23d3c33c18f?context=explore

Sometimes the instance fails to start with the error "guest RPC failure: failed to find user by uid: 10001: expected exactly 1 user matched '0': unknown", and sometimes it starts without any problem. We suspect that the USER 10001 line in the Docker image is what fails intermittently:

...
CMD ["/bin/bash"]
USER 0
COPY dir:893e424bc63d1872ee580dfed4125a0bef1fa452b8ae89aa267d83063ce36025 in /opt/primekey
COPY dir:756f0fe274b13cf418a2e3222e3f6c2e676b174f747ac059a95711db0097f283 in /licenses
USER 10001
CMD ["/opt/primekey/wildfly-14.0.1.Final/bin/standalone.sh" "-b" "0.0.0.0"]
...

Full error log at startup:

(count: 1) (last timestamp: 2020-11-03 16:04:32+00:00) pulling image "primekey/ejbca-ce:6.15.2.3"
(count: 1) (last timestamp: 2020-11-03 16:04:37+00:00) Successfully pulled image "primekey/ejbca-ce:6.15.2.3"
(count: 28) (last timestamp: 2020-11-03 16:27:52+00:00) Error: Failed to start container aci-pulsy-ccm-ejbca-snd, Error response: to create containerd task: failed to create container e9e48a06807fba124dc29633dab10f6229fdc5583a95eb2b79467fe7cdffba97: guest RPC failure: failed to find user by uid: 10001: expected exactly 1 user matched '0': unknown
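For readers unfamiliar with the message, here is a minimal Python sketch of one plausible reading of "expected exactly 1 user matched '0'": a lookup that scans passwd-style entries for a UID and insists on exactly one match. This is not the actual opengcs or hcsshim code. The function name find_user_by_uid, the SAMPLE_PASSWD contents, and the exact error wording are all assumptions made for illustration, and the sketch does not explain why the failure is intermittent.

```python
# Hypothetical passwd contents; the real image's /etc/passwd is not shown in this issue.
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
"""


def find_user_by_uid(passwd_text: str, uid: int) -> dict:
    """Return the single passwd entry whose UID field equals uid.

    Raises LookupError when zero or more than one entry matches.
    """
    matches = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # passwd format: name:password:UID:GID:GECOS:home:shell
        fields = line.split(":")
        if len(fields) < 4 or not fields[2].isdigit() or not fields[3].isdigit():
            continue
        if int(fields[2]) == uid:
            matches.append(
                {"name": fields[0], "uid": int(fields[2]), "gid": int(fields[3])}
            )
    if len(matches) != 1:
        raise LookupError(
            f"failed to find user by uid: {uid}: "
            f"expected exactly 1 user, matched {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    print(find_user_by_uid(SAMPLE_PASSWD, 0))
    try:
        find_user_by_uid(SAMPLE_PASSWD, 10001)
    except LookupError as err:
        print(err)
```

Under this reading, a numeric UID such as 10001 that has no entry in the file being consulted yields zero matches, so a strict "exactly one" check rejects it even though USER 10001 is a valid Dockerfile instruction.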

Is there anything we can do on the instance to avoid these intermittent errors?

anmaxvl commented 3 years ago

I believe this has been fixed by https://github.com/microsoft/opengcs/pull/386