google / gvisor-containerd-shim

containerd shim for gVisor
https://gvisor.dev
Apache License 2.0
79 stars 28 forks source link

Race condition reading exec internal pid #4

Closed ianlewis closed 5 years ago

ianlewis commented 5 years ago

Found originally in minikube. Original bug: https://github.com/kubernetes/minikube/issues/3446

It looks like gvisor-containerd-shim contains a race condition where it runs runsc to exec a process in the sandbox and immediately tries to read the internal pid for the new container process, but many times runsc hasn't created the pid in time and gvisor-containerd-shim fails to read it.

Relevant error is in exec.go https://github.com/google/gvisor-containerd-shim/blob/ae2250b1dd07fcdb6a2212022b75d63d26966499/pkg/proc/exec.go#L215

Script to set up containerd environment:

sudo apt-get update
sudo apt-get -y install socat conntrack ipset

wget -q --show-progress --https-only --timestamping \
  https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.13.0/crictl-v1.13.0-linux-amd64.tar.gz \
  https://storage.googleapis.com/gvisor/releases/nightly/2018-12-07/runsc \
  https://github.com/opencontainers/runc/releases/download/v1.0.0-rc5/runc.amd64 \
  https://github.com/containernetworking/plugins/releases/download/v0.6.0/cni-plugins-amd64-v0.6.0.tgz \
  https://github.com/containerd/containerd/releases/download/v1.2.0-beta.0/containerd-1.2.0-beta.0.linux-amd64.tar.gz \
  https://github.com/google/gvisor-containerd-shim/releases/download/v0.0.1-rc.0/gvisor-containerd-shim-v0.0.1-rc.0.linux-amd64

sudo mkdir -p \
  /etc/cni/net.d \
  /opt/cni/bin

sudo mv runsc runsc
sudo mv gvisor-containerd-shim-v0.0.1-rc.0.linux-amd64 gvisor-containerd-shim
sudo mv runc.amd64 runc
chmod +x runc runsc gvisor-containerd-shim
sudo mv runc runsc /usr/local/bin/
sudo tar -xvf crictl-v1.13.0-linux-amd64.tar.gz -C /usr/local/bin/
sudo tar -xvf cni-plugins-amd64-v0.6.0.tgz -C /opt/cni/bin/
sudo tar -xvf containerd-1.2.0-beta.0.linux-amd64.tar.gz -C /
sudo mv gvisor-containerd-shim /bin

sudo sh -c 'echo "runtime-endpoint: unix:///run/containerd/containerd.sock" > /etc/crictl.yaml'

cat <<EOF | sudo tee /etc/cni/net.d/10-bridge.conf
{
    "cniVersion": "0.3.1",
    "name": "bridge",
    "type": "bridge",
    "bridge": "cnio0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "ranges": [
          [{"subnet": "10.200.0.0/24"}]
        ],
        "routes": [{"dst": "0.0.0.0/0"}]
    }
}
EOF

cat <<EOF | sudo tee /etc/cni/net.d/99-loopback.conf
{
    "cniVersion": "0.3.1",
    "type": "loopback"
}
EOF

sudo mkdir -p /etc/containerd/

cat << EOF | sudo tee /etc/containerd/config.toml
disabled_plugins = ["restart"]

[plugins]
  [plugins.linux]
    shim = "/bin/gvisor-containerd-shim"
    shim_debug = true
  [plugins.cri]
    [plugins.cri.containerd]
      snapshotter = "overlayfs"
      [plugins.cri.containerd.default_runtime]
        runtime_type = "io.containerd.runtime.v1.linux"
        runtime_engine = "/usr/local/bin/runc"
        runtime_root = ""
      [plugins.cri.containerd.untrusted_workload_runtime]
        runtime_type = "io.containerd.runtime.v1.linux"
        runtime_engine = "/usr/local/bin/runsc"
        runtime_root = "/run/containerd/runsc"
EOF

cat << EOF | sudo tee /etc/containerd/gvisor-containerd-shim.toml
# This is the path to the default runc containerd-shim.
runc_shim = "/bin/containerd-shim"
EOF

cat <<EOF | sudo tee /etc/systemd/system/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable containerd
sudo systemctl start containerd
ianlewis commented 5 years ago

I tried to repro this outside of minikube but execing into a gvisor sandbox container using crictl but it seems like crictl doesn't support sandbox annotations anymore? I can't get crictl to pass the untrusted-workload annotation to containerd.

cat <<EOF > sandbox.json
{
    "metadata": {
        "name": "nginx-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83djaidwnduwk28bcsb",
        "annotations": {
          "io.kubernetes.cri.untrusted-workload": "true"
        }
    },
    "linux": {
    },
    "log_directory": "/tmp"
}
EOF

cat <<EOF > container.json
{
  "metadata": {
      "name": "nginx"
  },
  "image":{
      "image": "nginx"
  },
  "log_path":"nginx.0.log",
  "linux": {
  }
}
EOF

sudo crictl pull nginx
SANDBOX_ID=$(sudo crictl runp sandbox.json)
CONTAINER_ID=$(sudo crictl create ${SANDBOX_ID} container.json sandbox.json)
sudo crictl start ${CONTAINER_ID}

sudo crictl exec ${CONTAINER_ID} ls
sudo crictl exec -ti ${CONTAINER_ID} /bin/sh

Cleanup:

sudo crictl stop ${CONTAINER_ID}
sudo crictl rm ${CONTAINER_ID}
sudo crictl stopp ${SANDBOX_ID}
sudo crictl rmp ${SANDBOX_ID}
ianlewis commented 5 years ago

hrm. crictl doesn't seem to send the annotations in the pod create request.

ianlewis@containerd-test:~$ sudo crictl -D runp sandbox.json 
DEBU[0000] RunPodSandboxRequest: &RunPodSandboxRequest{Config:&PodSandboxConfig{Metadata:&PodSandboxMetadata{Name:nginx-sandbox,Uid:hdishd83djaidwnduwk28bcsb,Namespace:default,Attempt:1,},Hostname:,LogDirectory:/tmp,DnsConfig:nil,PortMappings:[],Labels:map[string]string{},Annotations:map[string]string{},Linux:&LinuxPodSandboxConfig{CgroupParent:,SecurityContext:nil,Sysctls:map[string]string{},},},RuntimeHandler:,} 
DEBU[0002] RunPodSandboxResponse: &RunPodSandboxResponse{PodSandboxId:791d4c30507c5219a50bbe0e75966aec9ec3f766639dcc1612a7e7f765d6d5d8,} 
791d4c30507c5219a50bbe0e75966aec9ec3f766639dcc1612a7e7f765d6d5d8
ianlewis@containerd-test:~$ sudo crictl -D create 791d4c30507c5219a50bbe0e75966aec9ec3f766639dcc1612a7e7f765d6d5d8 container.json sandbox.json 
DEBU[0000] CreateContainerRequest: &CreateContainerRequest{PodSandboxId:791d4c30507c5219a50bbe0e75966aec9ec3f766639dcc1612a7e7f765d6d5d8,Config:&ContainerConfig{Metadata:&ContainerMetadata{Name:nginx,Attempt:0,},Image:&ImageSpec{Image:nginx,},Command:[],Args:[],WorkingDir:,Envs:[],Mounts:[],Devices:[],Labels:map[string]string{},Annotations:map[string]string{},LogPath:nginx.0.log,Stdin:false,StdinOnce:false,Tty:false,Linux:&LinuxContainerConfig{Resources:nil,SecurityContext:nil,},Windows:nil,},SandboxConfig:&PodSandboxConfig{Metadata:&PodSandboxMetadata{Name:nginx-sandbox,Uid:hdishd83djaidwnduwk28bcsb,Namespace:default,Attempt:1,},Hostname:,LogDirectory:/tmp,DnsConfig:nil,PortMappings:[],Labels:map[string]string{},Annotations:map[string]string{},Linux:&LinuxPodSandboxConfig{CgroupParent:,SecurityContext:nil,Sysctls:map[string]string{},},},} 
DEBU[0000] CreateContainerResponse: &CreateContainerResponse{ContainerId:125dd5b7c95e0e6991a9c24aa848c536b0511c73d0b523e6b750ed615c244487,} 
125dd5b7c95e0e6991a9c24aa848c536b0511c73d0b523e6b750ed615c244487
ianlewis@containerd-test:~$ sudo crictl --version
crictl version v1.13.0
ianlewis commented 5 years ago

I realized that annotations isn't part of metadata

cat <<EOF > sandbox.json
{
    "metadata": {
        "name": "nginx-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83djaidwnduwk28bcsb"
    },
    "annotations": {
      "io.kubernetes.cri.untrusted-workload": "true"
    },
    "linux": {
    },
    "log_directory": "/tmp"
}
EOF
Random-Liu commented 5 years ago

This should have been fixed by https://github.com/google/gvisor/commit/4cd4b60352bc8a572a0a9482c58564397c49446c.

Does the runsc version in minikube include that fix?

ianlewis commented 5 years ago

@Random-Liu minikube is using the 2018-12-07 version of runsc so I think that change should be included.

https://github.com/kubernetes/minikube/blob/master/pkg/minikube/constants/constants.go#L285

ianlewis commented 5 years ago

@Random-Liu Also, the fix you are pointing to is for the pid file, but the minikube issue is for the internal pid file.

ianlewis commented 5 years ago

I can get the containers to launch in crictl but I can't repro the issue with exec. It seems to work properly when I use crictl

Random-Liu commented 5 years ago

@Random-Liu Also, the fix you are pointing to is for the pid file, but the minikube issue is for the internal pid file.

It is for the internal pid file. It moves the pid file generation to the end, which makes sure internal pid file is generated before the pid file, and because runsc exec has a loop to wait for pid file generation, so the internal pid file is guaranteedto be generated before runsc exec returns.

ianlewis commented 5 years ago

@Random-Liu Hmm, was the fix not in the 2018-12-07 version of runsc then?

I'll try the latest version of runsc in minikube to see if it fixes the issue but this CL should be included AFAICT

ianlewis commented 5 years ago

@Random-Liu Ok. I don't totally understand your fix, but I tried using the latest version and I can't repo any more so it looks like you're right. I was deceived by the minikube code https://github.com/kubernetes/minikube/blob/master/pkg/minikube/constants/constants.go#L285. The minikube gvisor addon doesn't seem to be using the 2018-12-07 version of runsc. The docker image that they use references an older version.

$ for m in $(seq -f "%02g" 1 12); do
>     for d in $(seq -f "%02g" 1 31); do
>         if grep "https://storage.googleapis.com/gvisor/releases/nightly/2018-${m}-${d}/runsc" gvisor-addon; then
>             echo "2018-${m}-${d}";
>         fi;
>     done;
> done;

Binary file gvisor-addon matches
2018-11-08