lyft / cni-ipvlan-vpc-k8s

AWS VPC Kubernetes CNI driver using IPvlan
Apache License 2.0

Pods stuck in ContainerCreating (possible duplicate IP assignment from IPAM) #9

Closed: liwenwu-amazon closed this issue 6 years ago

liwenwu-amazon commented 6 years ago

I am seeing some pods stuck in ContainerCreating state when I run

kubectl create -f https://k8s.io/docs/tasks/access-application-cluster/hello.yaml

Here is the output

kubectl get pod
NAME                     READY     STATUS              RESTARTS   AGE
hello-1243552595-1bm63   1/1       Running             0          3h
hello-1243552595-bvc3r   1/1       Running             0          3h
hello-1243552595-hrj3s   0/1       ContainerCreating   0          3h
hello-1243552595-mv49s   0/1       ContainerCreating   0          3h

I am using Kubernetes 1.7.10

kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.10", GitCommit:"bebdeb749f1fa3da9e1312c4b08e439c404b3136", GitTreeState:"clean", BuildDate:"2017-11-03T16:31:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Here is the error message I am seeing in the kubelet log

Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380612    3752 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "hello-1243552595-mv49s_default" network: failed to add host route dst 10.0.5.23: file exists
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380659    3752 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "hello-1243552595-mv49s_default" network: failed to add host route dst 10.0.5.23: file exists
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380678    3752 kuberuntime_manager.go:624] createPodSandbox for pod "hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "hello-1243552595-mv49s_default" network: failed to add host route dst 10.0.5.23: file exists
Dec 13 00:26:17 ip-10-0-55-131 kubelet[3752]: E1213 00:26:17.380711    3752 pod_workers.go:182] Error syncing pod 28da1900-df81-11e7-a409-0e64e3f014fa ("hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)"), skipping: failed to "CreatePodSandbox" for "hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)" with CreatePodSandboxError: "CreatePodSandbox for pod \"hello-1243552595-mv49s_default(28da1900-df81-11e7-a409-0e64e3f014fa)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"hello-1243552595-mv49s_default\" network: failed to add host route dst 10.0.5.23: file exists"

It looks like the CNI plugin tries to assign 10.0.5.23 to pod hello-1243552595-mv49s, but 10.0.5.23 is already assigned to another pod (hello-1243552595-bvc3r) running on the same host.

kubectl describe pod hello-1243552595-bvc3r
Name:           hello-1243552595-bvc3r
Namespace:      default
Node:           ip-10-0-55-131.ec2.internal/10.0.55.131
Start Time:     Tue, 12 Dec 2017 21:12:28 +0000
Labels:         app=hello
                pod-template-hash=1243552595
                tier=backend
                track=stable
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"hello-1243552595","uid":"28d91673-df81-11e7-a409-0e64e3f014fa","...
                kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container hello
Status:         Running
IP:             10.0.5.23
Controllers:    ReplicaSet/hello-1243552595
Containers:
  hello:
    Container ID:       docker://34c740bf48525b72a7fedceb1dc64951f4ca14bd9fad6f3b2b8e485f09f8c152
    Image:              gcr.io/google-samples/hello-go-gke:1.0
    Image ID:           docker-pullable://gcr.io/google-samples/hello-go-gke@sha256:4ea9cd3d35f81fc91bdebca3fae50c180a1048be0613ad0f811595365040396e
    Port:               80/TCP
    State:              Running
      Started:          Tue, 12 Dec 2017 23:37:23 +0000
    Ready:              True
    Restart Count:      0
    Requests:
      cpu:              100m
    Environment:        <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-j8llx (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-j8llx:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-j8llx
    Optional:   false
QoS Class:      Burstable
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
                node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
  FirstSeen     LastSeen        Count   From                                    SubObjectPath           Type            Reason          Message
  ---------     --------        -----   ----                                    -------------           --------        ------          -------
  2h            18m             657     kubelet, ip-10-0-55-131.ec2.internal                            Warning         FailedSync      Error syncing pod
  2h            18m             657     kubelet, ip-10-0-55-131.ec2.internal                            Normal          SandboxChanged  Pod sandbox changed, it will be killed and re-created.
  18m           18m             1       kubelet, ip-10-0-55-131.ec2.internal    spec.containers{hello}  Normal          Pulling         pulling image "gcr.io/google-samples/hello-go-gke:1.0"
  18m           18m             1       kubelet, ip-10-0-55-131.ec2.internal    spec.containers{hello}  Normal          Pulled          Successfully pulled image "gcr.io/google-samples/hello-go-gke:1.0"
  18m           18m             1       kubelet, ip-10-0-55-131.ec2.internal    spec.containers{hello}  Normal          Created         Created container
  18m           18m             1       kubelet, ip-10-0-55-131.ec2.internal    spec.containers{hello}  Normal          Started         Started container

Here is the ip route output

ip route
default via 10.0.32.1 dev eth0
10.0.5.0/24 dev eth1  proto kernel  scope link  src 10.0.5.154
10.0.5.23 dev veth350a598b  scope link
10.0.32.0/19 dev eth0  proto kernel  scope link  src 10.0.55.131
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1

Here is the docker ps output

docker ps
CONTAINER ID        IMAGE                                                                                                        COMMAND                  CREATED             STATUS              PORTS               NAMES
efd4c399f948        gcr.io/google_containers/pause-amd64:3.0                                                                     "/pause"                 6 seconds ago       Up 5 seconds                            k8s_POD_kube-dns-2712020956-90fd9_kube-system_ebba507c-df81-11e7-a409-0e64e3f014fa_111
23cac500ae36        gcr.io/google_containers/pause-amd64:3.0                                                                     "/pause"                 6 seconds ago       Up 5 seconds                            k8s_POD_hello-1243552595-mv49s_default_28da1900-df81-11e7-a409-0e64e3f014fa_111
52532863c1d4        gcr.io/google_containers/pause-amd64:3.0                                                                     "/pause"                 7 seconds ago       Up 6 seconds                            k8s_POD_hello-1243552595-hrj3s_default_28da0c5e-df81-11e7-a409-0e64e3f014fa_111
34c740bf4852        gcr.io/google-samples/hello-go-gke@sha256:4ea9cd3d35f81fc91bdebca3fae50c180a1048be0613ad0f811595365040396e   "/usr/bin/hello"         20 minutes ago      Up 20 minutes                           k8s_hello_hello-1243552595-bvc3r_default_28da0787-df81-11e7-a409-0e64e3f014fa_0
1c17414c1e46        gcr.io/google_containers/pause-amd64:3.0                                                                     "/pause"                 20 minutes ago      Up 20 minutes                           k8s_POD_hello-1243552595-bvc3r_default_28da0787-df81-11e7-a409-0e64e3f014fa_1
3ad87d21eea0        protokube:1.6.0                                                                                              "/usr/bin/protokube -"   3 hours ago         Up 3 hours                              distracted_aryabhata
theatrus commented 6 years ago

Could you grab the output of sudo cni-ipvlan-vpc-k8s-tool eniif and sudo cni-ipvlan-vpc-k8s-tool free-ips? I suspect we are not detecting the in-use IP correctly on this host configuration. This is a thorny issue with Docker, since it uses unnamed network namespaces; we tested one configuration of Kubernetes 1.8.3 with Fedora-packaged Docker, but differences are likely to crop up.
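
For reference, a rough way to see the mismatch by hand (an illustrative sketch, not the plugin's actual code; assumes nsenter is available): stock Docker does not register its network namespaces under /var/run/netns, so a named-namespace scan can come up empty even while pod IPs are in use.

# Named namespaces: typically empty under stock Docker, which is why the tool
# prints "Couldn't enumerate named namespaces".
ls /var/run/netns

# Docker's namespaces are reachable only via each container's PID, so the
# in-use pod IPs can be collected by hand like this:
for c in $(docker ps -q); do
  pid=$(docker inspect -f '{{.State.Pid}}' "$c")
  nsenter -t "$pid" -n ip -4 addr show 2>/dev/null | awk '/inet /{print $2}'
done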

liwenwu-amazon commented 6 years ago
root@ip-10-0-55-131:/home/admin# ./cni-ipvlan-vpc-k8s-tool eniif
iface   mac                 id             subnet            subnet_cidr    secgrps         vpc            ips                      
eth0    0e:94:8f:aa:19:c4   eni-4247fbca   subnet-3bf05866   10.0.32.0/19   [sg-34b79141]   vpc-0066bd79   [10.0.55.131]            
eth1    0e:e6:73:13:66:ac   eni-e1dc6569   subnet-c2e8419f   10.0.5.0/24    [sg-34b79141]   vpc-0066bd79   [10.0.5.154 10.0.5.23]

and

root@ip-10-0-55-131:/home/admin# ./cni-ipvlan-vpc-k8s-tool free-ips
Couldn't enumerate named namespaces
adapter   ip          
eth1      10.0.5.23 
liwenwu-amazon commented 6 years ago

Here is the image I am using

kops get instancegroups nodes -oyaml
Using cluster from kubectl context: lyft-dec12.k8s-test.com

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-12-12T20:40:40Z
  labels:
    kops.k8s.io/cluster: lyft-dec12.k8s-test.com
  name: nodes
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: t2.medium
  maxSize: 3
  minSize: 3
  role: Node
  subnets:
  - us-east-1a
theatrus commented 6 years ago

We’re looking at this. We suspect it's a race condition in the free-IP finder that triggers under Docker: if there is no running container on Docker at the time the next queued IPAM job runs (or at least none according to the Docker API), we won't detect that the network namespace still exists and will re-issue the IP.
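
Roughly, the suspected window on the node looks like this (an illustration only, reusing the route and IP from the kubelet log above; timing is hand-waved):

# 1. The old sandbox holding 10.0.5.23 is gone (or not yet visible through the
#    Docker API), but its host route is still installed:
docker ps -q --filter 'name=k8s_POD' | wc -l   # can momentarily report 0
ip route | grep 10.0.5.23                      # 10.0.5.23 dev veth350a598b  scope link

# 2. The next IPAM run therefore treats 10.0.5.23 as free and hands it out
#    again; the plugin's equivalent of this route add then fails:
ip route add 10.0.5.23 dev veth350a598b        # RTNETLINK answers: File exists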

theatrus commented 6 years ago

In #11 I normalized the logic to match how the dockershim code in Kubernetes itself resolves Docker network namespaces; this may help the situation.
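
For context, dockershim resolves a sandbox's network namespace from the container PID rather than from a named entry under /var/run/netns, roughly as sketched below (illustrative commands, run by hand against any k8s_POD pause container):

# Resolve a pod sandbox's netns the way dockershim does: via /proc/<pid>/ns/net.
sandbox=$(docker ps -q --filter 'name=k8s_POD' | head -n1)
pid=$(docker inspect -f '{{.State.Pid}}' "$sandbox")
readlink /proc/"$pid"/ns/net   # e.g. net:[4026532123]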

On my test system, I can scale up and down 50 busybox pods on a single kubelet without issues. Specifics:

Client:
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:25:06 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.1-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:26:29 2017
 OS/Arch:      linux/amd64
 Experimental: false

(as packaged in Fedora 26)

Kernel: 4.14.4-200.fc26.x86_64

Kubernetes v1.8.3

Example scaling stress test:
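
(The commands driving it were along these lines; the busybox deployment name matches the pod names below, while the image and exact replica counts are assumptions.)

kubectl run busybox --image=busybox --replicas=30 -- sleep 3600
kubectl scale deployment busybox --replicas=50   # then back down again
kubectl get pods --all-namespaces -o wide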

default       busybox-6986c7c9c7-2rdxl               1/1       Running   0          4m        172.31.157.8     ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-2zjz6               1/1       Running   0          4m        172.31.149.202   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-4ghms               1/1       Running   0          4m        172.31.145.1     ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-5gxql               1/1       Running   0          4m        172.31.202.50    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-5s6cx               1/1       Running   0          4m        172.31.144.162   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-6d8dd               1/1       Running   0          4m        172.31.205.104   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-6jl9b               1/1       Running   0          46s       172.31.157.69    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-7cr9p               1/1       Running   0          4m        172.31.150.40    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-7nrlj               1/1       Running   0          4m        172.31.196.175   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-7t8h4               1/1       Running   0          46s       172.31.163.33    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-9sc5j               1/1       Running   0          4m        172.31.202.26    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-b82nj               1/1       Running   0          46s       172.31.166.37    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-bp6zl               1/1       Running   0          4m        172.31.147.89    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-btxqs               1/1       Running   0          4m        172.31.193.235   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-cktdg               1/1       Running   0          46s       172.31.170.74    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-fxb2n               1/1       Running   0          46s       172.31.148.179   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-gmzz8               1/1       Running   0          4m        172.31.200.84    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-hclrk               1/1       Running   0          46s       172.31.174.36    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-hcnz6               1/1       Running   0          46s       172.31.157.208   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-hjq4k               1/1       Running   0          46s       172.31.169.125   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-hpwp4               1/1       Running   0          4m        172.31.200.213   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-kdz8b               1/1       Running   0          4m        172.31.147.248   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-l6vns               1/1       Running   0          4m        172.31.146.115   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-mjqd4               1/1       Running   0          4m        172.31.147.249   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-ms48p               1/1       Running   0          46s       172.31.154.38    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-mt4th               1/1       Running   0          4m        172.31.146.254   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-nzzxw               1/1       Running   0          4m        172.31.193.242   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-rzwjc               1/1       Running   0          4m        172.31.194.78    ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-shsgk               1/1       Running   0          46s       172.31.158.253   ip-172-31-37-168.ec2.internal
default       busybox-6986c7c9c7-tphmd               1/1       Running   0          4m        172.31.207.90    ip-172-31-37-168.ec2.internal
kube-system   cluster-autoscaler-b48f465cf-n9fm2     1/1       Running   0          39m       172.31.41.154    ip-172-31-41-154.ec2.internal
kube-system   cluster-autoscaler-b48f465cf-q8m27     1/1       Running   0          39m       172.31.34.190    ip-172-31-34-190.ec2.internal
kube-system   heapster-69c44d5864-sb4h9              1/1       Running   0          13m       172.31.198.230   ip-172-31-37-168.ec2.internal
kube-system   kube-dns-7797cb8758-56tng              3/3       Running   0          13m       172.31.194.45    ip-172-31-37-168.ec2.internal
kube-system   kube-dns-7797cb8758-ccz8z              3/3       Running   0          13m       172.31.192.181   ip-172-31-37-168.ec2.internal
kube-system   kube-dns-autoscaler-7db47cb9b7-6zp29   1/1       Running   0          13m       172.31.203.152   ip-172-31-37-168.ec2.internal
kube-system   kube2iam-ts8tf                         1/1       Running   0          22m       172.31.37.168    ip-172-31-37-168.ec2.internal
kube-system   kubernetes-dashboard-747c4f7cf-9vp5c   1/1       Running   0          13m       172.31.197.209   ip-172-31-37-168.ec2.internal
kube-system   npd-v0.4.1-c65sr                       1/1       Running   0          16m       172.31.37.168    ip-172-31-37-168.ec2.internal
kube-system   rescheduler-6df54645b7-zbvvt           1/1       Running   0          13m       172.31.37.168    ip-172-31-37-168.ec2.internal

I have not had a chance to re-create this using Ubuntu and/or Kubernetes 1.7.x. Will give that a go next.

liwenwu-amazon commented 6 years ago

Does this mean that this CNI plugin will NOT work with container runtimes other than Docker? Thanks

theatrus commented 6 years ago

We use two paths.

If no Docker daemon is detected, we simply skip populating the Docker data. I'm still doing more testing on my change :)
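
In other words, the scan for in-use IPs combines two sources, roughly like this (a sketch of the decision only; the real logic lives in the plugin's Go code):

# Illustrative only: the two places in-use IPs can come from.
ls /var/run/netns 2>/dev/null            # path 1: named network namespaces
if [ -S /var/run/docker.sock ]; then     # path 2: only when a Docker daemon is present
  docker ps -q                           # ...enumerate containers via the Docker API
fi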

theatrus commented 6 years ago

I had a chance to run #11 through a mixed CRI-O/Docker cluster using conformance tests on 1.8.5, plus some stress testing by hand, and didn't notice any leakage. I still haven't had a chance to test on 1.7 or on Ubuntu kernels/Docker distributions. However, #11 is a lot more resilient to how namespaces are handled and matches the logic inside Kubernetes, so I expect this problem to be mitigated. We'll cut a release tonight or tomorrow.