vmware-archive / kops

Kubernetes Operations (kops) - Production Grade K8s Installation, Upgrades, and Management
Apache License 2.0

pod scheduling failed with - failed to fit in any node - fit failure summary on nodes : MatchNodeSelector (2), PodToleratesNodeTaints (1) #69

Open divyenpatel opened 7 years ago

divyenpatel commented 7 years ago

Pod creation with a node selector is failing with "FailedScheduling pod (vsphere-e2e-j11sb) failed to fit in any node - fit failure summary on nodes : MatchNodeSelector (2), PodToleratesNodeTaints (1)"

Verified that the label vsphere_e2e_label=vsphere_e2e_8b21b01d-2c58-11e7-ae94-0242ac110002 is set on the master and on node1.

5 tests from this spec failed - https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/vsphere_volume_placement.go
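
The label and the master taint can be re-checked directly with kubectl; a minimal sketch (node names are the ones shown further below):

# confirm which nodes carry the test label
kubectl get nodes --show-labels | grep vsphere_e2e_label

# show the taints per node; the kops master carries dedicated=master:NoSchedule
kubectl describe nodes | grep -E '^(Name|Taints):'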

pod spec:

root@master-vmware-zone:~# kubectl --namespace=e2e-tests-volume-placement-f378m get pods -o json
{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {
                "creationTimestamp": "2017-04-28T21:26:41Z",
                "generateName": "vsphere-e2e-",
                "name": "vsphere-e2e-j11sb",
                "namespace": "e2e-tests-volume-placement-f378m",
                "resourceVersion": "70170",
                "selfLink": "/api/v1/namespaces/e2e-tests-volume-placement-f378m/pods/vsphere-e2e-j11sb",
                "uid": "5ee06e44-2c59-11e7-a375-0050569acc63"
            },
            "spec": {
                "containers": [
                    {
                        "command": [
                            "/bin/sh",
                            "-c",
                            "while true; do sleep 2; done"
                        ],
                        "image": "gcr.io/google_containers/busybox:1.24",
                        "imagePullPolicy": "IfNotPresent",
                        "name": "vsphere-e2e-container-4d83d4fa-2c59-11e7-ae94-0242ac110002",
                        "resources": {},
                        "terminationMessagePath": "/dev/termination-log",
                        "volumeMounts": [
                            {
                                "mountPath": "/mnt/volume1",
                                "name": "volume1"
                            },
                            {
                                "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
                                "name": "default-token-fxnfp",
                                "readOnly": true
                            }
                        ]
                    }
                ],
                "dnsPolicy": "ClusterFirst",
                "nodeSelector": {
                    "vsphere_e2e_label": "vsphere_e2e_8b21b01d-2c58-11e7-ae94-0242ac110002"
                },
                "restartPolicy": "Never",
                "securityContext": {},
                "serviceAccount": "default",
                "serviceAccountName": "default",
                "terminationGracePeriodSeconds": 30,
                "volumes": [
                    {
                        "name": "volume1",
                        "vsphereVolume": {
                            "fsType": "ext4",
                            "volumePath": "[vsanDatastore] kubevols/e2e-vmdk-1493414770783055596.vmdk"
                        }
                    },
                    {
                        "name": "default-token-fxnfp",
                        "secret": {
                            "defaultMode": 420,
                            "secretName": "default-token-fxnfp"
                        }
                    }
                ]
            },
            "status": {
                "conditions": [
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2017-04-28T21:26:41Z",
                        "reason": "Unschedulable",
                        "status": "False",
                        "type": "PodScheduled"
                    }
                ],
                "phase": "Pending"
            }
        }
    ],
    "kind": "List",
    "metadata": {},
    "resourceVersion": "",
    "selfLink": ""
}
root@master-vmware-zone:~# kubectl --namespace=e2e-tests-volume-placement-f378m describe pod vsphere-e2e-j11sb
Name:   vsphere-e2e-j11sb
Namespace:  e2e-tests-volume-placement-f378m
Node:   /
Labels:   <none>
Status:   Pending
IP:   
Controllers:  <none>
Containers:
  vsphere-e2e-container-4d83d4fa-2c59-11e7-ae94-0242ac110002:
    Image:  gcr.io/google_containers/busybox:1.24
    Port: 
    Command:
      /bin/sh
      -c
      while true; do sleep 2; done
    Volume Mounts:
      /mnt/volume1 from volume1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-fxnfp (ro)
    Environment Variables:  <none>
Conditions:
  Type    Status
  PodScheduled  False 
Volumes:
  volume1:
    Type: vSphereVolume (a Persistent Disk resource in vSphere)
    VolumePath: [vsanDatastore] kubevols/e2e-vmdk-1493414770783055596.vmdk
    FSType: ext4
  default-token-fxnfp:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-fxnfp
QoS Class:  BestEffort
Tolerations:  <none>
Events:
  FirstSeen LastSeen  Count From      SubObjectPath Type    Reason      Message
  --------- --------  ----- ----      ------------- --------  ------      -------
  1m    31s   7 {default-scheduler }      Warning   FailedScheduling  pod (vsphere-e2e-j11sb) failed to fit in any node
fit failure summary on nodes : MatchNodeSelector (2), PodToleratesNodeTaints (1)
root@master-vmware-zone:~# 
Name:           master-vmware-zone.masters.kubernetes.skydns.local1
Role:           master
Labels:         beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            failure-domain.beta.kubernetes.io/region=vcqaDC
            failure-domain.beta.kubernetes.io/zone=cluster-vsan-1
            kubernetes.io/hostname=master-vmware-zone
            kubernetes.io/role=master
            node-role.kubernetes.io/master=
            vsphere_e2e_label=vsphere_e2e_8b21b01d-2c58-11e7-ae94-0242ac110002
Taints:         dedicated=master:NoSchedule
Name:           nodes.kubernetes.skydns.local1
Role:           node
Labels:         beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            failure-domain.beta.kubernetes.io/region=vcqaDC
            failure-domain.beta.kubernetes.io/zone=cluster-vsan-1
            kubernetes.io/hostname=nodes
            kubernetes.io/role=node
            node-role.kubernetes.io/node=
            vsphere_e2e_label=vsphere_e2e_8b318643-2c58-11e7-ae94-0242ac110002
Taints:         <none>
prashima commented 7 years ago

This problem is happening because kops attaches a taint to the master and the corresponding toleration is not present in the pod spec. At the same time, the test picks two ready nodes and attaches the vsphere_e2e_label=vsphere_e2e_8b21b01d-2c58-11e7-ae94-0242ac110002 label to them.

Now we have a conflicting situation: 1) the pod spec doesn't contain the appropriate toleration, and 2) the master node, by virtue of having the vsphere_e2e_label=vsphere_e2e_8b21b01d-2c58-11e7-ae94-0242ac110002 label attached to it, is one of the candidate nodes for the pod to be scheduled on.
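
For illustration only (not a change that was made), a toleration matching the master taint shown above would look like this if it were present under "spec" in the pod JSON:

"tolerations": [
    {
        "key": "dedicated",
        "operator": "Equal",
        "value": "master",
        "effect": "NoSchedule"
    }
]

With such a toleration the PodToleratesNodeTaints predicate would pass on the master; the next paragraph instead suggests keeping the master out of scheduling so the test code does not have to change.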

One way to solve this is on the CI/CD side, so that we don't have to modify the tests, which live in the kubernetes repo rather than the kops repo. The CI/CD workflow can explicitly disable scheduling on the master and then proceed with labeling the nodes and running the pod creation test.
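
A minimal sketch of that workflow step, using the master node name reported for this cluster (the exact mechanism in the CI/CD pipeline may differ):

# mark the master unschedulable before labeling nodes and running the placement tests
kubectl cordon master-vmware-zone.masters.kubernetes.skydns.local1

# ... label the nodes and run the vSphere volume placement e2e tests ...

# restore normal scheduling on the master afterwards
kubectl uncordon master-vmware-zone.masters.kubernetes.skydns.local1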

fabulous-gopher commented 7 years ago

This issue was moved to kubernetes/kops#2730