Closed · leonardpahlke closed this issue 3 years ago
@leonardpahlke: This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. The triage/accepted label can be added by org members by writing /triage accepted in a comment.
@amwat @cheftako I didn't dig much into these failures, but all 3/3 failures I saw are because the konnectivity agent fails to be scheduled due to CPU constraints. Is it the environment or the konnectivity agent, or both?
_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:74
Sep 15 23:47:02.903: Error waiting for all pods to be running and ready: 1 / 31 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
POD NODE PHASE GRACE CONDITIONS
konnectivity-agent-l5xdt Pending [{Type:PodScheduled Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-09-15 23:36:24 +0000 UTC Reason:Unschedulable Message:0/4 nodes are available: 1 Insufficient cpu, 3 node(s) didn't match Pod's node affinity/selector.}]
_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:77
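Decoding the scheduler message above: of the 4 nodes, 3 were filtered out by the pod's node affinity/selector, and the single remaining node could not fit the pod's CPU request. A minimal sketch of the kind of spec involved (the name, selector, image, and resource values are illustrative placeholders, not the actual konnectivity-agent manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: konnectivity-agent-example   # hypothetical name for illustration
spec:
  nodeSelector:
    kubernetes.io/os: linux          # illustrative; 3/4 nodes failed a selector/affinity check like this
  containers:
  - name: agent
    image: example.invalid/proxy-agent:v0  # placeholder image
    resources:
      requests:
        cpu: 50m   # illustrative value; if the one matching node's allocatable
                   # CPU is already committed, the scheduler reports "Insufficient cpu"
```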
/remove-sig network
I'm not sure if the environment changed recently. Looking at the commit range from when it started to fail:
https://github.com/kubernetes/kubernetes/compare/4c014e5ca...1c1d2e4ed https://github.com/kubernetes/kubernetes/pull/102592 seems suspect
cc @pacoxu @cheftako
Sorry for that. Let me check.
The konnectivity-agent DaemonSet tolerates the NoExecute effect, so it may be scheduled to a node with a NoExecute taint.
The master node has a NoSchedule taint:
"taints": [
{
"key": "node-role.kubernetes.io/master",
"effect": "NoSchedule"
},
{
"key": "node.kubernetes.io/unschedulable",
"effect": "NoSchedule"
}
]
Should we remove the toleration for NoSchedule?
I followed the kube-proxy and node-local-dns DaemonSet toleration settings and added both NoExecute and NoSchedule tolerations.
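For context, a toleration block like the one described (following kube-proxy and node-local-dns) would look roughly like this; the broad NoSchedule toleration is what also matches the master's node-role.kubernetes.io/master:NoSchedule taint shown above:

```yaml
tolerations:
- operator: Exists
  effect: NoExecute    # keeps the DaemonSet pod from being evicted off NoExecute-tainted nodes
- operator: Exists
  effect: NoSchedule   # broad match: also tolerates the master's NoSchedule taint,
                       # which is why removing it keeps the agent off the master
```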
I opened #105084 to remove the NoSchedule toleration.
The CI is green now.
Which jobs are failing:
Conformance - GCE - master - kubetest2
Which test(s) are failing:
Since when has it been failing:
15.09.2021 04:05 PDT
Testgrid link:
TestGrid link · Failed Job link (one of them)
Reason for failure:
Kubernetes e2e suite: BeforeSuite
kubetest2: Test
Build log
/sig scheduling /sig network