Open mmerkes opened 7 hours ago
/triage accepted
cc @ConnorJC3 @torredil
Not sure if they're related to each other, but also see this error in kubelet:
Nov 25 18:34:03 ip-172-31-24-156 kubelet[6298]: E1125 18:34:03.425509 6298 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"aws-cloud-controller-manager\" with ImagePullBackOff: \"Back-off pulling image \\\"209411653980.dkr.ecr.us-east-1.amazonaws.com/provider-aws/cloud-controller-manager:v1.30.0-beta.0-110-gac63fea\\\": ErrImagePull: rpc error: code = NotFound desc = failed to pull and unpack image \\\"209411653980.dkr.ecr.us-east-1.amazonaws.com/provider-aws/cloud-controller-manager:v1.30.0-beta.0-110-gac63fea\\\": failed to resolve reference \\\"209411653980.dkr.ecr.us-east-1.amazonaws.com/provider-aws/cloud-controller-manager:v1.30.0-beta.0-110-gac63fea\\\": 209411653980.dkr.ecr.us-east-1.amazonaws.com/provider-aws/cloud-controller-manager:v1.30.0-beta.0-110-gac63fea: not found\"" pod="kube-system/aws-cloud-controller-manager-cq6m2" podUID="b6d43d27-1967-414e-86f8-72b3e9375664"
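The `NotFound` in that error comes from the registry resolving the reference, not from node-side pull throttling, so one way to triage is to confirm whether the tag was ever pushed. A minimal sketch, assuming AWS CLI credentials with read access to the `209411653980` registry (the `aws ecr` call is shown commented since it needs those credentials):

```shell
# Image ref taken from the kubelet error above (registry/repository:tag).
IMAGE="209411653980.dkr.ecr.us-east-1.amazonaws.com/provider-aws/cloud-controller-manager:v1.30.0-beta.0-110-gac63fea"

# Split the reference with bash parameter expansion; the registry host here
# has no port, so the first ':' separates repository path from tag.
REPO="${IMAGE%%:*}"   # registry + repository path
TAG="${IMAGE##*:}"    # image tag
echo "repo=${REPO}"
echo "tag=${TAG}"

# With credentials, list the pushed tags and check for the one kubelet wants:
# aws ecr describe-images \
#   --registry-id 209411653980 --region us-east-1 \
#   --repository-name provider-aws/cloud-controller-manager \
#   --query 'imageDetails[].imageTags[]' --output text | grep "${TAG}"
```

If the tag is genuinely missing from the repository, the fix is on the image-publishing side rather than on the nodes.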
Very likely related - I believe it's the AWS CCM that adds the node labels we rely on for metadata.
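One way to check that theory on the affected cluster: with an external cloud provider, nodes keep the `node.cloudprovider.kubernetes.io/uninitialized` taint and lack the `topology.kubernetes.io/zone` labels until the CCM processes them. A sketch, using a hypothetical sample of node JSON for illustration (on the real cluster you would pipe `kubectl get nodes -o json` in instead):

```shell
# Helper: does the node JSON on stdin still carry the CCM's init taint?
has_uninit_taint() {
  grep -q 'node.cloudprovider.kubernetes.io/uninitialized'
}

# Hypothetical sample of an uninitialized node's taint list, for illustration only:
sample='{"spec":{"taints":[{"key":"node.cloudprovider.kubernetes.io/uninitialized","effect":"NoSchedule"}]}}'

if printf '%s' "$sample" | has_uninit_taint; then
  echo "node not yet initialized by CCM"
fi

# On the cluster itself (needs kubectl access):
#   kubectl get nodes -o json | has_uninit_taint && echo "CCM has not initialized all nodes"
```

If the nodes never shed that taint, the CCM pods failing to pull their image would fully explain the missing labels downstream.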
Sounds right. Looks like that's a red herring.
Which jobs are failing:
Which test(s) are failing: BeforeSuite is failing because CPI nodes aren't stabilizing.
Since when has it been failing: This one passed on 10/31.
This one failed on 11/6. So sometime between these two.
Testgrid link:
Reason for failure:
EBS CSI pod is not stabilizing:
Anything else we need to know:
/kind failing-test