kubernetes / cloud-provider-aws

Cloud provider for AWS
https://cloud-provider-aws.sigs.k8s.io/
Apache License 2.0

aws-cloud-provider (version 1.27.1) always crashes #746

Closed: datavisorhenryzhao closed this issue 6 months ago

datavisorhenryzhao commented 11 months ago

What happened: Kubernetes cluster version 1.27.6.

Master nodes: configured with the following kubeadm_config.yaml, then ran kubeadm join:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "200Mi"
containerLogMaxFiles: 3
imageGCHighThresholdPercent: 80
imageGCLowThresholdPercent: 75
imageMinimumGCAge: "5m30s"
providerID: "aws"
evictionHard:
    memory.available:  "200Mi"
    imagefs.available: "15%"

Worker nodes: ran kubeadm join.

Cluster info:

#kubectl get node 
NAME                            STATUS   ROLES           AGE   VERSION
ip-10-142-23-229.ec2.internal   Ready    <none>          36m   v1.27.6
ip-10-142-39-245.ec2.internal   Ready    control-plane   30h   v1.27.6
ip-10-142-42-164.ec2.internal   Ready    control-plane   30h   v1.27.6
ip-10-142-61-198.ec2.internal   Ready    control-plane   30h   v1.27.6

#kubectl get node ip-10-142-23-229.ec2.internal -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
   ...
spec:
  podCIDR: 192.168.8.0/24
  podCIDRs:
  - 192.168.8.0/24
  providerID: aws

AWS cloud controller manager crash log:

I1115 07:06:42.124716       1 aws.go:861] Setting up informers for Cloud
W1115 07:06:42.124764       1 controllermanager.go:313] "tagging" is disabled
I1115 07:06:42.124773       1 controllermanager.go:317] Starting "cloud-node"
I1115 07:06:42.128849       1 controllermanager.go:336] Started "cloud-node"
I1115 07:06:42.131324       1 controllermanager.go:317] Starting "cloud-node-lifecycle"
I1115 07:06:42.128909       1 node_controller.go:161] Sending events to api server.
I1115 07:06:42.131591       1 node_controller.go:170] Waiting for informer caches to sync
I1115 07:06:42.131945       1 controllermanager.go:336] Started "cloud-node-lifecycle"
I1115 07:06:42.131964       1 controllermanager.go:317] Starting "service"
I1115 07:06:42.132052       1 node_lifecycle_controller.go:113] Sending events to api server
I1115 07:06:42.133178       1 controllermanager.go:336] Started "service"
I1115 07:06:42.133400       1 controllermanager.go:317] Starting "route"
I1115 07:06:42.133409       1 core.go:104] Will not configure cloud provider routes, --configure-cloud-routes: false
W1115 07:06:42.133418       1 controllermanager.go:324] Skipping "route"
I1115 07:06:42.133728       1 controller.go:229] Starting service controller
I1115 07:06:42.133802       1 shared_informer.go:311] Waiting for caches to sync for service
E1115 07:06:42.142644       1 runtime.go:79] Observed a panic: &errors.errorString{s:"unable to calculate an index entry for key \"ip-10-142-23-229.ec2.internal\" on index \"instanceID\": error mapping node \"ip-10-142-23-229.ec2.internal\"'s provider ID \"aws\" to instance ID: Invalid format for AWS instance (aws)"} (unable to calculate an index entry for key "ip-10-142-23-229.ec2.internal" on index "instanceID": error mapping node "ip-10-142-23-229.ec2.internal"'s provider ID "aws" to instance ID: Invalid format for AWS instance (aws))

What you expected to happen: aws-cloud-provider should not crash.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

/kind bug

k8s-ci-robot commented 11 months ago

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

datavisorhenryzhao commented 11 months ago

The node's spec.providerID is "aws", but aws-cloud-provider expects a providerID of the form 'aws:///us-east-1a/i-xxxx'.
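A minimal Go sketch of the mapping described above, assuming the accepted forms are `aws:///<availability-zone>/<instance-id>` and `aws:///<instance-id>`, and that instance IDs are `i-` followed by lowercase hex digits. `instanceIDFromProviderID` is a hypothetical helper written for illustration, not the cloud-provider-aws API; it shows why a bare `aws` value has no instance ID to extract.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// instanceIDPattern assumes EC2 instance IDs are "i-" followed by lowercase
// hex digits (e.g. i-0123456789abcdef0).
var instanceIDPattern = regexp.MustCompile(`^i-[0-9a-f]+$`)

// instanceIDFromProviderID is a hypothetical helper, not the cloud-provider-aws
// API. It accepts "aws:///<availability-zone>/<instance-id>" or
// "aws:///<instance-id>" and returns the bare instance ID.
func instanceIDFromProviderID(providerID string) (string, error) {
	const prefix = "aws:///"
	if !strings.HasPrefix(providerID, prefix) {
		// A bare value such as "aws" ends up here.
		return "", fmt.Errorf("invalid format for AWS instance (%s)", providerID)
	}
	parts := strings.Split(strings.TrimPrefix(providerID, prefix), "/")
	id := parts[len(parts)-1]
	if len(parts) > 2 || !instanceIDPattern.MatchString(id) {
		return "", fmt.Errorf("invalid format for AWS instance (%s)", providerID)
	}
	return id, nil
}

func main() {
	for _, p := range []string{"aws:///us-east-1a/i-0123456789abcdef0", "aws"} {
		id, err := instanceIDFromProviderID(p)
		fmt.Printf("%q -> instance ID %q, error: %v\n", p, id, err)
	}
}
```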

datavisorhenryzhao commented 11 months ago

I found that when I start the kubelet with "--cloud-provider=external" on the master and worker nodes, node.spec.providerID looks like "aws:///<availability-zone>/<instance-id>", and the AWS cloud controller manager no longer crashes.

mmerkes commented 10 months ago

@datavisorhenryzhao Are you still seeing this issue?

kmala commented 9 months ago

The crash has been fixed in https://github.com/kubernetes/cloud-provider-aws/pull/605. I will work on backporting the fix to older versions.

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

cartermckinnon commented 6 months ago

This is resolved across all the active release branches.