kubernetes / cloud-provider-aws

Cloud provider for AWS
https://cloud-provider-aws.sigs.k8s.io/
Apache License 2.0

Support Region for DescribeInstance Call #939

Open atsai1220 opened 1 month ago

atsai1220 commented 1 month ago

What would you like to be added:

Why is this needed:

Questions:

Findings: Currently, nodes from another region are reported as "not found" by the node lifecycle controller and are deleted from the cluster shortly after joining.

Logs on v1.30.1:

node_controller.go:240] error syncing 'ip-10-117-161-37.ap-southeast-2.compute.internal': failed to get instance metadata for node ip-10-117-161-37.ap-southeast-2.compute.internal: instance not found, requeuing
node_controller.go:425] Initializing node ip-10-117-161-37.ap-southeast-2.compute.internal with cloud provider
node_controller.go:229] error syncing 'ip-10-117-161-37.ap-southeast-2.compute.internal': failed to get instance metadata for node ip-10-117-161-37.ap-southeast-2.compute.internal: instance not found, requeuing
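
To make the mismatch in these logs concrete, here is a minimal sketch that reads the region out of an IP-based EC2 private DNS name such as the node name above. The helper `region_from_private_dns` is hypothetical and is not something the CCM does; it only covers the `ip-a-b-c-d.<region>.compute.internal` form, and us-east-1 names (`ip-a-b-c-d.ec2.internal`) carry no region label.

```python
import re
from typing import Optional

# Regional private DNS names look like ip-10-0-0-1.<region>.compute.internal.
# us-east-1 uses ip-10-0-0-1.ec2.internal, which has no region label.
_PRIVATE_DNS = re.compile(
    r"^ip-\d+-\d+-\d+-\d+\.(?P<region>[a-z0-9-]+)\.compute\.internal$"
)


def region_from_private_dns(node_name: str) -> Optional[str]:
    """Return the region embedded in an IP-based private DNS name, or None."""
    match = _PRIVATE_DNS.match(node_name)
    return match.group("region") if match else None


if __name__ == "__main__":
    node = "ip-10-117-161-37.ap-southeast-2.compute.internal"
    print(region_from_private_dns(node))  # prints "ap-southeast-2"
```

For the node in the logs this yields ap-southeast-2, which is the region the instance lookup would have to target for the node to be found.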

Configuration

      containers:
        - args:
            - '--v=2'
            - '--cloud-provider=aws'
            - '--configure-cloud-routes=false'
          image: registry.k8s.io/provider-aws/cloud-controller-manager:v1.30.1

/kind feature

k8s-ci-robot commented 1 month ago

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

atsai1220 commented 1 month ago

CloudTrail shows that the DescribeInstances call looked for the instance in the wrong region (us-west-2) and received an InvalidInstanceID.NotFound error:

{
    "eventVersion": "1.09",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": ":i-07143150147441be0",
        "arn": "arn:aws:sts:::assumed-role//i-07143150147441be0",
        "accountId": "",
        "accessKeyId": "",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "",
                "arn": "arn:aws:iam:::role/",
                "accountId": "982008609023",
                "userName": ""
            },
            "attributes": {
                "creationDate": "2024-06-03T17:16:33Z",
                "mfaAuthenticated": "false"
            },
            "ec2RoleDelivery": "2.0"
        }
    },
    "eventTime": "2024-06-03T21:58:18Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "DescribeInstances",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "",
    "userAgent": "kubernetes/v1.26.13 aws-sdk-go/1.44.116 (go1.20.13; linux; amd64)",
    "errorCode": "Client.InvalidInstanceID.NotFound",
    "errorMessage": "The instance ID 'i-03c4ae677928450eb' does not exist",
    "requestParameters": {
        "instancesSet": {
            "items": [
                {
                    "instanceId": "i-03c4ae677928450eb"
                }
            ]
        },
        "filterSet": {}
    },
    "responseElements": null,
    "requestID": "474a349c-7107-4df1-9e5e-489a1bc9b606",
    "eventID": "b7b55185-9287-42a7-bad1-6fa628fa05a6",
    "readOnly": true,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "982008609023",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.3",
        "cipherSuite": "TLS_AES_128_GCM_SHA256",
        "clientProvidedHostHeader": "ec2.us-west-2.amazonaws.com"
    }
}
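
When checking many events like the one above, it can help to pull out only the fields that show the region mismatch. The following is a rough sketch: `summarize_event` is a hypothetical helper, and `SAMPLE_EVENT` is a trimmed copy of the event above rather than a complete CloudTrail record.

```python
import json
from typing import Any, Dict

# Trimmed copy of the CloudTrail event shown above (illustration only).
SAMPLE_EVENT = json.loads("""
{
    "eventName": "DescribeInstances",
    "awsRegion": "us-west-2",
    "errorCode": "Client.InvalidInstanceID.NotFound",
    "requestParameters": {
        "instancesSet": {"items": [{"instanceId": "i-03c4ae677928450eb"}]},
        "filterSet": {}
    },
    "tlsDetails": {"clientProvidedHostHeader": "ec2.us-west-2.amazonaws.com"}
}
""")


def summarize_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the API call, the region it was sent to, and the queried IDs."""
    # Any of these objects may be null in a real event, hence the "or" guards.
    params = event.get("requestParameters") or {}
    items = (params.get("instancesSet") or {}).get("items") or []
    tls = event.get("tlsDetails") or {}
    return {
        "event": event.get("eventName"),
        "region": event.get("awsRegion"),
        "endpoint": tls.get("clientProvidedHostHeader"),
        "error": event.get("errorCode"),
        "instance_ids": [i["instanceId"] for i in items if "instanceId" in i],
    }


if __name__ == "__main__":
    print(json.dumps(summarize_event(SAMPLE_EVENT), indent=2))
```

The summary puts the target region and endpoint next to the instance ID, so it is easy to compare them with the region in the node's name.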

kmala commented 1 month ago

You can set the region in the cloud config (see https://github.com/kubernetes/cloud-provider-aws/blob/master/pkg/providers/v1/config/config.go#L25). Can you try that?
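
As a rough illustration of that suggestion, the sketch below renders a minimal cloud config file. It assumes the setting is a `Region` key under a `[Global]` section in INI-style syntax, as the linked config.go suggests, and that the file is handed to the controller through its `--cloud-config` flag; neither detail is confirmed in this thread. `render_cloud_config` is a hypothetical helper.

```python
import configparser
import io


def render_cloud_config(region: str) -> str:
    """Render an INI-style cloud config that sets only the region."""
    cfg = configparser.ConfigParser()
    # Keep the key's capitalisation instead of lowercasing it.
    cfg.optionxform = str
    # Assumption: the region setting is the Region key in the [Global] section.
    cfg["Global"] = {"Region": region}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_cloud_config("ap-southeast-2"))
```

The rendered text would be saved to a file and mounted into the cloud-controller-manager pod next to the arguments shown in the Configuration section.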

cartermckinnon commented 1 month ago

Setting the region in the config won't allow you to handle instances in multiple regions, but it will allow you to e.g. run the AWS CCM on-prem or in another region.

The AWS CCM assumes that your resources are in a single region in many places. I'm not necessarily opposed to changing this in the future, but it will require changes far beyond the DescribeInstances calls.