DescribeAddresses and DescribeVolumes fails with valid IRSA config

jessegoodier commented 1 year ago

When using IRSA, Kubecost cannot access AWS EC2 resources and logs the following messages, even when the service account has the correct policy.

I back-tested this with 1.101 and 1.102, and every version I tried has the issue.

error message:

WRN unable to get addresses: operation error EC2: DescribeAddresses, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: c63cf5bd-27d3-4919-8251-08fcf7ce7151, InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.ca-central-1.amazonaws.com/id/2086E4D4C3BEAFFF61F3617142CA5DCC

WRN unable to get disks: operation error EC2: DescribeVolumes, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: 97337482-f0e2-489d-b8e6-c9108a264d8e, InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.ca-central-1.amazonaws.com/id/2086E4D4C3BEAFFF61F3617142CA5DCC

To Reproduce
Steps to reproduce the behavior:

create a policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "KubecostSavingsAccess",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAddresses",
                "ec2:DescribeVolumes"
            ],
            "Resource": "*"
        }
    ]
}

create an IRSA account with the policy:

eksctl create iamserviceaccount \
    --name kubecost-serviceaccount \
    --namespace kubecost \
    --cluster jesse-temp --region ca-central-1 \
    --attach-policy-arn arn:aws:iam::297945954695:policy/jesse-temp-savings-policy \
    --override-existing-serviceaccounts \
    --approve

install kubecost and view logs

helm install kubecost kubecost/cost-analyzer --version 1.104.1 \
  --set serviceAccount.create=false --set serviceAccount.name=kubecost-serviceaccount

Expected behavior
no errors

What impact will this have on your ability to get value out of Kubecost?
Savings reports are broken for /orphaned-resources.

jessegoodier commented 1 year ago

note: here's a pod you can use to test:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: awscli
  name: awscli
spec:
  serviceAccountName: kubecost-serviceaccount
  containers:
  - image: amazon/aws-cli
    name: awscli
    command: ['sleep', '9999999']

kubectl exec -it awscli -- aws ec2 describe-volumes

jessegoodier commented 1 year ago

More testing: spot feed works with the same cluster. So IRSA itself is working.

srpomeroy commented 1 year ago

Not directly related, but important to reference: https://github.com/kubecost/cost-analyzer-helm-chart/issues/2167

srpomeroy commented 1 year ago

@jcharcalla called out that this may be related to disabled regions. The error state is similar to what we see for regions disallowed by an SCP.

thomasvn commented 1 year ago

@jessegoodier @srpomeroy Thanks for reproducing! I've logged this as a bug and we'll aim to resolve soon.

pwouavre commented 1 year ago

Hello,

I have the exact same issue here using Kubecost version 1.104.4 ;) Have a nice day!

danjmccay commented 1 year ago

I'm seeing the same issue. In IAM, I can see the linked role is accessing ap-southeast-1 when all of our resources are in eu-west-1.

srpomeroy commented 1 year ago

> I'm seeing the same issue. In IAM, I can see the linked role is accessing ap-southeast-1 when all of our resources are in eu-west-1.

That is normal. Kubecost currently looks for resources in all provider regions in order to populate the orphaned resources report.

andrewhharmon commented 1 year ago

I'm seeing this issue as well. No workaround at the moment?

jessegoodier commented 1 year ago

No workaround yet. There are a few variables involved. We are looking into this and we'll keep you updated when a fix is ready.

cyphrsonic commented 1 year ago

Are there any updates on a workaround to this issue? I'm working on doing an initial deployment via helm, and these errors are making it harder to spot the actual issues with my configuration.

AjayTripathy commented 1 year ago

I would propose we simply remove these logs or lower them from the warn level in an upcoming release. cc @cliffcolvin for additional triage

danielrolfes2307 commented 10 months ago

Hi, any new developments and/or workarounds?

jessegoodier commented 10 months ago

> Hi, any new developments and/or workarounds?

We do need to update our documentation, thanks for the reminder.

This is resolved as of Kubecost 1.106. Are you using IRSA? Just be sure it has a policy that allows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DescribeCloudResources",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAddresses",
                "ec2:DescribeVolumes"
            ],
            "Resource": "*"
        }
    ]
}

danielrolfes2307 commented 10 months ago

@jessegoodier I'm running Helm chart: 1.106.4 and still see the issue (?) "You are not authorized to perform this operation. User: xxxx: assumed-role/kubecost-iam-role-20231103140434006600000002/1699343400001785339 is not authorized to perform: ec2:DescribeVolumes with an explicit deny in a service control policy"

It still seems to call these actions in "inactive" regions. I don't see the issue in the active regions.
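To illustrate the log-level idea raised earlier in the thread, here is a minimal Python sketch (not Kubecost's code, which is written in Go). It assumes, as the thread suggests, that the two error fragments quoted in this issue usually indicate a region that is disabled or blocked rather than a real misconfiguration. The function name `level_for_region_error` and the marker list are hypothetical.

```python
import logging

# Substrings copied from the error messages quoted in this thread.
# Assumption: these usually mean "region not usable", not "IRSA is broken".
EXPECTED_REGION_ERRORS = (
    "InvalidIdentityToken: No OpenIDConnect provider found",
    "explicit deny in a service control policy",
)


def level_for_region_error(message: str) -> int:
    """Return DEBUG for expected per-region failures, WARNING for anything else."""
    if any(marker in message for marker in EXPECTED_REGION_ERRORS):
        return logging.DEBUG
    return logging.WARNING
```

A caller could then log with `logger.log(level_for_region_error(str(err)), ...)` so that unexpected failures still surface at warn level.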

jessegoodier commented 10 months ago

> @jessegoodier I'm running Helm chart: 1.106.4 and still see the issue (?) "You are not authorized to perform this operation. User: xxxx: assumed-role/kubecost-iam-role-20231103140434006600000002/1699343400001785339 is not authorized to perform: ec2:DescribeVolumes with an explicit deny in a service control policy"
>
> It still seems to call these actions in "inactive" regions. I don't see the issue in the active regions.

Okay, let me see where we are on this.

srpomeroy commented 10 months ago

I don't believe AWS provides an API for determining available regions. Kubecost takes the brute force option of querying all regions and using the HTTP response codes to determine if we can query the region for resources.

A potential enhancement could be to accept a list of regions to query or avoid, or to use something like an adaptive backoff that scales back how often a region is queried for as long as it keeps erroring out.
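Both ideas can be sketched in a few lines. The following Python is only an illustration of the proposal, not Kubecost's implementation. The names `select_regions` and `RegionBackoff`, and the choice to count backoff in scan cycles, are assumptions made for the example.

```python
from dataclasses import dataclass, field


def select_regions(all_regions, include=None, exclude=None):
    """Apply an optional allow list, then an optional deny list, keeping order."""
    include_set = set(include) if include else None
    exclude_set = set(exclude or ())
    return [
        region
        for region in all_regions
        if (include_set is None or region in include_set)
        and region not in exclude_set
    ]


@dataclass
class RegionBackoff:
    """Skip a failing region for an exponentially growing number of scan cycles."""

    max_skip: int = 64
    _failures: dict = field(default_factory=dict)
    _skip_left: dict = field(default_factory=dict)

    def should_query(self, region: str) -> bool:
        # Consume one skipped cycle if the region is currently backed off.
        left = self._skip_left.get(region, 0)
        if left > 0:
            self._skip_left[region] = left - 1
            return False
        return True

    def record_failure(self, region: str) -> None:
        # 1, 2, 4, ... skipped cycles after consecutive failures, capped at max_skip.
        count = self._failures.get(region, 0) + 1
        self._failures[region] = count
        self._skip_left[region] = min(2 ** (count - 1), self.max_skip)

    def record_success(self, region: str) -> None:
        # A successful query resets the backoff for that region.
        self._failures.pop(region, None)
        self._skip_left.pop(region, None)
```

An explicit region list avoids outbound requests to unwanted regions entirely, while the backoff only reduces how often a failing region is retried, so the two address slightly different concerns.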

danielrolfes2307 commented 10 months ago

@jessegoodier Do you mind reopening the issue again? Thanks :)

chipzoller commented 10 months ago

As it seems like this is a Kubecost application issue and not something related to the Helm chart, I'm transferring to the appropriate repository.

boarder7395 commented 9 months ago

To add another relevant point for this issue: ideally the solution would restrict outbound requests to relevant regions only. I work for a US-based healthcare company, and outbound requests to non-US regions are a major red flag for our security teams. Right now kubecost-cost-analyzer makes outbound requests to all regions (all AWS-supported countries), which triggers alarms on our firewalls. The traffic is currently blocked by the firewall, but it is extremely concerning.

ameusel commented 9 months ago

Just installed v1.108.0 today and still seeing this behavior. Access Advisor shows the role trying to describe EC2 resources in Tokyo (all our kit is in London).

philgladman commented 9 months ago

@boarder7395 We are running into the same issue, currently using v1.106.1. We only have resources deployed in one region, us-gov-west-1, and kubecost is throwing logs and errors about us-gov-east-1, which we have zero resources deployed in. It would be very helpful to tell kubecost which region(s) to use/query.

> To add another relevant point for this issue: ideally the solution would restrict outbound requests to relevant regions only. I work for a US-based healthcare company, and outbound requests to non-US regions are a major red flag for our security teams. Right now kubecost-cost-analyzer makes outbound requests to all regions (all AWS-supported countries), which triggers alarms on our firewalls. The traffic is currently blocked by the firewall, but it is extremely concerning.

korjek commented 6 months ago

same issue here with the latest v2.1.0

arruko commented 5 months ago

Same issue with Helm chart 2.2.0. It is really concerning because the many error messages and the outbound requests to unused regions cause false-positive alerts.

Any hints on what needs to change at the code level? Willing to collaborate.

tkaesserfm commented 4 months ago

An option to supply a simple list of regions would help us a lot here.

jessegoodier commented 4 months ago

I will escalate internally; thanks for the ping. Issue: BURNDOWN-155

fheinecke commented 2 months ago

> I don't believe AWS provides an API for determining available regions

aws account list-regions --region-opt-status-contains ENABLED ENABLING ENABLED_BY_DEFAULT DISABLING
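As a rough sketch of how that command's output could feed a region list, the Python below filters the JSON it prints. It assumes the output has a top-level `Regions` array whose entries carry `RegionName` and `RegionOptStatus`; verify that shape against your CLI version. The helper name `usable_regions` is hypothetical, and the sketch keeps only fully enabled regions, which is narrower than the status list in the command above.

```python
import json

# Assumption: only fully enabled regions are worth querying.
USABLE_STATUSES = {"ENABLED", "ENABLED_BY_DEFAULT"}


def usable_regions(list_regions_json: str) -> list:
    """Return sorted region names whose opt-in status marks them as usable."""
    payload = json.loads(list_regions_json)
    return sorted(
        entry["RegionName"]
        for entry in payload.get("Regions", [])
        if entry.get("RegionOptStatus") in USABLE_STATUSES
    )
```

The input would be the captured standard output of `aws account list-regions --output json`. The calling identity also needs permission to list regions, which the policies shown in this thread do not grant.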

kubecost / features-bugs

DescribeAddresses and DescribeVolumes fails with valid IRSA config #20