Open jessegoodier opened 1 year ago
Note: here's a pod you can use to test:

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        run: awscli
      name: awscli
    spec:
      serviceAccountName: kubecost-serviceaccount
      containers:
      - image: amazon/aws-cli
        name: awscli
        command: ['sleep', '9999999']

Then run a describe call from inside it (the pod above is named awscli):

    kubectl exec -it awscli -- aws ec2 describe-volumes
More testing: spot feed works with the same cluster. So IRSA itself is working.
Not directly related but important to reference. https://github.com/kubecost/cost-analyzer-helm-chart/issues/2167
@jcharcalla called out that it may be related to disabled regions, which produce an error state similar to regions disallowed by an SCP.
@jessegoodier @srpomeroy Thanks for reproducing! I've logged this as a bug and we'll aim to resolve soon.
Hello,
I have the exact same issue here using kubecost version 1.104.4 ;) Have a nice day !
I'm seeing the same issue. In IAM, I can see the linked role is accessing ap-southeast-1 when all of our resources are in eu-west-1.
That is normal. Kubecost currently looks for resources in all provider regions in order to populate the orphaned resources report.
I'm seeing this issue as well. No workaround at the moment?
No workaround yet. There are a few variables involved; we are looking into this and we'll keep you updated when a fix is ready.
Are there any updates on a workaround to this issue? I'm working on doing an initial deployment via helm, and these errors are making it harder to spot the actual issues with my configuration.
I would propose we simply remove these logs or drop their warn level in an upcoming release. cc @cliffcolvin for additional triage
Hi, any new developments and/or workarounds?
We do need to update our documentation, thanks for the reminder.
This is resolved as of Kubecost 1.106. Are you using IRSA? Just be sure it has a policy that allows:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DescribeCloudResources",
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeAddresses",
            "ec2:DescribeVolumes"
          ],
          "Resource": "*"
        }
      ]
    }
@jessegoodier I'm running Helm chart: 1.106.4 and still see the issue (?) "You are not authorized to perform this operation. User: xxxx: assumed-role/kubecost-iam-role-20231103140434006600000002/1699343400001785339 is not authorized to perform: ec2:DescribeVolumes with an explicit deny in a service control policy"
It still seems to use these actions on "inactive" regions. I don't see the issue in the active regions.
Okay, let me see where we are on this.
I don't believe AWS provides an API for determining available regions. Kubecost takes the brute force option of querying all regions and using the HTTP response codes to determine if we can query the region for resources.
A potential enhancement could be to provide a list of regions to query or avoid. Or something like an adaptive backoff process that will scale back how often a region is queried as long as it's erroring out.
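To make the two ideas above concrete, here is a minimal sketch of a region filter combined with an exponential backoff for regions that keep erroring. It is not Kubecost code: the class `RegionBackoff`, its fields, and the default delays are all hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RegionBackoff:
    """Hypothetical helper: decides whether a region should be queried now."""
    base_delay: float = 60.0          # assumed wait (seconds) after the first failure
    max_delay: float = 86400.0        # assumed upper bound on the wait
    allowed: frozenset = frozenset()  # optional allow-list; empty means "any region"
    denied: frozenset = frozenset()   # optional deny-list, checked first
    _failures: dict = field(default_factory=dict)
    _next_try: dict = field(default_factory=dict)

    def should_query(self, region, now=None):
        now = time.monotonic() if now is None else now
        if region in self.denied:
            return False
        if self.allowed and region not in self.allowed:
            return False
        # Regions with no recorded failure are always eligible.
        return now >= self._next_try.get(region, 0.0)

    def record_failure(self, region, now=None):
        """Double the wait on each consecutive failure, up to max_delay."""
        now = time.monotonic() if now is None else now
        count = self._failures.get(region, 0) + 1
        self._failures[region] = count
        delay = min(self.base_delay * 2 ** (count - 1), self.max_delay)
        self._next_try[region] = now + delay
        return delay

    def record_success(self, region):
        """A successful call resets the region to normal polling."""
        self._failures.pop(region, None)
        self._next_try.pop(region, None)
```

A caller would check `should_query(region)` before each describe call, then report the outcome with `record_failure` or `record_success`. A deny-list stops requests to a region entirely, which is what the firewall and SCP reports in this thread ask for; the backoff only reduces log noise for regions that are still attempted.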
@jessegoodier Do you mind reopening the issue again? Thanks :)
As it seems like this is a Kubecost application issue and not something related to the Helm chart, I'm transferring to the appropriate repository.
To add another relevant point for this issue: ideally the solution would restrict outbound requests to relevant regions only. I work for a US-based healthcare company, and outbound requests to non-US regions are a major red flag for our security teams. Right now kubecost-cost-analyzer makes outbound requests to all regions (all AWS-supported countries), which triggers alarms on our firewalls. The traffic is currently blocked by the firewall, but it is extremely concerning.
Just installed v1.108.0 today and still seeing this behavior. Access Advisor shows the role trying to describe EC2 resources in Tokyo (all our kit is in London).
@boarder7395 We are running into the same issue, currently using v1.106.1. We only have resources deployed in one region, us-gov-west-1, and kubecost is throwing logs and errors about us-gov-east-1, which we have zero resources deployed in. It would be very helpful to tell kubecost which region(s) to use/query.
same issue here with the latest v2.1.0
Same issue with Helm chart 2.2.0. It is really concerning because the many error messages and the outbound requests to unused regions cause false-positive alerts.
Any hints on what needs to change at the code level? Willing to collaborate.
An option to supply a simple list of regions would help us a lot here.
I will escalate internally, thanks for the ping. issue BURNDOWN-155
Re: "I don't believe AWS provides an API for determining available regions":

    aws account list-regions --region-opt-status-contains ENABLED ENABLING ENABLED_BY_DEFAULT DISABLING
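As a rough sketch of how the output of that call could be turned into a list of regions worth querying, the function below filters a ListRegions-style response by opt-in status. The function name `queryable_regions`, the default status set, and the sample data are assumptions for illustration; the sketch takes the response as a plain dict rather than calling AWS, so it runs without credentials.

```python
def queryable_regions(response, statuses=("ENABLED", "ENABLED_BY_DEFAULT")):
    """Return the sorted region names whose opt-in status is in `statuses`.

    `response` is assumed to have the shape returned by the Account
    Management ListRegions operation: a "Regions" list whose entries carry
    "RegionName" and "RegionOptStatus".
    """
    wanted = set(statuses)
    return sorted(
        region["RegionName"]
        for region in response.get("Regions", [])
        if region.get("RegionOptStatus") in wanted
    )


# Made-up sample response, for illustration only.
sample = {
    "Regions": [
        {"RegionName": "eu-west-1", "RegionOptStatus": "ENABLED_BY_DEFAULT"},
        {"RegionName": "ap-east-1", "RegionOptStatus": "DISABLED"},
        {"RegionName": "af-south-1", "RegionOptStatus": "ENABLED"},
    ]
}

print(queryable_regions(sample))  # prints ['af-south-1', 'eu-west-1']
```

The default here leaves out the transitional ENABLING and DISABLING statuses that the command above includes; pass a different `statuses` tuple to match it. Note that the role would also need permission to call the list-regions operation, which the policy posted earlier in this thread does not grant.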
When using IRSA, Kubecost cannot access AWS EC2 resources and logs the following messages, even when the service account has the correct policy.
I back tested this with 1.101 and 1.102 and all versions have the issue.
error message:
WRN unable to get addresses: operation error EC2: DescribeAddresses, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: c63cf5bd-27d3-4919-8251-08fcf7ce7151, InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.ca-central-1.amazonaws.com/id/2086E4D4C3BEAFFF61F3617142CA5DCC
WRN unable to get disks: operation error EC2: DescribeVolumes, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: 97337482-f0e2-489d-b8e6-c9108a264d8e, InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.ca-central-1.amazonaws.com/id/2086E4D4C3BEAFFF61F3617142CA5DCC
To Reproduce
Steps to reproduce the behavior:
create an IRSA account with the policy:
install kubecost and view logs
Expected behavior
no errors
What impact will this have on your ability to get value out of Kubecost?
Savings reports are broken for /orphaned-resources.