kubecost / features-bugs

A public repository for filing Kubecost feature requests and bugs. Please read the issue guidelines before filing an issue here.

[Bug] Spot Data Feed Access: AWS ListObjects, https response error StatusCode: 403, api error AccessDenied: Access Denied #54

Open gomesdigital opened 7 months ago

gomesdigital commented 7 months ago

Kubecost Version

1.108.0

Kubernetes Version

v1.27.9

Kubernetes Platform

EKS

Description

Hello,

We've followed the docs to set up the spot data feed integration, but we are getting a 403 error, as shown on the /diagnostics page.

For context our setup is as follows:

Helm config:

    kubecostProductConfigs:
      athenaBucketName: s3://redacted
      athenaDatabase: athenacurcfn_kubecost
      athenaProjectID: 'Account A ID'
      athenaRegion: eu-central-1
      athenaTable: kubecost
      athenaWorkgroup: Kubecost
      awsSpotDataBucket: redacted  # bare bucket name, no s3:// prefix
      awsSpotDataPrefix: ''  # no prefix is configured for the spot data feed
      awsSpotDataRegion: eu-central-1
      clusterName: redacted
      masterPayerARN: 'redacted'  # role in account A
      projectID: 'account B'  # account where the cluster lives
    kubecostToken: 'redacted'
    prometheus:
      nodeExporter:
        enabled: false
      serviceAccounts:
        nodeExporter:
          create: false
      server:
        resources:
          limits:
            memory: 4096Mi
          requests:
            cpu: 500m
            memory: 2048Mi
    serviceAccount:
      create: false
      name: kubecost
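
The notes in the values above (a bare bucket name for awsSpotDataBucket, an s3:// prefix on athenaBucketName) are easy to mix up. As a minimal sketch, a hypothetical Python helper, not part of Kubecost or its Helm chart, can sanity-check those fields before deploying. It only encodes the conventions visible in this config and makes no AWS calls, so it cannot detect permission problems like the 403 reported here.

```python
from typing import Dict, List


def check_spot_feed_settings(cfg: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in kubecostProductConfigs values.

    Hypothetical helper: it only checks the conventions used in the values
    in this issue and does not contact AWS.
    """
    problems: List[str] = []

    bucket = cfg.get("awsSpotDataBucket", "")
    if not bucket:
        problems.append("awsSpotDataBucket is empty")
    elif bucket.startswith("s3://"):
        # The spot feed bucket is given as a bare name, without a scheme.
        problems.append("awsSpotDataBucket should not include the s3:// prefix")

    # The Athena results bucket, by contrast, is given as an s3:// URI.
    if not cfg.get("athenaBucketName", "").startswith("s3://"):
        problems.append("athenaBucketName should include the s3:// prefix")

    if not cfg.get("awsSpotDataRegion"):
        problems.append("awsSpotDataRegion is empty")

    return problems


if __name__ == "__main__":
    # Placeholder values shaped like the redacted config (hypothetical names).
    values = {
        "athenaBucketName": "s3://example-athena-results",
        "awsSpotDataBucket": "example-spot-feed-bucket",
        "awsSpotDataPrefix": "",
        "awsSpotDataRegion": "eu-central-1",
    }
    print(check_spot_feed_settings(values))  # []
```

An empty list means the fields follow the conventions; it says nothing about whether the role Kubecost uses can actually read the bucket.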

My understanding is that the spot data feed integration assumes the same role as the CUR integration, i.e. the role specified in the masterPayerARN property.

We have tested the relevant bucket and IAM policies and can confirm, via the AWS CLI, that the service account's role is able to call the ListObjects API on the spot data feed bucket.
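
For comparison with the policies tested above: IAM authorizes S3's ListObjects call through the s3:ListBucket action on the bucket ARN (arn:aws:s3:::bucket), not on object ARNs, while downloading the feed files needs s3:GetObject on the object ARNs. The Python below is a minimal sketch that builds such an identity policy; the function name and bucket name are hypothetical, Kubecost's documented policy may grant broader permissions, and if the bucket lives in a different account than the calling role, the bucket policy must also allow that role.

```python
import json
from typing import Any, Dict


def spot_feed_read_policy(bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Build a minimal identity policy for reading a spot data feed bucket.

    Hypothetical helper. s3:ListBucket is a bucket-level action, so it must
    target the bucket ARN; granting it only on "bucket/*" does not
    authorize listing.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # An empty prefix yields "arn:aws:s3:::<bucket>/*", i.e. every object.
    objects_arn = f"{bucket_arn}/{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListSpotFeedBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                "Sid": "ReadSpotFeedObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [objects_arn],
            },
        ],
    }


if __name__ == "__main__":
    # "example-spot-feed-bucket" is a placeholder, not the reporter's bucket.
    print(json.dumps(spot_feed_read_policy("example-spot-feed-bucket"), indent=2))
```

Keeping the list and read statements separate makes it easy to see which resource each action applies to when comparing against an existing policy.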

Steps to reproduce

N/A - we don't know why this is happening. Configuration is outlined above.

Expected behavior

The ListObjects API call succeeds and returns a 200 status code, since the masterPayerARN role has permission to perform it.

Impact

High. The majority of our workloads run on spot nodes. Without this feature, we would need to wait for the CUR reconciliation, which is not a practical time frame for our business needs.

Screenshots

No response

Logs

    WRN Skipping AWS spot data download: operation error S3: ListObjects, https response error StatusCode: 403, RequestID: 10XE9D9CFAYGT01P, HostID: redacted, api error AccessDenied: Access Denied
    WRN got error 9 error(s) retrieving volumes: [operation error EC2: DescribeVolumes, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: 10443fba-78f6-49f3-bf28-a599bbd44b04, InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.eu-central-1.amazonaws.com/id/redacted
    ERR savings: cluster sizing: failed to get monthly cluster rates: error getting valid asset set in MonthlyNodeClusterRates: could not obtain latest valid asset set (an AssetSet where all Assets (i.e. Nodes) have NodeType != "" and TotalCost > 0.0
    ERR error creating spot-ready workload distributions: error fetching monthly cluster rates: error getting valid asset set in MonthlyNodeClusterRates: could not obtain latest valid asset set (an AssetSet where all Assets (i.e. Nodes) have NodeType != "" and TotalCost > 0.0

Slack discussion

No response

Troubleshooting

haooliveira84 commented 1 month ago

Same here