EKS - withOIDC - instead of AWS access key - secret key

avaussant commented 4 years ago

Hi team, I'm interested in your project, but on our side we don't use hardcoded credentials or technical accounts with credentials. We prefer to use a service account linked to a dedicated policy on AWS.

I have tried to use cluster-turndown, but without credentials it refuses to execute the scale-down. After a quick review of the Go code, I see that AWS credentials are required.

Do you have a solution to work around this issue for this use case with the AWS provider?

Thanks a lot in advance, Alexandre

mbolt35 commented 4 years ago

@avaussant Thanks for the report! I'm going to take a look at this as soon as possible. I believe we're currently looking at loading a service key directly from the secret. I think yielding to another AWS auth mechanism won't be a problem, but I'll investigate further and let you know.

avaussant commented 4 years ago

> @avaussant Thanks for the report! I'm going to take a look at this as soon as possible. I believe we're currently looking at loading a service key directly from the secret. I think yielding to another AWS auth mechanism won't be a problem, but I'll investigate further and let you know.

Thanks for the answer, I appreciate it 👍

mbolt35 commented 4 years ago

@avaussant I've just added a PR for changes related to authentication here: https://github.com/kubecost/cluster-turndown/pull/29

Also, I've cut a new image snapshot here: gcr.io/kubecost1/cluster-turndown:1.3-SNAPSHOT

If you want to try and update the yaml descriptor and see if this moves you forward any, please feel free :)

One quick note that you should also be able to remove the secret volumeMount and volume from the yaml as well. Alternatively, you can just add a secret with an empty file. Failing to load or parse the service-key.json will still continue with a "chained auth" (which I believe works with the dedicated policy).

Let me know if you have any more problems with this. Thanks again for the feedback!

avaussant commented 4 years ago

@mbolt35 Thanks a lot I will test the snapshot today and give you an update 👍👍👍👍👍👍

avaussant commented 4 years ago

After my test, there is an error @mbolt35 (the logs, my deployment, and the service account used are below):

I0619 11:16:17.383715 1 main.go:118] Running Kubecost Turndown on: ip-XX-XX-XX-XX.ec2.internal
I0619 11:16:17.404401 1 clusterprovider.go:95] Found ProviderID starting with "aws", using AWS Provider
I0619 11:16:17.409508 1 awsclusterprovider.go:1035] [Warning] Failed to load valid access key from secret. Err=Failed to locate service account file: /var/keys/service-key.json
I0619 11:16:17.414705 1 provider.go:63] Found ProviderID starting with "aws", using AWS Provider
I0619 11:16:17.414721 1 validator.go:41] Validating Provider...
I0619 11:16:23.669140 1 validator.go:27] [Error]: Failed to load node groups: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY not found in environment
SharedCredsLoad: failed to load shared credentials file
caused by: FailedRead: unable to open file
caused by: open /root/.aws/credentials: no such file or directory
EC2RoleRequestError: no EC2 instance role found
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/meta-data/iam/security-credentials/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0619 11:16:23.669157 1 validator.go:31] Retrying (4 remaining) in 10 seconds...
I0619 11:16:36.837234 1 validator.go:27] [Error]: Failed to load node groups: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY not found in environment
SharedCredsLoad: failed to load shared credentials file
caused by: FailedRead: unable to open file
caused by: open /root/.aws/credentials: no such file or directory
EC2RoleRequestError: no EC2 instance role found
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/meta-data/iam/security-credentials/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0619 11:16:36.837249 1 validator.go:31] Retrying (3 remaining) in 10 seconds...
I0619 11:16:49.953491 1 validator.go:27] [Error]: Failed to load node groups: NoCredentialProviders: no valid providers in chain

My deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-turndown
  namespace: turndown
spec:
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  selector:
    matchLabels:
      app: cluster-turndown
  template:
    metadata:
      namespace: turndown
      labels:
        app: cluster-turndown
    spec:
      containers:
      - name: cluster-turndown
        image: gcr.io/kubecost1/cluster-turndown:1.3-SNAPSHOT
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: TURNDOWN_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: TURNDOWN_DEPLOYMENT
          value: cluster-turndown
      serviceAccount: cluster-turndown
      serviceAccountName: cluster-turndown

and the service account

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXXXX:role/eksctl-ava-test-sample-project-addon-iamserv-Role1-1XGZ3NLDQHXJM
  creationTimestamp: "2020-06-19T09:45:20Z"
  labels:
    aws-usage: cluster-ops
  name: cluster-turndown
  namespace: turndown
  resourceVersion: "165919"
  selfLink: /api/v1/namespaces/turndown/serviceaccounts/cluster-turndown
  uid: e8c3ba97-4329-407b-addd-92b19da7f4b8
secrets:
- name: cluster-turndown-token-c52g6

I have followed the policy for this service account (the policy provided in the README.md).
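The credential chain in the log above tried environment access keys, the shared credentials file, and the EC2 instance role, but no web identity step appears. With IAM roles for service accounts, EKS injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into pods that use an annotated service account, so one quick diagnostic is to confirm that both are present inside the container. The following is a minimal sketch of such a check; the helper name `irsaProblems` is hypothetical and this is not part of cluster-turndown. Even if both variables are set, the AWS SDK version built into the image must also support web identity credentials, which this check cannot verify.

```go
package main

import (
	"fmt"
	"os"
)

// irsaProblems (hypothetical helper) lists what is missing for web identity
// authentication. The lookups are passed in so the check is easy to test.
func irsaProblems(getenv func(string) string, fileExists func(string) bool) []string {
	var problems []string
	if getenv("AWS_ROLE_ARN") == "" {
		problems = append(problems, "AWS_ROLE_ARN is not set")
	}
	tokenFile := getenv("AWS_WEB_IDENTITY_TOKEN_FILE")
	switch {
	case tokenFile == "":
		problems = append(problems, "AWS_WEB_IDENTITY_TOKEN_FILE is not set")
	case !fileExists(tokenFile):
		problems = append(problems, "token file does not exist: "+tokenFile)
	}
	return problems
}

func main() {
	// fileExists reports whether the path can be stat'ed.
	fileExists := func(path string) bool {
		_, err := os.Stat(path)
		return err == nil
	}
	problems := irsaProblems(os.Getenv, fileExists)
	if len(problems) == 0 {
		fmt.Println("web identity environment looks complete")
		return
	}
	for _, p := range problems {
		fmt.Println("problem:", p)
	}
}
```

Running `env | grep AWS_` in the pod gives the same information without any code; the sketch only makes the two conditions explicit.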

avaussant commented 4 years ago

We don't use the metadata path directly; each custom app goes through the SDK instead. Direct access to the metadata endpoint is forbidden for us.

I will retry with a standard cluster without customization for the nodegroup

With a basic EKS cluster:

Command:
eksctl create cluster --name=dev-ava --region=eu-central-1 --zones=eu-central-1a,eu-central-1b,eu-central-1c --nodes-min=1 --nodes-max=2 --managed

Output:
I0619 12:43:39.029269 1 main.go:118] Running Kubecost Turndown on: ip-XX-XX-XX-XX.ec2.internal.eu-central-1.compute.internal
I0619 12:43:39.045194 1 clusterprovider.go:92] Found ProviderID starting with "aws" and eks nodegroup, using EKS Provider
I0619 12:43:39.049583 1 eksclusterprovider.go:494] [Warning] Failed to load valid access key from secret. Err=Failed to locate service account file: /var/keys/service-key.json
I0619 12:43:42.236478 1 main.go:133] [Error]: Failed to create ClusterProvider: AccessDeniedException: status code: 403, request id: ca977824-3986-41d8-87ba-a15142c2e172

mbolt35 commented 4 years ago

Hey @avaussant just wanted to let you know I'm going to look into this ASAP. Apologies for the delay!

mbolt35 commented 4 years ago

@avaussant On another ticket, we realized that the permissions we have documented for AWS require some additional EKS permissions if you're using EKS. Specifically, the following:

{
    "Effect": "Allow",
    "Action": [
        "eks:ListClusters",
        "eks:DescribeCluster",
        "eks:DescribeNodegroup",
        "eks:ListNodegroups",
        "eks:CreateNodegroup",
        "eks:UpdateClusterConfig",
        "eks:UpdateNodegroupConfig",
        "eks:DeleteNodegroup",
        "eks:ListTagsForResource",
        "eks:TagResource",
        "eks:UntagResource"
    ],
    "Resource": "*"
}

These would be in addition to the AutoScalingFullAccess permissions from our documentation. Just to provide further information why this is necessary: We use the EKS API for most of the implementation; however, the EKS API does not allow us to resize a node group to size: 0. In order to properly resize a node group to 0, we locate the backing AutoScalingGroup and resize that to 0 (the EKS nodegroup will eventually synchronize). So unfortunately you need the AutoScalingFullAccess permission defined in our documentation as well as the above permissions.

Does this help move the needle at all?

avaussant commented 4 years ago

> Hey @avaussant just wanted to let you know I'm going to look into this ASAP. Apologies for the delay!

No worries, I need to switch to Redis topics for 2 weeks, so take your time ;)

mbolt35 commented 3 years ago

Should be resolved in latest release (v1.2.2)

angelwcrypto commented 2 years ago

I would like to know if there is an update on the OIDC authentication. Thank you.

dwbrown2 commented 2 years ago

@angelwcrypto it's actively under development and currently expected in our next release!

@AdamStack18 can you please link a tracking bug?

michaelmdresser commented 2 years ago

> Would like to know if there is update with the OIDC authentication? Thank you.

@angelwcrypto is this a question about OIDC authentication support in cluster-turndown? Or in Kubecost more generally?

angelwcrypto commented 2 years ago

@michaelmdresser It is OIDC authentication support in cluster-turndown, thanks

angelwcrypto commented 2 years ago

I have created another issue, as the AWS access key ID and secret access key did not seem to work properly either: https://github.com/kubecost/cluster-turndown/issues/51

Adam-Stack-PM commented 2 years ago

@kaelanspatel, Getting this on your radar related to your OIDC work in the upcoming release (v1.95). https://github.com/kubecost/cost-analyzer-helm-chart/issues/1371

kubecost / cluster-turndown

EKS - withOIDC - instead of AWS access key - secret key #27