pulumi / pulumi-eks

A Pulumi component for easily creating and managing an Amazon EKS Cluster
https://www.pulumi.com/registry/packages/eks/
Apache License 2.0

Creating EKS cluster failing when providing credentials via pulumi config secrets instead of environment #669

Open robotlovesyou opened 2 years ago

robotlovesyou commented 2 years ago

Hello!

Issue details

I'm writing a utility to create infrastructure for us via the Pulumi Automation API. I'm also using the AWS STS SDK to perform the assume-role call to acquire AWS credentials. I have a stack which creates a simple EKS cluster for our CI runners. When providing AWS credentials via aws:accessKey, aws:secretKey, and aws:token, creation of the EKS cluster fails at quite a late stage because Pulumi is unable to communicate with the EKS cluster API. Note that many of the related AWS objects, including the EKS cluster itself, are successfully created, so for most of the process the provided credentials are being used.
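
For reference, a minimal sketch of the failing approach, assuming the Node.js Automation API and the AWS SDK v3 STS client; the role ARN, stack name, and paths are illustrative, not taken from the original report:

import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";
import { LocalWorkspace } from "@pulumi/pulumi/automation";

async function deploy() {
    // Assume the target role with the STS SDK to obtain temporary credentials.
    const sts = new STSClient({ region: "eu-west-1" });
    const { Credentials } = await sts.send(new AssumeRoleCommand({
        RoleArn: "arn:aws:iam::123456789012:role/deploy-role", // illustrative
        RoleSessionName: "ci-cluster-deploy",
    }));

    // Select the stack that creates the EKS cluster.
    const stack = await LocalWorkspace.createOrSelectStack({
        stackName: "ci-cluster",
        workDir: "./infra", // illustrative path to the program below
    });

    // Hand the temporary credentials to the AWS provider as config secrets.
    await stack.setConfig("aws:accessKey", { value: Credentials!.AccessKeyId!, secret: true });
    await stack.setConfig("aws:secretKey", { value: Credentials!.SecretAccessKey!, secret: true });
    await stack.setConfig("aws:token", { value: Credentials!.SessionToken!, secret: true });

    // Most resources are created, but the cluster's Kubernetes-side resources fail.
    await stack.up({ onOutput: console.log });
}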

Because I can successfully update the stack when running it manually (via pulumi up) with the credentials provided as environment variables, I tried altering my automation code to programmatically set environment variables rather than configuration secrets for the stack, and the update started completing.
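
One way to express that workaround with the Automation API, sketched under the same assumptions (envVars on the workspace is one option; setting process.env before invoking the workspace is another):

import { LocalWorkspace } from "@pulumi/pulumi/automation";

async function deployWithEnvVars(creds: { accessKeyId: string; secretAccessKey: string; sessionToken: string }) {
    const stack = await LocalWorkspace.createOrSelectStack(
        { stackName: "ci-cluster", workDir: "./infra" }, // illustrative
        {
            // These variables are applied to the pulumi commands the workspace runs,
            // so the AWS default credential chain (and, by inheritance, the helpers
            // invoked while the cluster is set up) should pick them up.
            envVars: {
                AWS_ACCESS_KEY_ID: creds.accessKeyId,
                AWS_SECRET_ACCESS_KEY: creds.secretAccessKey,
                AWS_SESSION_TOKEN: creds.sessionToken,
            },
        },
    );
    await stack.up({ onOutput: console.log });
}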

My hunch is that something in our configuration causes pulumi-eks to use the Kubernetes API as well as the AWS API, and that there is an issue in that part of the process which causes the provided credentials not to be picked up.

Steps to reproduce

Running the code below with credentials provided via pulumi config will fail. Providing the same credentials via environment variables will succeed.

/**
 * Creates necessary dependencies and then sets up an eks cluster for running ci jobs
 */

import * as pulumi from "@pulumi/pulumi";
import * as eks from "@pulumi/eks";
import * as awsx from "@pulumi/awsx";

/**
 * Create a VPC with the given name
 * @param name the name of the VPC
 * @returns the new VPC
 */
const createVPC = (name: string): awsx.ec2.Vpc => {
    const vpc = new awsx.ec2.Vpc(name, {});
    return vpc;
};

/**
 * Set the name and VPC to use for an EKS cluster
 */
interface ClusterOptions {
    name: string;
    vpc: awsx.ec2.Vpc;
}

/**
 * Create an EKS cluster with the provided options set
 * @param opts options for the EKS cluster
 * @returns the new cluster
 */
const createCluster = (opts: ClusterOptions): eks.Cluster => new eks.Cluster(opts.name, {
    vpcId: opts.vpc.id,
    publicSubnetIds: opts.vpc.publicSubnetIds,
    privateSubnetIds: opts.vpc.privateSubnetIds,
    nodeAssociatePublicIpAddress: false,
    nodeGroupOptions: {
        desiredCapacity: 3,
        minSize: 2,
        maxSize: 5,
        instanceType: "t3.xlarge",
        nodeRootVolumeSize: 100,
    },
    version: "1.21",
    useDefaultVpcCni: true,
    enabledClusterLogTypes: ["api", "audit", "controllerManager", "scheduler"],
    createOidcProvider: true,
});

const stackConfig = new pulumi.Config();
const awsConfig = new pulumi.Config("aws");

const baseName = `${stackConfig.require("subaccount")}-${awsConfig.require("region")}-ci-cluster`;

const vpc = createVPC(`${baseName}-vpc`);
const cluster = createCluster({ name: `${baseName}-eks`, vpc });

export const eksClusterName = cluster.eksCluster.id;
export const eksKubeconfig = cluster.kubeconfig;
export const oidcProviderArn = cluster.core.oidcProvider?.arn;
export const oidcProviderUrl = cluster.core.oidcProvider?.url;

Expected: The update succeeds.
Actual: The update fails.

jkodroff commented 2 years ago

@robotlovesyou Thanks for submitting this issue. A few questions:

  1. Would it be possible to provide us with the actual error output?
  2. Does using pulumi up with the variables in the stack config work? (This will help us determine whether this is an issue with the automation API or not.)
robotlovesyou commented 2 years ago

@jkodroff

Would it be possible to provide us with the actual error output?

I am not sure when I will get a chance to break this again in order to provide you with the error output, especially since I appear to have found at least one new way to put a stack into a state where nothing works except manually deleting everything, which is not good and has eaten a lot of my time over the last day or so.

Does using pulumi up with the variables in the stack config work? (This will help us determine whether this is an issue with the automation API or not.)

It appeared that both the automation approach and pulumi up were failing, but there could have been external factors causing that (such as the temporary credentials having already expired), and I did not control for them before I found the workaround.

liamawhite commented 2 years ago

I think I'm seeing the same thing; my errors are:

kubernetes:core/v1:ConfigMap hosted-cp-workshop-nodeAccess creating error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials

and

eks:index:VpcCni hosted-cp-workshop-vpc-cni creating error: Command failed: kubectl apply -f /var/folders/9r/by9bv60j729_wcsxd86fwv480000gn/T/tmp-11470MBwhtB8hf7cn.tmp

Note that we cheat and manually configure an AWS profile from the config-based credentials so that the token retrieval in the kubeconfig these resources use is able to fetch a token. Token retrieval does appear to work, but the token is unauthorized.
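
For anyone following along in TypeScript, the profile-based variant looks roughly like this (a sketch only; it assumes a profile, here called "workshop", has already been written out of band from the config-based credentials, and the cluster name is illustrative):

import * as eks from "@pulumi/eks";

// Point the component's generated kubeconfig (and therefore its token retrieval)
// at the named profile rather than whatever ambient credentials are in scope.
const cluster = new eks.Cluster("hosted-cp-workshop", {
    providerCredentialOpts: { profileName: "workshop" },
    // ...other cluster options...
});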

robotlovesyou commented 2 years ago

@liamawhite @jkodroff From memory, these are the same as the errors I got.

saborrie commented 1 year ago

I have also had this exact problem. I'm not using the Automation API. Is there any way to set the environment variables in a normal run of pulumi up so that I can use the workaround?

juanfbl9307 commented 1 year ago

Any update? I have the same issue trying to create an eks.Cluster with Fargate.

esomore commented 1 year ago

Also seeing this when running from GitHub Actions; locally it's working fine.

    error: could not get server version from Kubernetes: the server has asked for the client to provide credentials
mchristen commented 1 year ago

Is there any update on this?

jamesloosli commented 1 year ago

The same issue is biting me as well.

I'm running into this when running pulumi up from the context of a GitHub Action, which has two AWS profiles: default and cosm-sandbox.

My Pulumi.yaml for this stack has aws:profile set to cosm-sandbox, and while the cluster is created, and I've verified that the cosm-sandbox profile can access it, pulumi up fails with the following:

  cosm:eks$eks:index:Cluster$eks:index:VpcCni (sandboxblue-us-west-2-vpc-cni)
    error: Command failed: kubectl apply -f /tmp/tmp-911MsliU5Df1Aln.tmp
error: You must be logged in to the server (the server has asked for the client to provide credentials)

Assuming the role on the command line locally, I can interact with the cluster just fine:

❯ aws --profile cosm-sandbox-pulumi eks update-kubeconfig --name sandboxblue-us-west-2 --profile cosm-sandbox-pulumi
Updated context arn:aws:eks:us-west-2:743399912270:cluster/sandboxblue-us-west-2 in /Users/jloosli/.kube/config
❯ k get ns
NAME              STATUS   AGE
default           Active   4m37s
kube-node-lease   Active   4m39s
kube-public       Active   4m39s
kube-system       Active   4m39s
❯ k get cm -n kube-system
NAME                                 DATA   AGE
coredns                              1      4m51s
cp-vpc-resource-controller           0      4m46s
eks-certificates-controller          0      4m44s
extension-apiserver-authentication   6      4m57s
kube-proxy                           1      4m51s
kube-proxy-config                    1      4m51s
kube-root-ca.crt                     1      4m47s

Notably, the aws-auth configmap is missing.

I've tested this behavior on pulumi-eks versions 0.41.2 and 1.0.1 with similar results.

jamesloosli commented 1 year ago

So it seems the resolution was to switch providerCredentialOpts from using the named profile to a roleArn.

    // Create cluster
    this.cluster = new eks.Cluster(
      clusterName,
      {
        name: clusterName,
        vpcId: args.vpcId,
        subnetIds: args.subnetIds,
        skipDefaultNodeGroup: true,
        providerCredentialOpts: {
          // profileName: args.context.callerProfile,
          roleArn: `arn:aws:iam::${args.accountId}:role/pulumi-deploy-role`
        },
        enabledClusterLogTypes: [
          'api',
          'audit',
          'authenticator',
          'controllerManager',
          'scheduler'
        ],
        version: '1.24',
        roleMappings: this.roleMappings,
        userMappings: this.userMappings,
        tags: args.context.tags,
        instanceRoles: [this.nodeRole]
      },
      {
        parent: this
      }
    );

In our case, because the deploy role ARN is the same in each of our accounts, this should work. I don't like that named profiles don't work, though.

KingNoosh commented 1 year ago

I'm facing this issue because I use the AWS env vars to log in to the root account and use a bucket on that account as our Pulumi backend; all of our stacks refer to an aws:profile, and deployment is failing with respect to Kubernetes as a result.

If I run it locally and change the env vars, it suddenly starts working.

KingNoosh commented 1 year ago

Removing the AWS env vars and using a default profile resolved the problem for me.

amurillo-ncser commented 1 year ago

Having the same problem: I use environment variables set to the credentials for the account that hosts the S3 bucket where I store Pulumi state. Then I provide specific AWS credentials for the deployment using AWS-specific config variables in the stack configuration file: aws:accessKey, aws:region, and aws:secretKey. This way I can have one Pulumi project provisioned to different accounts based on stack configuration settings.

Provisioning works fine until eks:index:VpcCni has to be provisioned, at which point it raises the following error:

eks:index:VpcCni (cluster-vpc-cni):
    error: Command failed: kubectl apply -f /tmp/tmp-42708k01S4dxhHP5U.tmp
    error: You must be logged in to the server (the server has asked for the client to provide credentials)
amurillo-ncser commented 1 year ago

Removing the AWS env vars and using a default profile resolved the problem for me.

But then you have to share the same credentials for the S3 state backend and the deployment itself, right? My idea was to have all my Pulumi state files in a common S3 bucket from a "management" account and then provide specific AWS credentials for the deployment within the stack configuration file.

ghostsquad commented 9 months ago

For me it was enough to do this:

eksCluster, err := eks.NewCluster(ctx, clusterName, &eks.ClusterArgs{
    ProviderCredentialOpts: eks.KubeconfigOptionsArgs{
        ProfileName: pulumi.StringPtr(awsProfile),
    },
})
KingNoosh commented 6 months ago

Removing the AWS env vars and using a default profile resolved the problem for me.

But then you have to share the same credentials for the S3 state backend and the deployment itself, right? My idea was to have all my Pulumi state files in a common S3 bucket from a "management" account and then provide specific AWS credentials for the deployment within the stack configuration file.

Apologies for responding so late. We still use aws:profile; it's just that the management bucket uses the default AWS profile, which still uses the host machine's AWS creds.

So all of the state is still managed in the root account.

ceelian commented 1 month ago

I have a similar issue: the provider given to Pulumi in __main__.py is ignored:

# create a Provider with custom credentials
prov = aws.Provider(
    'aws-main-prov',
    region=DEFAULT_AWS_REGION,
    access_key=os.environ['CSD_PULUMI_PROVIDER_MAIN_USER_AWS_KEY'],
    secret_key=pulumi.Output.secret(os.environ['CSD_PULUMI_PROVIDER_MAIN_USER_AWS_SECRET']),
)

# ...

eks.Cluster("eks-main-cluster",
                          vpc_id=eks_vpc.vpc_id,
                          public_subnet_ids=eks_vpc.public_subnet_ids,
                          private_subnet_ids=eks_vpc.private_subnet_ids,
                          desired_capacity=3,
                          min_size=3,
                          max_size=3,
                          node_associate_public_ip_address=False,
                          endpoint_private_access=False,
                          endpoint_public_access=True,
                          opts=pulumi.ResourceOptions(provider=prov)  # <--- this is ignored; the AWS env variables (which belong to the state bucket account) are used instead
                          )

Here is the invocation of the Pulumi helper script, which has different AWS creds set for the state bucket (which is in a different account for security reasons):

#!/usr/bin/env bash
# pulumi wrapper script to setup the right environment/stack
# < -- previous lines skipped  for readability -- >
cd $PRJ_PATH/pulumi/main

# AWS User Credentials for the Main State Backend
export AWS_ACCESS_KEY_ID=$CSD_PULUMI_STATE_MAIN_USER_AWS_KEY
export AWS_SECRET_ACCESS_KEY=$CSD_PULUMI_STATE_MAIN_USER_AWS_SECRET
export AWS_REGION=$CSD_AWS_DEFAULT_REGION

# In case we have secrets provider "passphrase" selected, set the passphrase as env variable
export PULUMI_CONFIG_PASSPHRASE=$CSD_PULUMI_SECRETS_PROVIDER_PASSPHRASE_MAIN

pulumi logout
pulumi login $CSD_PULUMI_STATE_BACKEND_MAIN
pulumi stack select $CSD_PROJECT_NAME.main.$CUR_ENV -c --secrets-provider=$CSD_PULUMI_SECRETS_PROVIDER_MAIN

pulumi "$@"

The error:


Updating (cnc-infra.main.dev):
     Type                                Name                         Status                  Info
     pulumi:pulumi:Stack                 main-cnc-infra.main.dev      **failed**              1 error; 1 message
     └─ eks:index:Cluster                eks-main-cluster                                     
        ├─ aws:eks:Cluster               eks-main-cluster-eksCluster                          
 +      ├─ eks:index:VpcCni              eks-main-cluster-vpc-cni     **creating failed**     1 error
 +      └─ kubernetes:core/v1:ConfigMap  eks-main-cluster-nodeAccess  **creating failed**     1 error

Diagnostics:
  eks:index:VpcCni (eks-main-cluster-vpc-cni):
    error: Command failed: kubectl apply -f /tmp/tmp-6783PtoTvA0OiVht.tmp
    error: error validating "/tmp/tmp-6783PtoTvA0OiVht.tmp": error validating data: failed to download openapi: the server has asked for the client to provide credentials; if you choose to ignore these errors, turn validation off with --validate=false

  kubernetes:core/v1:ConfigMap (eks-main-cluster-nodeAccess):
    error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials

  pulumi:pulumi:Stack (main-cnc-infra.main.dev):
    error: error validating "/tmp/tmp-6783PtoTvA0OiVht.tmp": error validating data: failed to download openapi: the server has asked for the client to provide credentials; if you choose to ignore these errors, turn validation off with --validate=false

    error: update failed
ceelian commented 1 month ago

I investigated the issue a bit further and found a solution that worked for me. It looks to me like the external tools that Pulumi uses to provision the EKS cluster (awscli, kubectl) are not using the provider specified in the code and passed through opts; instead it is overridden by the AWS-specific environment variables, because, per the awscli docs, environment variables take precedence over all other options.

So I thought I would have 3 options to solve this issue:

  1. [Not working] Unset the AWS creds env variables in the helper bash script right after pulumi login. It turns out that pulumi login for the S3 backend doesn't create an authenticated session; every pulumi command after the login also needs the AWS credentials env variables in order to access the S3 state files.
  2. [Didn't try, but should work] Create (by script or by hand) the AWS CLI profile files, add the S3 state account as one profile and the AWS account for the actual infrastructure as another, then set providerCredentialOpts: {profileName: ...}. This should work, but personally it feels the most like a workaround, because I want a single source of config and don't want to generate AWS CLI config files from my own configuration files. I didn't try it, but it should work as mentioned in previous posts.
  3. [What I finally implemented] Give the S3 state account a policy allowing it to assume an admin role (or whatever permissions you need to create/maintain your infrastructure) in your actual AWS infrastructure account, then set providerCredentialOpts: {roleArn: ...} accordingly. I implemented this approach because it feels like a good way and doesn't impose any security tradeoffs: the S3 state account is strictly limited to accessing and modifying the S3 state bucket, and in addition it is now allowed to assume a role in the actual infrastructure account with all the permissions Pulumi needs to create infrastructure there. There is no chance that infrastructure is created in the wrong account (the S3 state account) by forgetting to set the provider option on a Pulumi resource, because the S3 state account only has permission to read/write its own S3 state bucket, nothing else. The code below shows this setup.
account_id = os.getenv("CSD_ACCOUNT_ID_MAIN")
prov = aws.Provider(
    'aws-main-prov',
    region=DEFAULT_AWS_REGION,
    assume_role={
        # a role in the infra account (specified by account_id) with enough permissions to create your infrastructure
        "role_arn": f"arn:aws:iam::{account_id}:role/OrganizationAccountAccessRole",
    },
    # The following skip is required to avoid strange false error about region and arn not being set
    skip_credentials_validation=True,
)

# Create a VPC for the EKS cluster
eks_vpc = awsx.ec2.Vpc("eks-main-vpc",
                       enable_dns_hostnames=True,
                       cidr_block="10.0.0.0/16",
                       number_of_availability_zones=3,
                       nat_gateways=1,
                       subnet_specs=[
                           awsx.ec2.SubnetSpecArgs(
                               type=awsx.ec2.SubnetType.PUBLIC,
                               cidr_mask=24,
                           ),
                           awsx.ec2.SubnetSpecArgs(
                               type=awsx.ec2.SubnetType.PRIVATE,
                               cidr_mask=24,
                           )
                       ],
                       opts=pulumi.ResourceOptions(provider=prov)
                       )

# Create the EKS cluster
eks_cluster = eks.Cluster("eks-main-cluster",
                          # Put the cluster in the new VPC created earlier
                          vpc_id=eks_vpc.vpc_id,
                          # Public subnets will be used for load balancers
                          public_subnet_ids=eks_vpc.public_subnet_ids,
                          # Private subnets will be used for cluster nodes
                          private_subnet_ids=eks_vpc.private_subnet_ids,
                          # ...
                          provider_credential_opts={
                              # a role in the infra account (specified by account_id) with enough permissions to create your infrastructure
                              "role_arn": f"arn:aws:iam::{account_id}:role/OrganizationAccountAccessRole",
                          },
                          opts=pulumi.ResourceOptions(provider=prov)
                          )

I would have liked it best if there were an option to set environment variables for the tools Pulumi uses/executes under the hood, so that I could provide custom AWS env variables for the aws CLI that Pulumi executes.

I hope that helps someone to save some time :-)