pulumi / pulumi-eks

A Pulumi component for easily creating and managing an Amazon EKS Cluster
https://www.pulumi.com/registry/packages/eks/
Apache License 2.0

pulumi-eks v2.2.1 installs cni v1.11 components #1087

Open bothra90 opened 4 months ago

bothra90 commented 4 months ago

What happened?

This is related to https://github.com/pulumi/pulumi-eks/issues/1057. When creating a new EKS cluster using @pulumi/eks: "^2.2.1", we notice that the installed CNI version is still v1.11, which causes the aws-node pods to go into a crash loop.

Example

We create the cluster using the following snippet:

    import * as eks from "@pulumi/eks";

    // input, k8sVersion, eksRoleMappings, instanceRole, serviceRole,
    // secretsEncryptionKey and awsProvider are defined elsewhere in our program.
    const cluster = new eks.Cluster('eks-cluster', {
        vpcId: input.vpcId,
        endpointPrivateAccess: true,
        endpointPublicAccess: false,
        publicSubnetIds: input.publicSubnets,
        privateSubnetIds: input.privateSubnets,
        version: k8sVersion,
        providerCredentialOpts: {
            roleArn: input.adminRoleArn,
        },
        roleMappings: eksRoleMappings,
        instanceRole: instanceRole,
        serviceRole: serviceRole,
        nodeAssociatePublicIpAddress: false,
        createOidcProvider: true,
        skipDefaultNodeGroup: true,
        enabledClusterLogTypes: ["api", "audit", "authenticator", "controllerManager", "scheduler"],
        encryptionConfigKeyArn: secretsEncryptionKey.arn,
    }, { provider: awsProvider, protect: true });

Output of pulumi about

Dependencies:
NAME                VERSION
@pulumi/aws         6.23.0
@pulumi/eks         2.2.1
@pulumi/postgresql  3.10.1
fp-ts               2.16.2
netmask             2.0.2
typescript          5.3.3
vite                5.0.12
@pulumi/cloudflare  5.19.0
@pulumi/random      4.15.1
@pulumi/tls         4.11.1
@types/netmask      1.0.30
cidr-tools          6.4.2
ts-node             10.9.2
vitest              0.33.0
@pulumi/docker      4.5.1
@pulumi/kafka       3.6.0
@types/glob         8.1.0
@types/jsbn         1.2.33
immer               10.0.3
ts-pattern          5.0.6
@pulumi/command     0.5.2
@pulumi/kubernetes  4.7.1
@pulumi/pulumi      3.107.0
glob                10.3.10
ts-md5              1.3.1



rquitales commented 3 months ago

@bothra90 v2.3.0 of the provider was just released. Could you check whether it resolves this issue? Thanks!

bradyburke commented 3 months ago

Still seeing the same issue on 2.3.0:

❯ npm ls | grep eks
├── @aws-sdk/client-eks@3.525.0
├── @pulumi/eks@v2.3.0

Output from the aws-node DaemonSet:

❯ k get ds aws-node -o yaml | grep image:
        image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.11.0
        image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon/aws-network-policy-agent:v1.0.4-eksbuild.1
        image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.11.0
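To confirm which CNI version a cluster is actually running (for example, before and after an in-place upgrade), the tags can be pulled out of image references like the ones above. This is a minimal sketch: `imageTag`, `parseVersion` and `compareVersions` are hypothetical helpers, not pulumi-eks or kubectl APIs, and the 1.12.0 cutoff simply marks "still on v1.11.x or older".

```typescript
// Hypothetical helpers (not part of pulumi-eks or kubectl) for reading the
// CNI version out of the aws-node DaemonSet image references.

// Return the tag of an image reference, e.g.
// "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.11.0" -> "v1.11.0".
function imageTag(image: string): string | undefined {
  // Drop any "@sha256:..." digest, then look only at the last path segment
  // so that a registry port such as "host:5000/..." is not taken as the tag.
  const withoutDigest = image.split("@")[0];
  const lastSegment = withoutDigest.substring(withoutDigest.lastIndexOf("/") + 1);
  const colon = lastSegment.indexOf(":");
  return colon === -1 ? undefined : lastSegment.substring(colon + 1);
}

// Parse the numeric part of a tag such as "v1.11.0" or "v1.0.4-eksbuild.1".
function parseVersion(tag: string): number[] | undefined {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(tag);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : undefined;
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: number[], b: number[]): number {
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Image lines copied from the `k get ds aws-node` output in this thread.
const images = [
  "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.11.0",
  "602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon/aws-network-policy-agent:v1.0.4-eksbuild.1",
  "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.11.0",
];

for (const image of images) {
  const tag = imageTag(image);
  const version = tag ? parseVersion(tag) : undefined;
  // Only the CNI images (amazon-k8s-cni, amazon-k8s-cni-init) are checked.
  const stillOld =
    image.includes("amazon-k8s-cni") &&
    version !== undefined &&
    compareVersions(version, [1, 12, 0]) < 0;
  console.log(`${image} -> ${tag ?? "(no tag)"}${stillOld ? " (still v1.11.x or older)" : ""}`);
}
```

The same helpers can be fed the image lines from any cluster, so an in-place upgrade and a fresh cluster can be compared side by side.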

rquitales commented 3 months ago

@bradyburke Thanks for the update. Unfortunately, I'm not able to reproduce this on my side. Could you try a few things to help me debug further?

  1. When upgrading to v2.3.0 of the provider and running pulumi up, do you notice any changes to the configmap and VpcCni resources?
  2. Would you be able to provide the full log of a pulumi up with v2.3.0 of the provider, and where v1.11.0 of the CNI is installed?

Thanks

bradyburke commented 3 months ago

@rquitales I tested on a fresh cluster and it seems to install the correct version. Previously, I had been testing an in-place update. I will test again and provide details.

bsod90 commented 3 months ago

Since I spent the last 3 days debugging a similar issue, I'll leave my findings here in case they help somebody else. In my case, v1.11.0 was installed no matter what: I tried clearing node_modules and /tmp, upgrading all providers, etc. By putting process.exit() calls in various places in the eks provider code, I figured out that it does use things like the Cluster resource from my node_modules, but eks:index:VpcCni gets loaded from somewhere else. That "somewhere else" turned out to be ~/.pulumi/plugins: for some reason an old version of the eks plugin was kept there and used whenever eks:index:VpcCni was referenced. Deleting that folder and reinstalling all plugins fixed it.
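Building on that finding, one way to spot a stale plugin is to compare the @pulumi/eks version in node_modules with the eks plugin versions on disk. This sketch assumes Pulumi's default layout, with plugins under $PULUMI_HOME/plugins (default ~/.pulumi/plugins) in directories named resource-eks-v<version>; `eksPluginVersions` and `mismatchedEksPlugins` are hypothetical helpers. A mismatch only marks a candidate to inspect, since Pulumi can legitimately keep several plugin versions side by side.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Assumption: plugin directories are named "resource-<name>-v<version>";
// ".lock" / ".partial" marker files next to them are skipped by the regex.
function eksPluginVersions(entries: string[]): string[] {
  const re = /^resource-eks-v(\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?)$/;
  const versions: string[] = [];
  for (const name of entries) {
    const m = re.exec(name);
    if (m) versions.push(m[1]);
  }
  return versions;
}

// Plugin versions that differ from the SDK version. The leading "v" is
// stripped because `npm ls` in this thread reports "@pulumi/eks@v2.3.0".
function mismatchedEksPlugins(entries: string[], sdkVersion: string): string[] {
  const sdk = sdkVersion.replace(/^v/, "");
  return eksPluginVersions(entries).filter((v) => v !== sdk);
}

function main(): void {
  const pulumiHome = process.env.PULUMI_HOME ?? path.join(os.homedir(), ".pulumi");
  const pluginDir = path.join(pulumiHome, "plugins");
  const sdkPkg = path.join(process.cwd(), "node_modules", "@pulumi", "eks", "package.json");
  if (!fs.existsSync(pluginDir) || !fs.existsSync(sdkPkg)) {
    console.log("plugin directory or @pulumi/eks package.json not found");
    return;
  }
  const sdkVersion: string = JSON.parse(fs.readFileSync(sdkPkg, "utf8")).version;
  const entries = fs.readdirSync(pluginDir);
  console.log(`@pulumi/eks SDK: ${sdkVersion}`);
  console.log(`eks plugins on disk: ${eksPluginVersions(entries).join(", ") || "(none)"}`);
  const stale = mismatchedEksPlugins(entries, sdkVersion);
  if (stale.length > 0) {
    console.log(`eks plugins not matching the SDK (inspect these): ${stale.join(", ")}`);
  }
}

main();
```

`pulumi plugin ls` shows the same information, and `pulumi plugin rm resource eks <version>` removes a single plugin version rather than the whole folder.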


rquitales commented 3 months ago

@bsod90 Thanks for the detailed investigation! This might be related to the versioning issues reported in #1125. We'll also need to dig deeper into how Node.js-based providers handle version upgrades.