pulumi / pulumi-eks

A Pulumi component for easily creating and managing an Amazon EKS Cluster
https://www.pulumi.com/registry/packages/eks/
Apache License 2.0

Promise leaks in v3 release #1466

Open t0yv0 opened 2 weeks ago

t0yv0 commented 2 weeks ago

What happened?

User comment:

This is causing "The Pulumi runtime detected that 951 promises were still active at the time that the process exited." for me, with no easy way to diagnose it.

Example

Need a repro.

Output of pulumi about

N/A

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

t0yv0 commented 2 weeks ago

From: https://github.com/pulumi/pulumi-eks/issues/1425#issuecomment-2439417267

t0yv0 commented 2 weeks ago

@ffMathy this is relatively difficult for us to chase down from this description alone. Do you have any hints on how to reproduce it? There is also an environment variable that may collect more information to help us fix the issue:

    PULUMI_DEBUG_PROMISE_LEAKS=true

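For anyone else following along, one way to set this is to export the variable in the shell before running the usual Pulumi command (illustrative invocation; any command that executes the program works the same way):

```shell
# Enable verbose promise-leak reporting in the Pulumi Node.js runtime,
# then run the program as usual, e.g.:
export PULUMI_DEBUG_PROMISE_LEAKS=true
# pulumi up
```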
ffMathy commented 2 weeks ago

Yes, sorry. I will provide new output with that environment variable set as soon as possible.

ffMathy commented 1 day ago

Here are the leaks:

Promise leak detected:
    CONTEXT(816): rpcKeepAlive
    STACK_TRACE:
    Error:
        at Object.debuggablePromise (/workspaces/REDACTED/node_modules/@pulumi/runtime/debuggable.ts:84:75)
        at Object.rpcKeepAlive (/workspaces/REDACTED/node_modules/@pulumi/runtime/settings.ts:614:25)
        at Object.registerResource (/workspaces/REDACTED/node_modules/@pulumi/runtime/resource.ts:500:18)
        at new Resource (/workspaces/REDACTED/node_modules/@pulumi/resource.ts:556:13)
        at new ComponentResource (/workspaces/REDACTED/node_modules/@pulumi/resource.ts:1228:9)
        at new NodeGroupSecurityGroup (/workspaces/REDACTED/node_modules/@pulumi/nodeGroupSecurityGroup.ts:67:9)
        at REDACTEDComponent.createKubernetesCluster (/workspaces/REDACTED/src/apps/common-iac/src/REDACTED/index.ts:369:31)
        at new REDACTEDComponent (/workspaces/REDACTED/src/apps/common-iac/src/REDACTED/index.ts:45:32)
        at Object.<anonymous> (/workspaces/REDACTED/src/apps/common-iac/pulumi.ts:18:16)
        at Module._compile (node:internal/modules/cjs/loader:1572:14)
        at Object..js (node:internal/modules/cjs/loader:1709:10)
        at Module.load (node:internal/modules/cjs/loader:1315:32)
        at Function._load (node:internal/modules/cjs/loader:1125:12)
        at TracingChannel.traceSync (node:diagnostics_channel:322:14)
        at wrapModuleLoad (node:internal/modules/cjs/loader:216:24)
        at Module.require (node:internal/modules/cjs/loader:1337:12)
        at require (node:internal/modules/helpers:139:16)
        at Object.<anonymous> (/workspaces/REDACTED/node_modules/@pulumi/cmd/run/run.ts:434:33)
        at Generator.next (<anonymous>)
        at fulfilled (/workspaces/REDACTED/node_modules/@pulumi/pulumi/cmd/run/run.js:18:58)
    Promise leak detected:
    CONTEXT(817): resolveURN(resource:node-group-security-group[eks:index:NodeGroupSecurityGroup])
    STACK_TRACE:
    Error:
        at Object.debuggablePromise (/workspaces/REDACTED/node_modules/@pulumi/runtime/debuggable.ts:84:75)
        at /workspaces/REDACTED/node_modules/@pulumi/runtime/resource.ts:738:13
        at Generator.next (<anonymous>)
        at /workspaces/REDACTED/node_modules/@pulumi/pulumi/runtime/resource.js:21:71
        at new Promise (<anonymous>)
        at __awaiter (/workspaces/REDACTED/node_modules/@pulumi/pulumi/runtime/resource.js:17:12)
        at prepareResource (/workspaces/REDACTED/node_modules/@pulumi/pulumi/runtime/resource.js:489:12)
        at Object.registerResource (/workspaces/REDACTED/node_modules/@pulumi/runtime/resource.ts:503:24)
        at new Resource (/workspaces/REDACTED/node_modules/@pulumi/resource.ts:556:13)
        at new ComponentResource (/workspaces/REDACTED/node_modules/@pulumi/resource.ts:1228:9)
        at new NodeGroupSecurityGroup (/workspaces/REDACTED/node_modules/@pulumi/nodeGroupSecurityGroup.ts:67:9)
        at REDACTEDComponent.createKubernetesCluster (/workspaces/REDACTED/src/apps/common-iac/src/REDACTED/index.ts:369:31)
        at new REDACTEDComponent (/workspaces/REDACTED/src/apps/common-iac/src/REDACTED/index.ts:45:32)
        at Object.<anonymous> (/workspaces/REDACTED/src/apps/common-iac/pulumi.ts:18:16)
        at Module._compile (node:internal/modules/cjs/loader:1572:14)
        at Object..js (node:internal/modules/cjs/loader:1709:10)
        at Module.load (node:internal/modules/cjs/loader:1315:32)
        at Function._load (node:internal/modules/cjs/loader:1125:12)
        at TracingChannel.traceSync (node:diagnostics_channel:322:14)
        at wrapModuleLoad (node:internal/modules/cjs/loader:216:24)
        at Module.require (node:internal/modules/cjs/loader:1337:12)
        at require (node:internal/modules/helpers:139:16)
        at Object.<anonymous> (/workspaces/REDACTED/node_modules/@pulumi/cmd/run/run.ts:434:33)
        at Generator.next (<anonymous>)
        at fulfilled (/workspaces/REDACTED/node_modules/@pulumi/pulumi/cmd/run/run.js:18:58)

There are hundreds more, but that's too large to post on GitHub, and it just repeats.
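For readers unfamiliar with the `rpcKeepAlive` context in these traces: the Pulumi runtime holds a pending promise open while a resource registration is in flight and closes it when registration completes. A minimal non-Pulumi sketch of how such a keep-alive leaks when its closer is never invoked (all names here are illustrative, not Pulumi's internals):

```typescript
// Simplified model of a keep-alive registry, loosely modeled on the
// rpcKeepAlive contexts in the traces above. All names are illustrative.
const pendingClosers = new Set<() => void>();

function rpcKeepAlive(): () => void {
  // Open a keep-alive: the run is not "done" until the returned closer
  // runs. The pending promise is what the leak detector reports.
  let close!: () => void;
  void new Promise<void>((resolve) => (close = resolve));
  const closer = () => {
    close();
    pendingClosers.delete(closer);
  };
  pendingClosers.add(closer);
  return closer;
}

const done = rpcKeepAlive();
rpcKeepAlive(); // the returned closer is dropped, so this one leaks
done();
console.log(`leaked keep-alives: ${pendingClosers.size}`);
```

If a registration path opens a keep-alive but an error or early return skips the close, the promise stays pending until process exit, which matches the "promises were still active" message.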

flostadler commented 19 hours ago

Thanks @ffMathy!

Could you share your Pulumi program, or a minimal repro if you have one, so we can debug this further? It seems like the affected component is eks:index:NodeGroupSecurityGroup; how are you using that one?

I wonder if this is related to this issue: https://github.com/pulumi/pulumi/issues/13307#issuecomment-1615540170. Are you adding a child to any of the components from outside the component?
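To illustrate what "adding a child outside the component" can do, here is a toy non-Pulumi sketch of the pitfall described in the linked issue (the class and names are made up for illustration): a component awaits only the children registered before it finishes constructing, so a child attached afterwards can still be pending at exit.

```typescript
// Toy model of the parenting pitfall (not Pulumi's actual API): the
// component awaits only children registered before finish() is called.
const leakedChildren: Promise<void>[] = [];

class ToyComponent {
  private children: Promise<void>[] = [];
  private finished = false;

  registerChild(child: Promise<void>): void {
    if (this.finished) {
      // Registered too late: never awaited by the component, so it can
      // still be pending when the process exits.
      leakedChildren.push(child);
      return;
    }
    this.children.push(child);
  }

  // Called at the end of construction; awaits the children known so far.
  finish(): Promise<void> {
    this.finished = true;
    return Promise.all(this.children).then(() => undefined);
  }
}

const comp = new ToyComponent();
comp.registerChild(Promise.resolve());          // in time: awaited
void comp.finish();
comp.registerChild(new Promise<void>(() => {})); // too late: leaks
console.log(`leaked children: ${leakedChildren.length}`);
```

In a real Pulumi program the rough equivalent would be creating a resource with `parent` set to a component after that component's constructor has already returned.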

ffMathy commented 19 hours ago

Unfortunately I can't create a repro right now. The code base is huge, and we are quite busy with Q4 work that needs to be finished.

The code I am using is this:

    const nodeSecurityGroup = new eks.NodeGroupSecurityGroup(
      'node-group-security-group',
      {
        clusterSecurityGroup: securityGroup,
        eksCluster: cluster.eksCluster,
        vpcId: vpc.vpcId,
      },
      {
        parent: this,
      },
    );

Do you need more? Perhaps the VPC, cluster and security group too?

flostadler commented 18 hours ago

Thanks! If you could also add the VPC, cluster, and security group, that would be great. I'll spend some time trying to reproduce the issue using those resources.

ffMathy commented 18 hours ago

Here are more details for everything related to the node group, security group, and cluster:

    const securityGroup = new aws.ec2.SecurityGroup(
      'cluster-security-group',
      {
        namePrefix: 'cluster-security-group-',
        vpcId: vpc.vpcId,
        description: 'REDACTED',
        ingress: [{ protocol: '-1', self: true, fromPort: 0, toPort: 0 }],
        egress: [
          { protocol: '-1', fromPort: 0, toPort: 0, cidrBlocks: ['0.0.0.0/0'] },
        ],
      },
      {
        parent: this,
      },
    );

    const nodeRole = new aws.iam.Role(
      'node-role',
      {
        assumeRolePolicy: JSON.stringify({
          Version: '2012-10-17',
          Statement: [
            {
              Effect: 'Allow',
              Principal: {
                Service: 'ec2.amazonaws.com',
              },
              Action: 'sts:AssumeRole',
            },
          ],
        }),
      },
      {
        parent: this,
      },
    );

    const cluster = new eks.Cluster(
      'cluster',
      {
        vpcId: vpc.vpcId,
        privateSubnetIds: vpc.privateSubnetIds,
        publicSubnetIds: vpc.publicSubnetIds,
        createOidcProvider: true,
        fargate: false,
        skipDefaultNodeGroup: true,
        clusterSecurityGroup: securityGroup,
        roleMappings: [
          {
            roleArn: getAdminRoleArn(),
            groups: ['system:masters'],
            username: 'pulumi:admin-usr',
          },
          {
            roleArn: gitHubActionsRole.arn,
            groups: ['system:masters'],
            username: 'pulumi:admin-usr',
          },
          {
            roleArn: nodeRole.arn,
            username: 'system:node:{{EC2PrivateDNSName}}',
            groups: ['system:bootstrappers', 'system:nodes'],
          },
        ],
      },
      {
        parent: this,
      },
    );

    new aws.iam.RolePolicyAttachment(
      'eks-worker-node-policy',
      {
        policyArn: 'arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy',
        role: nodeRole.name,
      },
      {
        parent: this,
      },
    );
    new aws.iam.RolePolicyAttachment(
      'cni-policy',
      {
        policyArn: 'arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy',
        role: nodeRole.name,
      },
      {
        parent: this,
      },
    );
    new aws.iam.RolePolicyAttachment(
      'ec2-container-registry-read-only-policy',
      {
        policyArn: 'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly',
        role: nodeRole.name,
      },
      {
        parent: this,
      },
    );

    const nodeSecurityGroup = new eks.NodeGroupSecurityGroup(
      'node-group-security-group',
      {
        clusterSecurityGroup: securityGroup,
        eksCluster: cluster.eksCluster,
        vpcId: vpc.vpcId,
      },
      {
        parent: this,
      },
    );

    const nodeGroup = new aws.eks.NodeGroup(
      'node-group',
      {
        clusterName: cluster.eksCluster.name,
        nodeGroupNamePrefix: 'node-group-',
        nodeRoleArn: nodeRole.arn,
        subnetIds: pulumi
          .all([vpc.privateSubnetIds, vpc.publicSubnetIds])
          .apply(([privateSubnetIds, publicSubnetIds]) =>
            privateSubnetIds.concat(publicSubnetIds),
          ),
        scalingConfig: {
          desiredSize: isEnvironment('production') ? 2 : 1,
          maxSize: isEnvironment('production') ? 4 : 2,
          minSize: isEnvironment('production') ? 1 : 0,
        },
        updateConfig: {
          maxUnavailable: 1,
        },
        instanceTypes: ['t3a.2xlarge'],
      },
      {
        parent: this,
        replaceOnChanges: ['launchTemplate'],
      },
    );

Furthermore, this is the VPC:

    new awsx.ec2.Vpc(
      'vpc',
      {
        subnetStrategy: 'Legacy',
        numberOfAvailabilityZones: 2,
        cidrBlock: '10.0.0.0/16',
        enableDnsHostnames: true,
        enableDnsSupport: true,
        tags: {
          Name: 'ftrack-vpc',
        },
        subnetSpecs: [
          {
            type: 'Private',
            tags: {
              // this tag is needed to make AWS Load Balancer Controller work.
              // Read more: https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html
              'kubernetes.io/role/internal-elb': '1',
            },
          },
          {
            type: 'Public',
            tags: {
              // this tag is needed to make AWS Load Balancer Controller work.
              // Read more: https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html
              'kubernetes.io/role/elb': '1',
            },
          },
        ],
      },
      {
        parent: this,
        aliases: [
          {
            parent: this,
            name: 'vpc',
          },
        ],
      },
    );