pulumi / pulumi-eks

A Pulumi component for easily creating and managing an Amazon EKS Cluster
https://www.pulumi.com/registry/packages/eks/
Apache License 2.0

eks.Cluster may leave dangling ENIs on deletion #779

Open kralicky opened 2 years ago

kralicky commented 2 years ago

What happened?

Deleting a stack containing a VPC times out because the VPC cannot be deleted. The following error is given:

     Type                 Name           Status                  Info    
     pulumi:pulumi:Stack  opni-e2e       **failed**              1 error 
 -   └─ aws:ec2:Vpc       opni-e2e-test  **deleting failed**     1 error 

Diagnostics:
  aws:ec2:Vpc (opni-e2e-test):
    error: deleting urn:pulumi:e2e::opni::awsx:ec2:Vpc$aws:ec2/vpc:Vpc::opni-e2e-test: 1 error occurred:                                                                              
        * error deleting EC2 VPC (vpc-***): DependencyViolation: The vpc 'vpc-***' has dependencies and cannot be deleted.
        status code: 400, request id: ***

  pulumi:pulumi:Stack (opni-e2e):
    error: update failed

Resources:

Duration: 5m1s

This appears to be caused by a security group attached to the VPC. If I try to delete the VPC in the AWS console, it notes that the security group will also be deleted.

Upon manually deleting the security group, pulumi is able to delete the VPC successfully.
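
For anyone scripting that manual cleanup, here is a rough sketch (my own illustration, not part of the original report) that deletes any leftover ENIs referencing the group and then the group itself, assuming the AWS SDK for JavaScript v3 (@aws-sdk/client-ec2) and a placeholder security group ID:

import {
    DeleteNetworkInterfaceCommand,
    DeleteSecurityGroupCommand,
    DescribeNetworkInterfacesCommand,
    EC2Client,
} from "@aws-sdk/client-ec2";

// Placeholder ID of the security group left dangling in the VPC.
const danglingSecurityGroupId = "sg-0123456789abcdef0";

async function cleanUp() {
    const ec2 = new EC2Client({});

    // List any ENIs that still reference the security group.
    const enis = await ec2.send(new DescribeNetworkInterfacesCommand({
        Filters: [{ Name: "group-id", Values: [danglingSecurityGroupId] }],
    }));

    // Delete the leftover ENIs; this only succeeds once they are detached
    // ("available"), which dangling vpc-cni ENIs typically are.
    for (const eni of enis.NetworkInterfaces ?? []) {
        if (eni.NetworkInterfaceId && eni.Status === "available") {
            await ec2.send(new DeleteNetworkInterfaceCommand({
                NetworkInterfaceId: eni.NetworkInterfaceId,
            }));
        }
    }

    // With no ENIs referencing it, the security group (and then the VPC) can go.
    await ec2.send(new DeleteSecurityGroupCommand({ GroupId: danglingSecurityGroupId }));
}

cleanUp().catch(err => { console.error(err); process.exit(1); });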

Steps to reproduce

See here for the relevant code

Expected Behavior

Pulumi should delete the associated security group before deleting the VPC.

Actual Behavior

Pulumi cannot delete the VPC, and times out.

Output of pulumi about

CLI                       
Version      3.38.0       
Go Version   go1.19       
Go Compiler  gc           

Plugins                   
NAME        VERSION       
aws         5.10.0        
awsx        1.0.0-beta.9  
docker      3.2.0         
eks         0.41.0        
go          unknown       
kubernetes  3.20.2        
random      4.8.1         

Host                      
OS       arch             
Version  22.0.0           
Arch     x86_64           

Additional context

It appears the security group was created for an ELB provisioned by the EKS cluster.

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

guineveresaenger commented 2 years ago

Hi @kralicky - thank you for reporting. We'll take a look as soon as we can!

jkodroff commented 2 years ago

@kralicky How did you create the security group that was attached to the VPC? If you created it outside of Pulumi, then this is the expected behavior because Pulumi does not delete or modify resources that it did not create.

If you created the security group in Pulumi, please provide a minimal program that reproduces the error.

kralicky commented 2 years ago

I think it's the security group created by an EKS cluster. This is how I create it: https://github.com/rancher/opni/blob/main/infra/pkg/aws/aws.go#L30

Maxim-Durand commented 1 year ago

Hi, I'm facing a similar issue where I cannot destroy my stack because a security group is not deletable: it is still in use by an ENI attached to an EC2 instance (worker node) of my EKS cluster.

The whole cluster (with the security group) was created using:

import * as eks from "@pulumi/eks";

const eksCluster = new eks.Cluster("eks-cluster", {
    // Put the cluster in the new VPC created earlier
    vpcId: eksVpc.id,
    // Public subnets will be used for load balancers
    publicSubnetIds: eksVpc.publicSubnetIds,
    // Private subnets will be used for cluster nodes
    privateSubnetIds: eksVpc.privateSubnetIds,
    // Change configuration values to change any of the following settings
    instanceType: eksNodeInstanceType,
    desiredCapacity: desiredClusterSize,
    minSize: minClusterSize,
    maxSize: maxClusterSize,
    // Give the worker nodes public IP addresses
    nodeAssociatePublicIpAddress: true,
    // Uncomment the next two lines for a private cluster (VPN access required)
    // endpointPrivateAccess: true,
    // endpointPublicAccess: true
    userMappings: aws_auth_configMap
});

flostadler commented 3 weeks ago

This is caused by the aws-node daemonset (vpc-cni) that AWS installs on clusters. It assigns and manages the IP addresses and ENIs of the worker nodes. When shutting down, it may not be able to gracefully detach and delete ENIs, leaving some of them dangling and blocking the deletion of the security groups attached to them. In turn, this blocks the deletion of subnets and the VPC.

A possible option for us to remediate this is deleting the aws-node daemonset before shutting down the node group.
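
For illustration only, here is a rough sketch of how a program could enforce that ordering today (not an official fix), assuming @pulumi/command and a kubectl that already points at the cluster; the resource names are placeholders:

import * as command from "@pulumi/command";
import * as eks from "@pulumi/eks";

const cluster = new eks.Cluster("eks-cluster", { /* ... */ });

// Helper resource whose *delete* step removes the aws-node daemonset.
// Because it depends on the cluster, Pulumi destroys it first, so the
// daemonset is gone before the node group is torn down. Assumes kubectl
// is installed and KUBECONFIG already points at this cluster.
const removeAwsNode = new command.local.Command("remove-aws-node", {
    create: "true", // no-op on create
    delete: "kubectl delete daemonset aws-node -n kube-system --ignore-not-found",
}, { dependsOn: [cluster] });

Pulumi deletes dependents before their dependencies, which is why the delete hook runs before the cluster's node group is removed.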