kubernetes-sigs / cluster-api-provider-aws

Kubernetes Cluster API Provider AWS provides consistent deployment and day 2 operations of "self-managed" and EKS Kubernetes clusters on AWS.
http://cluster-api-aws.sigs.k8s.io/
Apache License 2.0

Deleting EKS Cluster with BYO VPC when VPC endpoint exists #4426

Open zreigz opened 1 year ago

zreigz commented 1 year ago

/kind bug

What steps did you take and what happened: Create an AKS cluster with a VPC endpoint. Then delete the cluster. All resources are being deleted besides the VPC endpoint.

E0727 10:50:30.780296       1 controller.go:329] "Reconciler error" err=<
    failed to delete vpc "vpc-0565e1af3f7e0fd88": DependencyViolation: The vpc 'vpc-0565e1af3f7e0fd88' has dependencies and cannot be deleted.
        status code: 400, request id: 0388782d-acad-4100-a629-1cf7c0939dee
 > controller="awsmanagedcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="AWSManagedControlPlane" AWSManagedControlPlane="bootstrap/lukasz-aws" namespace="bootstrap" name="lukasz-aws" reconcileID="ecf2f866-22ba-491c-9def-3b20a30fb3ac"

[image: vpc]

I also added tags to this endpoint: [image: tags]

What did you expect to happen: CAPA controller deletes VPC endpoint


zreigz commented 1 year ago

My CAPA controller settings:

--feature-gates=EKS=true,EKSEnableIAM=true,EKSAllowAddRoles=true,EKSFargate=false,MachinePool=true,EventBridgeInstanceState=false,AutoControllerIdentityCreator=true,BootstrapFormatIgnition=false,ExternalResourceGC=true,AlternativeGCStrategy=false
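
When comparing controller settings across environments, it can help to turn the flag into a lookup table. The following is a minimal Python sketch, not part of CAPA; the function name `parse_feature_gates` is hypothetical, and the flag string is copied from the line above.

```python
def parse_feature_gates(flag: str) -> dict:
    """Parse a '--feature-gates=Name=true,Other=false' flag into a dict of booleans."""
    prefix = "--feature-gates="
    value = flag[len(prefix):] if flag.startswith(prefix) else flag
    gates = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, raw = pair.partition("=")
        if not sep or raw.lower() not in ("true", "false"):
            raise ValueError(f"malformed feature gate: {pair!r}")
        gates[name] = raw.lower() == "true"
    return gates


# Flag value as posted in this thread.
FLAG = (
    "--feature-gates=EKS=true,EKSEnableIAM=true,EKSAllowAddRoles=true,"
    "EKSFargate=false,MachinePool=true,EventBridgeInstanceState=false,"
    "AutoControllerIdentityCreator=true,BootstrapFormatIgnition=false,"
    "ExternalResourceGC=true,AlternativeGCStrategy=false"
)

gates = parse_feature_gates(FLAG)
enabled = sorted(name for name, on in gates.items() if on)
print("enabled gates:", ", ".join(enabled))
```

With the flag above, `gates["ExternalResourceGC"]` is `True` and `gates["AlternativeGCStrategy"]` is `False`.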

Skarlso commented 1 year ago

@zreigz hi 👋

Just curious. How long did you wait? It takes a while to unassign the EIPs and then delete the Security Groups and gateways... Like, 10-15 minutes sometimes.

zreigz commented 1 year ago

I waited 30 minutes. It seems the CAPA manager does not reconcile the VPC endpoint. I will increase the log verbosity to get more info.

Skarlso commented 1 year ago

Interesting. I tried recently with a full release and everything was deleted fine.

Skarlso commented 1 year ago

ah, wait... This is a BYO cluster... If you brought your own VPC, it shouldn't delete that... It should just say, done and that's it.

zreigz commented 1 year ago

[image: net]

All resources have tags and are managed correctly. Only vpce-034d889bf72f85fb8 doesn't have tags.

zreigz commented 1 year ago

The main route table also doesn't have tags, but it was deleted once I manually removed the VPC endpoint. Only this one resource blocks deletion.

Skarlso commented 1 year ago

If you bring your own cluster, we stopped tagging resources. You have to apply the necessary tags everywhere.
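
As a rough illustration of what "the necessary tags" look like, here is a minimal Python sketch that builds the tag set seen on the VPC and subnets in the manifests posted in this thread and reports which of those tags a resource lacks. The function names are hypothetical, and it is an assumption that the same three keys (including the `role` tag) are what CAPA would expect on a VPC endpoint. The sketch only checks tag presence; it does not show that tagging makes CAPA delete the resource.

```python
def ownership_tags(cluster_name: str, role: str) -> dict:
    """Tag set modelled on the VPC/subnet tags shown in this thread (assumed, not authoritative)."""
    return {
        f"kubernetes.io/cluster/{cluster_name}": "owned",
        f"sigs.k8s.io/cluster-api-provider-aws/cluster/{cluster_name}": "owned",
        "sigs.k8s.io/cluster-api-provider-aws/role": role,
    }


def missing_tags(actual: dict, cluster_name: str, role: str) -> dict:
    """Return the expected tags that are absent from, or different in, `actual`."""
    expected = ownership_tags(cluster_name, role)
    return {k: v for k, v in expected.items() if actual.get(k) != v}


# Tags of the VPC as listed in the AWSManagedControlPlane spec in this thread.
vpc_tags = {
    "Name": "lukasz-aws-vpc",
    "kubernetes.io/cluster/lukasz-aws": "owned",
    "sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws": "owned",
    "sigs.k8s.io/cluster-api-provider-aws/role": "common",
}

# The VPC endpoint was reported as having no tags at all.
endpoint_tags = {}

print("vpc missing:", missing_tags(vpc_tags, "lukasz-aws", "common"))
print("endpoint missing:", sorted(missing_tags(endpoint_tags, "lukasz-aws", "common")))
```

In practice the `actual` dictionaries would come from the EC2 API or console rather than being typed in by hand.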

zreigz commented 1 year ago

Yes, I did this, but it didn't help.

Skarlso commented 1 year ago

Can you share what exactly your cluster setup is? What did you bring and what's your cluster config?

zreigz commented 1 year ago

Generally, I create an EKS cluster using Terraform, then I migrate it to Cluster API. After migration, I can manage the cluster the Cluster API way (add, update, delete resources).

Cluster:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    meta.helm.sh/release-name: bootstrap
    meta.helm.sh/release-namespace: bootstrap
  creationTimestamp: "2023-07-27T11:30:11Z"
  finalizers:
  - cluster.cluster.x-k8s.io
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
  name: lukasz-aws
  namespace: bootstrap
  resourceVersion: "12226"
  uid: 4e589e26-154a-42e5-94cc-3903b1b9e2f5
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 10.0.0.0/16
  controlPlaneEndpoint:
    host: https://0E086770AA076364648CB6BB70A253A8.gr7.eu-west-1.eks.amazonaws.com
    port: 443
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta2
    kind: AWSManagedControlPlane
    name: lukasz-aws
    namespace: bootstrap
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSManagedCluster
    name: lukasz-aws
    namespace: bootstrap
status:
  conditions:
  - lastTransitionTime: "2023-07-27T11:38:27Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-07-27T11:33:30Z"
    status: "True"
    type: ControlPlaneInitialized
  - lastTransitionTime: "2023-07-27T11:38:27Z"
    status: "True"
    type: ControlPlaneReady
  - lastTransitionTime: "2023-07-27T11:30:12Z"
    status: "True"
    type: InfrastructureReady
  controlPlaneReady: true
  infrastructureReady: true
  observedGeneration: 2
  phase: Provisioned

AWSManagedControlPlane

apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  annotations:
    aws.cluster.x-k8s.io/external-resource-gc: "true"
    meta.helm.sh/release-name: bootstrap
    meta.helm.sh/release-namespace: bootstrap
  creationTimestamp: "2023-07-27T11:30:09Z"
  finalizers:
  - awsmanagedcontrolplane.controlplane.cluster.x-k8s.io
  generation: 98
  labels:
    app.kubernetes.io/managed-by: Helm
    cluster.x-k8s.io/cluster-name: lukasz-aws
  name: lukasz-aws
  namespace: bootstrap
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Cluster
    name: lukasz-aws
    uid: 4e589e26-154a-42e5-94cc-3903b1b9e2f5
  resourceVersion: "12712"
  uid: c8a3aedd-3f91-4407-a4f2-27327234bd84
spec:
  addons:
  - conflictResolution: overwrite
    name: coredns
    version: v1.8.7-eksbuild.4
  - conflictResolution: overwrite
    name: kube-proxy
    version: v1.23.15-eksbuild.1
  - conflictResolution: overwrite
    name: vpc-cni
    version: v1.12.5-eksbuild.1
  associateOIDCProvider: true
  bastion:
    allowedCIDRBlocks:
    - 0.0.0.0/0
    enabled: false
  controlPlaneEndpoint:
    host: https://0E086770AA076364648CB6BB70A253A8.gr7.eu-west-1.eks.amazonaws.com
    port: 443
  eksClusterName: lukasz-aws
  encryptionConfig:
    provider: ""
  endpointAccess:
    private: false
    public: true
  iamAuthenticatorConfig:
    mapRoles:
    - groups:
      - system:masters
      rolearn: arn:aws:iam::312272277431:role/lukasz-aws-capa-controller
      username: capa-admin
  identityRef:
    kind: AWSClusterControllerIdentity
    name: default
  kubeProxy:
    disable: false
  logging:
    apiServer: false
    audit: false
    authenticator: false
    controllerManager: false
    scheduler: false
  network:
    cni:
      cniIngressRules:
      - description: bgp (calico)
        fromPort: 179
        protocol: tcp
        toPort: 179
      - description: IP-in-IP (calico)
        fromPort: -1
        protocol: "4"
        toPort: 65535
    subnets:
    - availabilityZone: eu-west-1b
      cidrBlock: 10.0.102.0/24
      id: subnet-0816b5c889b7a8841
      isPublic: true
      natGatewayId: nat-06312ea0e2caaa620
      routeTableId: rtb-0f51aa8390f3086a3
      tags:
        Name: lukaszz-public-eu-west-1b
        kubernetes.io/cluster/lukasz-aws: owned
        kubernetes.io/role/elb: "1"
        kubernetes.io/role/internal-elb: "1"
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: public
    - availabilityZone: eu-west-1c
      cidrBlock: 10.0.3.0/24
      id: subnet-046a46a3f124b47ed
      isPublic: false
      routeTableId: rtb-0de4048e5a94c3b32
      tags:
        Name: lukaszz-private-eu-west-1c
        kubernetes.io/cluster/lukasz-aws: owned
        kubernetes.io/role/internal-elb: "1"
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: private
    - availabilityZone: eu-west-1a
      cidrBlock: 10.0.101.0/24
      id: subnet-0447ce2f91afa28d0
      isPublic: true
      natGatewayId: nat-0bc220f873fd5fb0b
      routeTableId: rtb-0f51aa8390f3086a3
      tags:
        Name: lukaszz-public-eu-west-1a
        kubernetes.io/cluster/lukasz-aws: owned
        kubernetes.io/role/elb: "1"
        kubernetes.io/role/internal-elb: "1"
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: public
    - availabilityZone: eu-west-1c
      cidrBlock: 10.0.103.0/24
      id: subnet-04a301f988fd7a8d3
      isPublic: true
      natGatewayId: nat-0e7fb6db23ce68519
      routeTableId: rtb-0f51aa8390f3086a3
      tags:
        Name: lukaszz-public-eu-west-1c
        kubernetes.io/cluster/lukasz-aws: owned
        kubernetes.io/role/elb: "1"
        kubernetes.io/role/internal-elb: "1"
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: public
    - availabilityZone: eu-west-1b
      cidrBlock: 10.0.32.0/20
      id: subnet-0ee788772775a7e12
      isPublic: false
      routeTableId: rtb-02cdb91d592aa34a8
      tags:
        Name: lukaszz-worker-private-eu-west-1b
        kubernetes.io/cluster/lukasz-aws: owned
        kubernetes.io/role/internal-elb: "1"
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: private
    - availabilityZone: eu-west-1a
      cidrBlock: 10.0.1.0/24
      id: subnet-021df8e1897709a47
      isPublic: false
      routeTableId: rtb-0de4048e5a94c3b32
      tags:
        Name: lukaszz-private-eu-west-1a
        kubernetes.io/cluster/lukasz-aws: owned
        kubernetes.io/role/internal-elb: "1"
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: private
    - availabilityZone: eu-west-1b
      cidrBlock: 10.0.2.0/24
      id: subnet-02bda737e6e5763a8
      isPublic: false
      routeTableId: rtb-0de4048e5a94c3b32
      tags:
        Name: lukaszz-private-eu-west-1b
        kubernetes.io/cluster/lukasz-aws: owned
        kubernetes.io/role/internal-elb: "1"
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: private
    - availabilityZone: eu-west-1c
      cidrBlock: 10.0.48.0/20
      id: subnet-0b6bc0646619aa736
      isPublic: false
      routeTableId: rtb-02cdb91d592aa34a8
      tags:
        Name: lukaszz-worker-private-eu-west-1c
        kubernetes.io/cluster/lukasz-aws: owned
        kubernetes.io/role/internal-elb: "1"
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: private
    - availabilityZone: eu-west-1a
      cidrBlock: 10.0.16.0/20
      id: subnet-081c47674ed849148
      isPublic: false
      routeTableId: rtb-02cdb91d592aa34a8
      tags:
        Name: lukaszz-worker-private-eu-west-1a
        kubernetes.io/cluster/lukasz-aws: owned
        kubernetes.io/role/internal-elb: "1"
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: private
    vpc:
      availabilityZoneSelection: Ordered
      availabilityZoneUsageLimit: 3
      cidrBlock: 10.0.0.0/16
      id: vpc-097c08803eaf17d68
      internetGatewayId: igw-09a6f49771516c36a
      ipv6:
        cidrBlock: 2a05:d018:14fe:bd00::/56
        egressOnlyInternetGatewayId: eigw-0242549b495fda9cb
        poolId: Amazon
      tags:
        Name: lukasz-aws-vpc
        kubernetes.io/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
        sigs.k8s.io/cluster-api-provider-aws/role: common
  region: eu-west-1
  roleName: lukasz-aws20230727111126186500000004
  sshKeyName: default
  tokenMethod: iam-authenticator
  version: v1.23
  vpcCni:
    disable: false
status:
  addons:
  - arn: arn:aws:eks:eu-west-1:312272277431:addon/lukasz-aws/coredns/a0c4cb8a-c3be-132f-ba23-8a8fdc8a9f8b
    createdAt: "2023-07-27T11:21:58Z"
    modifiedAt: "2023-07-27T11:22:14Z"
    name: coredns
    status: ACTIVE
    version: v1.8.7-eksbuild.4
  - arn: arn:aws:eks:eu-west-1:312272277431:addon/lukasz-aws/kube-proxy/aec4cb8a-c8cc-0cf6-ec74-6f1743ea00f1
    createdAt: "2023-07-27T11:22:00Z"
    modifiedAt: "2023-07-27T11:22:37Z"
    name: kube-proxy
    status: ACTIVE
    version: v1.23.15-eksbuild.1
  - arn: arn:aws:eks:eu-west-1:312272277431:addon/lukasz-aws/vpc-cni/dac4cb8a-c956-e57f-ce21-102c74dbd76e
    createdAt: "2023-07-27T11:22:01Z"
    modifiedAt: "2023-07-27T11:23:08Z"
    name: vpc-cni
    status: ACTIVE
    version: v1.12.5-eksbuild.1
  conditions:
  - lastTransitionTime: "2023-07-27T11:38:55Z"
    message: 8 of 11 completed
    reason: EKSControlPlaneReconciliationFailed
    severity: Error
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-07-27T11:35:06Z"
    message: |-
      failed to describe bastion host: RequestError: send request failed
      caused by: Post "https://ec2.eu-west-1.amazonaws.com/": read tcp 10.0.16.50:48424->54.239.39.230:443: read: connection reset by peer
    reason: BastionHostFailed
    severity: Error
    status: "False"
    type: BastionHostReady
  - lastTransitionTime: "2023-07-27T11:36:44Z"
    status: "True"
    type: ClusterSecurityGroupsReady
  - lastTransitionTime: "2023-07-27T11:33:36Z"
    status: "True"
    type: EKSAddonsConfigured
  - lastTransitionTime: "2023-07-27T11:38:52Z"
    message: |-
      failed reconciling security groups: describing security groups: RequestError: send request failed
      caused by: Post "https://ec2.eu-west-1.amazonaws.com/": read tcp 10.0.16.50:44220->52.95.121.23:443: read: connection reset by peer
    reason: EKSControlPlaneReconciliationFailed
    severity: Error
    status: "False"
    type: EKSControlPlaneReady
  - lastTransitionTime: "2023-07-27T11:33:36Z"
    status: "True"
    type: EKSIdentityProviderConfigured
  - lastTransitionTime: "2023-07-27T11:31:13Z"
    status: "True"
    type: EgressOnlyInternetGatewayReady
  - lastTransitionTime: "2023-07-27T11:38:52Z"
    status: "True"
    type: IAMControlPlaneRolesReady
  - lastTransitionTime: "2023-07-27T11:31:13Z"
    status: "True"
    type: InternetGatewayReady
  - lastTransitionTime: "2023-07-27T11:33:15Z"
    status: "True"
    type: NatGatewaysReady
  - lastTransitionTime: "2023-07-27T11:39:01Z"
    message: |-
      failed to replace outdated route on route table "rtb-02cdb91d592aa34a8": RequestError: send request failed
      caused by: Post "https://ec2.eu-west-1.amazonaws.com/": read tcp 10.0.16.50:53248->52.95.121.23:443: read: connection reset by peer
    reason: RouteTableReconciliationFailed
    severity: Error
    status: "False"
    type: RouteTablesReady
  - lastTransitionTime: "2023-07-27T11:33:37Z"
    message: 'getting client for remote cluster: the server has asked for the client
      to provide credentials'
    reason: SecondaryCidrReconciliationFailed
    severity: Error
    status: "False"
    type: SecondaryCidrsReady
  - lastTransitionTime: "2023-07-27T11:36:44Z"
    status: "True"
    type: SubnetsReady
  - lastTransitionTime: "2023-07-27T11:38:21Z"
    status: "True"
    type: VpcReady
  externalManagedControlPlane: true
  initialized: true
  networkStatus:
    securityGroups:
      cluster:
        id: sg-0b391cce6400beef5
        name: eks-cluster-sg-lukasz-aws-329718774
        tags:
          Name: eks-cluster-sg-lukasz-aws-329718774
          aws:eks:cluster-name: lukasz-aws
          kubernetes.io/cluster/lukasz-aws: owned
          sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
      node:
        id: sg-0b391cce6400beef5
        name: eks-cluster-sg-lukasz-aws-329718774
        tags:
          Name: eks-cluster-sg-lukasz-aws-329718774
          aws:eks:cluster-name: lukasz-aws
          kubernetes.io/cluster/lukasz-aws: owned
          sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
      node-eks-additional:
        id: sg-0e982e86434715a6c
        name: lukasz-aws-node-eks-additional
        tags:
          Name: lukasz-aws-node-eks-additional
          sigs.k8s.io/cluster-api-provider-aws/cluster/lukasz-aws: owned
          sigs.k8s.io/cluster-api-provider-aws/role: node-eks-additional
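
The status block above is long, so a quick filter for conditions that are not `True` can make the failures easier to spot. This is a minimal Python sketch with a hypothetical helper name; the sample list is a hand-copied subset of the conditions above, and in practice the list would be loaded from the object's JSON output instead.

```python
def failing_conditions(conditions: list) -> list:
    """Return the conditions whose status is anything other than "True"."""
    return [c for c in conditions if c.get("status") != "True"]


# Subset of the AWSManagedControlPlane conditions shown in this thread.
conditions = [
    {"type": "Ready", "status": "False", "reason": "EKSControlPlaneReconciliationFailed"},
    {"type": "BastionHostReady", "status": "False", "reason": "BastionHostFailed"},
    {"type": "NatGatewaysReady", "status": "True"},
    {"type": "RouteTablesReady", "status": "False", "reason": "RouteTableReconciliationFailed"},
    {"type": "SecondaryCidrsReady", "status": "False", "reason": "SecondaryCidrReconciliationFailed"},
    {"type": "VpcReady", "status": "True"},
]

failing = failing_conditions(conditions)
for cond in failing:
    print(f'{cond["type"]}: {cond.get("reason", "")}')
```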

Skarlso commented 1 year ago

AKS

You do mean EKS, right..? :)

zreigz commented 1 year ago

yes :)

Skarlso commented 1 year ago

Right, that should work. Uh.

Ankitasw commented 1 year ago

I think the tagging of the route table should be fixed. I haven't checked the code, but judging by the problem it needs fixing.

/triage accepted
/priority important-soon

k8s-triage-robot commented 10 months ago

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten