kubernetes-sigs / cluster-api-provider-aws

Kubernetes Cluster API Provider AWS provides consistent deployment and day 2 operations of "self-managed" and EKS Kubernetes clusters on AWS.
http://cluster-api-aws.sigs.k8s.io/
Apache License 2.0

Deleting a cluster with unmanaged subnets doesn't clean up the tags on the Subnet AWS resource, resulting in clusters being unable to be provisioned #3074

Open angelos-p opened 2 years ago

angelos-p commented 2 years ago

/kind bug

What steps did you take and what happened: I have a use case where I need to create and delete tens of clusters with unmanaged subnets every day. One day I noticed that my clusters could no longer be provisioned, failing with the following error:

E0111 15:03:44.323375       1 controller.go:317] controller/awsmachine "msg"="Reconciler error" "error"="failed to create AWSMachine instance: failed to run machine \"aws-v1.22-763b6751-c803-48bb-bb09-cc57b5c450f0-control-plamntff\", no subnets available in availability zone \"eu-west-2b\"" "name"="aws-v1.22-763b6751-c803-48bb-bb09-cc57b5c450f0-control-plamntff" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSMachine" 

This didn't make sense, because the subnets were all working fine. After further investigation I realized that CAPA was attempting to create tags on the Subnet resources, and that the request was failing because the 50-tags-per-resource limit had been reached. I only had one cluster provisioned at the time, yet the tags on the Subnet included entries for all the past clusters I had created. Here's an example from today:

(screenshot: the subnet's tags in the AWS console, including kubernetes.io/cluster/* entries left over from previously deleted clusters)

The AWSCluster object also included the tags of the deleted clusters in the .spec.network.subnets stanza:

        cidrBlock: ******
        id: ******
        isPublic: false
        routeTableId: ******
        tags:
          Environment: shared_services
          Management: Terraform
          Name: Testing Private Subnet 2
          Repo: shared-infrastructure
          Scope: Private
          TF_Directory: shared_services_network
          kubernetes.io/cluster/1.23-243ebba1-250e-4261-ae37-1453024e0c8c: shared
          kubernetes.io/cluster/02b3df93-c93e-4b39-a12b-75765383a5a9: shared
          kubernetes.io/cluster/aws-v1.22-1f2985bc-dfeb-4eb7-8e99-a0d3bbc400d0: shared
          kubernetes.io/cluster/aws-v1.22-7d2ebe0d-42e3-49f5-a74b-e1f3b222954d: shared
          kubernetes.io/cluster/aws-v1.22-9926f379-2c26-4840-aea3-9305beabfa72: shared
          kubernetes.io/cluster/fa3b783c-a611-40b0-95f9-7116e356e999: shared
          kubernetes.io/cluster/lewis-test-cluster: shared
          kubernetes.io/cluster/lewistest: shared
          kubernetes.io/cluster/lewistest2: shared
          kubernetes.io/cluster/lewistest3: shared
          kubernetes.io/cluster/lewistest4: shared
          kubernetes.io/role/internal-elb: "1"
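
For reference, the tag count on a subnet can be checked directly with the AWS CLI (a rough sketch; the subnet ID below is a placeholder):

# List every tag on one of the shared subnets.
aws ec2 describe-tags \
  --filters "Name=resource-id,Values=subnet-0123456789abcdef0" \
  --query 'Tags[].[Key,Value]' --output table

# Count them to see how close the subnet is to the 50-tag limit.
aws ec2 describe-tags \
  --filters "Name=resource-id,Values=subnet-0123456789abcdef0" \
  --query 'length(Tags)'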

My next thought was to delete the tags on the Subnets manually, but when "Reconciling subnets" was triggered by the capa-controller-manager, the tags were re-added to the Subnets. Deleting them from the AWSCluster object with kubectl edit didn't work either, because they were re-added in the same way. The only way I managed to delete the tags was by removing them from both the Subnets and the AWSCluster object in quick succession, before "Reconciling subnets" was triggered again.
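
For illustration, the quick-succession cleanup amounts to something like the following (a sketch rather than the exact commands; the subnet ID and AWSCluster name are placeholders, and lewistest stands in for one of the stale cluster names above):

# 1. Delete the stale cluster tag from the subnet.
aws ec2 delete-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/lewistest

# 2. Immediately remove the same key from the AWSCluster spec, before the next
#    "Reconciling subnets" pass copies it back. Per JSON Patch, "/" in the tag
#    key is escaped as "~1", and "0" is the subnet's index in .spec.network.subnets.
kubectl patch awscluster my-awscluster --type=json -p='[
  {"op": "remove",
   "path": "/spec/network/subnets/0/tags/kubernetes.io~1cluster~1lewistest"}
]'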

Here are my reproduction steps:

What did you expect to happen:

Anything else you would like to add: My temporary workaround is a Lambda that deletes the tags on the Subnet resources and the AWSCluster object in quick succession every few hours, but that is far from ideal.

Environment:

sedefsavas commented 2 years ago

Strange, CAPA does not share tags between clusters. Can you provide the YAMLs used for creating cluster/lewistest and cluster/lewistest2? I just want to see how they were created to understand how this happened.

angelos-p commented 2 years ago

Hi @sedefsavas, thank you for looking into this!

I clarified my original post a bit: CAPA indeed doesn't share tags between clusters directly, but it syncs each cluster's tags with the Subnet AWS resource, which in turn shares them indirectly.

So when a cluster (A) is created, its cluster tag (A=shared) is added to the Subnet resource, and the cluster then syncs the rest of the Subnet's tags (Scope=Private and so on) into its own spec when 'Reconcile subnets' is triggered. When a new cluster (B) is created, its cluster tag (B=shared) is added to the Subnet resource as well, and the next time 'Reconcile subnets' runs, both clusters sync the Subnet's tags into their own specs, which at that point also include the other cluster's tag. The result is that the Subnet and both clusters end up carrying both tags: A=shared and B=shared.

I believe that when a cluster syncs tags with the Subnet resource, it should ignore tags from other CAPI-created clusters to avoid this tag inheritance issue.

Then if a cluster gets deleted, it should remove its tags from the Subnet Resource.

If both of the above functionalities are added, the bug would be fixed.
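
To make "tags from other CAPI-created clusters" concrete, the stale entries can be spotted by comparing the subnet's cluster tags against the AWSClusters that still exist (a rough sketch; the subnet ID is a placeholder):

# Cluster tags currently stamped on the subnet.
aws ec2 describe-tags \
  --filters "Name=resource-id,Values=subnet-0123456789abcdef0" \
            "Name=key,Values=kubernetes.io/cluster/*" \
  --query 'Tags[].Key' --output text | tr '\t' '\n' | sort

# AWSClusters that still exist in the management cluster.
kubectl get awsclusters -A \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | sort

# Any kubernetes.io/cluster/<name> key without a matching AWSCluster is stale.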

The YAMLs are templated, so I will send you the template:

# {{ .Uuid }}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: aws-v1.22-{{ .Uuid }}
  namespace: {{ .Namespace }}
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - ******
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: aws-v1.22-{{ .Uuid }}-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: aws-v1.22-{{ .Uuid }}
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: aws-v1.22-{{ .Uuid }}
  namespace: {{ .Namespace }}
spec:
  network:
    vpc:
      id: *******
    subnets:
      - id: *******
      - id: *******
      - id: *******
    securityGroupOverrides:
      bastion: *******
      controlplane: *******
      apiserver-lb: *******
      node: *******
      lb: *******
  region: eu-west-2
  sshKeyName: *******
  controlPlaneLoadBalancer:
    scheme: internal
    subnets:
      - *******
      - *******
      - *******
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: aws-v1.22-{{ .Uuid }}-control-plane
  namespace: {{ .Namespace }}
spec:
  kubeadmConfigSpec:
    postKubeadmCommands:
      - sudo kubectl --kubeconfig=/etc/kubernetes/kubelet.conf apply -f https://docs.projectcalico.org/v3.20/manifests/calico.yaml
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
      controllerManager:
        extraArgs:
          cloud-provider: aws
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{"{{"}} ds.meta_data.local_hostname {{"}}"}}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{"{{"}} ds.meta_data.local_hostname {{"}}"}}'
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSMachineTemplate
      name: aws-v1.22-{{ .Uuid }}-control-plane
  replicas: 3
  version: v1.22.5
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: aws-v1.22-{{ .Uuid }}-control-plane
  namespace: {{ .Namespace }}
spec:
  template:
    spec:
      iamInstanceProfile: *******
      instanceType: t3.large
      sshKeyName: *******
      failureDomain: "eu-west-2a"

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: aws-v1.22-{{ .Uuid }}-md-0
  namespace: {{ .Namespace }}
spec:
  clusterName: aws-v1.22-{{ .Uuid }}
  replicas: 3
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: aws-v1.22-{{ .Uuid }}-md-0
      clusterName: aws-v1.22-{{ .Uuid }}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: aws-v1.22-{{ .Uuid }}-md-0
      version: v1.22.5
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: aws-v1.22-{{ .Uuid }}-md-0
  namespace: {{ .Namespace }}
spec:
  template:
    spec:
      iamInstanceProfile: *******
      instanceType: t3.large
      sshKeyName: *******
      failureDomain: "eu-west-2a"
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: aws-v1.22-{{ .Uuid }}-md-0
  namespace: {{ .Namespace }}
spec:
  template:
    spec:
      preKubeadmCommands:
        - sudo apt -y update
        - sudo apt -y install linux-modules-extra-$(uname -r)
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: aws
          name: '{{"{{"}} ds.meta_data.local_hostname {{"}}"}}'

sedefsavas commented 2 years ago

Now it makes sense, thanks for the additional info. To summarize: when you bring your own VPC and create clusters on the same VPC, the subnet tags do not get cleaned up after the clusters are deleted.
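
For anyone trying to reproduce this, a minimal check on one of the shared subnets could look like the following (a sketch; cluster.yaml is assumed to be a pre-rendered copy of the template above, and the subnet ID and cluster name are placeholders):

SUBNET=subnet-0123456789abcdef0   # placeholder: one of the unmanaged subnets

before=$(aws ec2 describe-tags \
  --filters "Name=resource-id,Values=$SUBNET" --query 'length(Tags)')

# Create a throwaway cluster, wait for it to come up, then delete it again.
kubectl apply -f cluster.yaml
kubectl wait --for=condition=Ready cluster/aws-v1.22-test --timeout=30m
kubectl delete -f cluster.yaml --wait

after=$(aws ec2 describe-tags \
  --filters "Name=resource-id,Values=$SUBNET" --query 'length(Tags)')

# On affected versions "after" stays higher than "before": the
# kubernetes.io/cluster/<name> tag is left behind on the subnet.
echo "tags before: $before, after: $after"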

/triage accepted /priority important-soon

pydctw commented 2 years ago

/assign

pydctw commented 2 years ago

Waiting for #3123 to be merged.

angelos-p commented 2 years ago

Hi, is there any update on this issue? Is it possible for me to help with the PR in any way? :slightly_smiling_face:

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

voor commented 2 years ago

/remove-lifecycle stale

This error is still relevant to my interests as well.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The issue has been marked as an important bug and triaged. Such issues are automatically marked as frozen when hitting the rotten state to avoid missing important bugs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle frozen

k8s-triage-robot commented 1 year ago

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted