kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack
https://cluster-api-openstack.sigs.k8s.io/
Apache License 2.0

Multi floating IPs were spawned if creating cluster without a router or a router without internal interfaces #1827

Closed nguyenhuukhoi closed 8 months ago

nguyenhuukhoi commented 8 months ago

/kind bug

What steps did you take and what happened:

"When I create a cluster without a router or a router without internal interfaces, it will automatically spawn floating IPs until there are no available IPs in this subnet."

What did you expect to happen:

Floating IPs should not be created automatically when there is no router, or when the router has no internal interfaces.

Environment:

- Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): v0.8.0
- Cluster-API version: v1.5.4
- OpenStack version: Yoga
- Minikube/KIND version: k8s cluster
- Kubernetes version (use kubectl version): v1.27.4
- OS (e.g. from /etc/os-release): Ubuntu 22.04

jichenjc commented 8 months ago

"When I create a cluster without a router or a router without internal interfaces, it will automatically spawn floating IPs until there are no available IPs in this subnet."

CAPI needs a floating IP even when there's no router, because the API endpoint needs to be accessible.

nguyenhuukhoi commented 8 months ago

Hello. I mean it keeps creating floating IPs until the subnet has no IPs left.

mdbooth commented 8 months ago

"When I create a cluster without a router or a router without internal interfaces, it will automatically spawn floating IPs until there are no available IPs in this subnet."

CAPI needs a floating IP even when there's no router, because the API endpoint needs to be accessible.

We do actually support the case where the cluster management network is routable, so a floating IP isn't required.

@nguyenhuukhoi Can you share your configuration? This sounds like a bug, but I'd like to understand your use case better. We also might be able to come up with a workaround for you until it's fixed.

nguyenhuukhoi commented 8 months ago

Hello.

Actually, I create the cluster via Magnum Cluster API. This is my issue:

https://github.com/vexxhost/magnum-cluster-api/issues/289

I tried to create a cluster this way. The problem occurs when the router has no internal interface.

nguyenhuukhoi commented 8 months ago

Hello.

Router: [image]

Floating IP: [image]

kubectl -n magnum-system get cluster -l cluster-uuid=307c3442-db89-438b-a714-2147750ca87c -oyaml

apiVersion: v1
items:
- apiVersion: cluster.x-k8s.io/v1beta1
  kind: Cluster
  metadata:
    creationTimestamp: "2024-01-17T00:26:17Z"
    finalizers:
    - cluster.cluster.x-k8s.io
    generation: 2
    labels:
      cluster-uuid: 307c3442-db89-438b-a714-2147750ca87c
      cluster.x-k8s.io/cluster-name: kube-70vhj
      cni: calico-v3.24.2
      topology.cluster.x-k8s.io/owned: ""
    name: kube-70vhj
    namespace: magnum-system
    resourceVersion: "11035362"
    uid: 0104ebe2-b75e-4f01-a3b1-34dac111335a
  spec:
    clusterNetwork:
      pods:
        cidrBlocks:
        - 10.100.0.0/16
      serviceDomain: cluster.local
      services:
        cidrBlocks:
        - 10.254.0.0/16
    controlPlaneEndpoint:
      host: ""
      port: 0
    controlPlaneRef:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlane
      name: kube-70vhj-njqk2
      namespace: magnum-system
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
      kind: OpenStackCluster
      name: kube-70vhj-9lhb7
      namespace: magnum-system
    topology:
      class: magnum-v0.13.3
      controlPlane:
        machineHealthCheck:
          enable: false
        metadata:
          labels:
            node-role.kubernetes.io/master: ""
        replicas: 3
      variables:
      - name: apiServerLoadBalancer
        value:
          enabled: true
      - name: apiServerTLSCipherSuites
        value: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
      - name: openidConnect
        value:
          clientId: ""
          groupsClaim: ""
          groupsPrefix: ""
          issuerUrl: ""
          usernameClaim: sub
          usernamePrefix: '-'
      - name: auditLog
        value:
          enabled: false
          maxAge: "30"
          maxBackup: "10"
          maxSize: "100"
      - name: bootVolume
        value:
          size: 0
          type: standard
      - name: clusterIdentityRef
        value:
          kind: Secret
          name: kube-70vhj-cloud-config
      - name: cloudCaCert
        value: ""
      - name: cloudControllerManagerConfig
        value: W0dsb2JhbF0KYXV0aC11cmw9aHR0cHM6Ly9tYWdjbG91ZC5mcHQubmV0OjUwMDAKcmVnaW9uPVJlZ2lvbk9uZQphcHBsaWNhdGlvbi1jcmVkZW50aWFsLWlkPWViMzM3MWQxNDdmNjQxZThiNDkxOGNiYjJmZGUwYTA0CmFwcGxpY2F0aW9uLWNyZWRlbnRpYWwtc2VjcmV0PVg5aVFDeVkzQmowOVBRcmdndWd0Ri1tNWxMZWVIV3djaWpjS2hJNGlJeEgtaHo4eWI5aG1SV1VrRlZOUno3TFBjVFRScGJXUGp5RmY4anNMaVQ3aFJRCnRscy1pbnNlY3VyZT1mYWxzZQoK
      - name: containerdConfig
        value: IyBVc2UgY29uZmlnIHZlcnNpb24gMiB0byBlbmFibGUgbmV3IGNvbmZpZ3VyYXRpb24gZmllbGRzLgojIENvbmZpZyBmaWxlIGlzIHBhcnNlZCBhcyB2ZXJzaW9uIDEgYnkgZGVmYXVsdC4KdmVyc2lvbiA9IDIKCmltcG9ydHMgPSBbIi9ldGMvY29udGFpbmVyZC9jb25mLmQvKi50b21sIl0KCltwbHVnaW5zXQpbcGx1Z2lucy4iaW8uY29udGFpbmVyZC5ncnBjLnYxLmNyaSJdCiAgICBzYW5kYm94X2ltYWdlID0gInN5cy1yZWdpc3RyeS5mcHQubmV0L21hZ251bS1rOHMvcGF1c2U6My45IgpbcGx1Z2lucy4iaW8uY29udGFpbmVyZC5ncnBjLnYxLmNyaSIuY29udGFpbmVyZC5ydW50aW1lcy5ydW5jXQogICAgcnVudGltZV90eXBlID0gImlvLmNvbnRhaW5lcmQucnVuYy52MiIKW3BsdWdpbnMuImlvLmNvbnRhaW5lcmQuZ3JwYy52MS5jcmkiLmNvbnRhaW5lcmQucnVudGltZXMucnVuYy5vcHRpb25zXQogICAgU3lzdGVtZENncm91cCA9IHRydWUK
      - name: controlPlaneFlavor
        value: 8vcpus8gbram
      - name: disableAPIServerFloatingIP
        value: false
      - name: dnsNameservers
        value:
        - 8.8.8.8
      - name: externalNetworkId
        value: 24155fe1-4a38-4755-9b50-8a8cd539ee7a
      - name: fixedNetworkName
        value: privatenetwork
      - name: fixedSubnetId
        value: b6f39f63-8ffe-46db-a866-e0a1cbc9329d
      - name: flavor
        value: 8vcpus16gbram
      - name: imageRepository
        value: sys-registry.fpt.net/magnum-k8s
      - name: imageUUID
        value: 68f48182-10c6-45a2-9ca1-68c5bd366f4e
      - name: kubeletTLSCipherSuites
        value: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
      - name: nodeCidr
        value: 10.0.0.0/24
      - name: sshKeyName
        value: nhk
      - name: operatingSystem
        value: ubuntu
      version: v1.27.8
      workers:
        machineDeployments:
        - class: default-worker
          failureDomain: dc09-epz
          machineHealthCheck:
            enable: false
          metadata:
            labels:
              node-role.kubernetes.io/worker: ""
              node.cluster.x-k8s.io/nodegroup: default-worker
          name: default-worker
          nodeVolumeDetachTimeout: 5m0s
          replicas: 2
          variables:
            overrides:
            - name: bootVolume
              value:
                size: 0
                type: standard
            - name: flavor
              value: 8vcpus16gbram
            - name: imageRepository
              value: sys-registry.fpt.net/magnum-k8s
            - name: imageUUID
              value: 68f48182-10c6-45a2-9ca1-68c5bd366f4e
  status:
    conditions:
    - lastTransitionTime: "2024-01-17T00:26:18Z"
      message: Scaling up control plane to 3 replicas (actual 0)
      reason: ScalingUp
      severity: Warning
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-01-17T00:26:17Z"
      message: Waiting for control plane provider to indicate the control plane has
        been initialized
      reason: WaitingForControlPlaneProviderInitialized
      severity: Info
      status: "False"
      type: ControlPlaneInitialized
    - lastTransitionTime: "2024-01-17T00:26:18Z"
      message: Scaling up control plane to 3 replicas (actual 0)
      reason: ScalingUp
      severity: Warning
      status: "False"
      type: ControlPlaneReady
    - lastTransitionTime: "2024-01-17T00:26:17Z"
      reason: WaitingForInfrastructure
      severity: Info
      status: "False"
      type: InfrastructureReady
    - lastTransitionTime: "2024-01-17T00:26:18Z"
      status: "True"
      type: TopologyReconciled
    observedGeneration: 2
    phase: Provisioning
kind: List
metadata:
  resourceVersion: "" 

kubectl -n magnum-system get openstackcluster -l cluster.x-k8s.io/cluster-name=$(kubectl -n magnum-system get cluster -l cluster-uuid=307c3442-db89-438b-a714-2147750ca87c -oname | cut -d'/' -f2) -oyaml


apiVersion: v1
items:
- apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
  kind: OpenStackCluster
  metadata:
    annotations:
      cluster.x-k8s.io/cloned-from-groupkind: OpenStackClusterTemplate.infrastructure.cluster.x-k8s.io
      cluster.x-k8s.io/cloned-from-name: magnum-v0.13.3
    creationTimestamp: "2024-01-17T00:26:17Z"
    finalizers:
    - openstackcluster.infrastructure.cluster.x-k8s.io
    generation: 1
    labels:
      cluster.x-k8s.io/cluster-name: kube-70vhj
      topology.cluster.x-k8s.io/owned: ""
    name: kube-70vhj-9lhb7
    namespace: magnum-system
    ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Cluster
      name: kube-70vhj
      uid: 0104ebe2-b75e-4f01-a3b1-34dac111335a
    resourceVersion: "11035410"
    uid: 4d104576-271b-4f6c-83e4-6e66fd815690
  spec:
    allowAllInClusterTraffic: true
    apiServerLoadBalancer:
      enabled: true
    cloudName: default
    controlPlaneEndpoint:
      host: ""
      port: 0
    disableAPIServerFloatingIP: false
    dnsNameservers:
    - 8.8.8.8
    externalNetworkId: 24155fe1-4a38-4755-9b50-8a8cd539ee7a
    identityRef:
      kind: Secret
      name: kube-70vhj-cloud-config
    managedSecurityGroups: true
    network:
      name: privatenetwork
    subnet:
      id: b6f39f63-8ffe-46db-a866-e0a1cbc9329d
kind: List
metadata:
  resourceVersion: ""

mdbooth commented 8 months ago

Thanks! Paring this down to just the immediately relevant bits I think we have:

  apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
  kind: OpenStackCluster
  metadata:
    name: kube-70vhj-9lhb7
  spec:
    apiServerLoadBalancer:
      enabled: true
    disableAPIServerFloatingIP: false
    externalNetworkId: 24155fe1-4a38-4755-9b50-8a8cd539ee7a
    network:
      name: privatenetwork
    subnet:
      id: b6f39f63-8ffe-46db-a866-e0a1cbc9329d

If I've understood correctly there is no router with an interface on both external network 24155fe1-4a38-4755-9b50-8a8cd539ee7a and subnet b6f39f63-8ffe-46db-a866-e0a1cbc9329d? I'm sure you already know this, but for completeness I'll point out that this is a misconfiguration when combined with disableAPIServerFloatingIP: false. The immediate workaround would be to set disableAPIServerFloatingIP: true, assuming you don't need a floating IP because your clients can already route to subnet b6f39f63-8ffe-46db-a866-e0a1cbc9329d.
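To make the misconfiguration concrete, here is a minimal Go sketch of the condition described above: a floating IP from the external network can only be associated with a port on the subnet if some router has its gateway on that external network and an internal interface on that subnet. The `router` type and `canAssociateFloatingIP` function are hypothetical illustrations, not part of cluster-api-provider-openstack or the OpenStack API.

```go
package main

import "fmt"

// router is a hypothetical, simplified view of an OpenStack router.
// It is an assumption for illustration, not a real API type.
type router struct {
	externalNetworkID  string   // network of the external gateway, "" if none
	interfaceSubnetIDs []string // subnets attached as internal interfaces
}

// canAssociateFloatingIP reports whether any router has its gateway on the
// external network and an internal interface on the given subnet.
func canAssociateFloatingIP(routers []router, externalNetworkID, subnetID string) bool {
	if externalNetworkID == "" {
		return false
	}
	for _, r := range routers {
		if r.externalNetworkID != externalNetworkID {
			continue
		}
		for _, s := range r.interfaceSubnetIDs {
			if s == subnetID {
				return true
			}
		}
	}
	return false
}

func main() {
	const extNet = "24155fe1-4a38-4755-9b50-8a8cd539ee7a"
	const subnet = "b6f39f63-8ffe-46db-a866-e0a1cbc9329d"

	// A router with a gateway but no internal interfaces, as in this report.
	noInterfaces := []router{{externalNetworkID: extNet}}
	fmt.Println(canAssociateFloatingIP(noInterfaces, extNet, subnet))

	// The same router once the subnet is attached.
	attached := []router{{externalNetworkID: extNet, interfaceSubnetIDs: []string{subnet}}}
	fmt.Println(canAssociateFloatingIP(attached, extNet, subnet))
}
```

When a check like this would return false, either attach the subnet to the router or set disableAPIServerFloatingIP: true as described above.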

That doesn't mean it's not a nasty bug, though. We should definitely fix this. I suggest that the simplest fix would be to add the generated floating IP to ControlPlaneEndpoint.Host in between the calls to GetOrCreateFloatingIP and AssociateFloatingIP here: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/010408dfca1cfe775234f54359bb2ffe524bc5c2/pkg/cloud/services/loadbalancer/loadbalancer.go#L98-L105

This would mean that on the next reconciliation it would attempt (and fail again, with exponential backoff) to associate the same floating IP rather than creating a new one.

nguyenhuukhoi commented 8 months ago

Hello.

Thank you very much for your time.

I understand now, thanks to your explanation.
