kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack
https://cluster-api-openstack.sigs.k8s.io/
Apache License 2.0

Add the ability to specify a configurable VIP network for loadbalancer #1809

Closed: oblazek closed this issue 5 months ago

oblazek commented 8 months ago

/kind feature

Describe the solution you'd like

Based on the API documentation of OpenStack Octavia, it should be possible to specify a VIP network for the load balancer:

Provide a vip_port_id.
Provide a vip_network_id.
Provide a vip_subnet_id.

This is possible in the OpenStack API and is used in CAPO here, but the value is strictly filled from openStackCluster.Status.Network.Subnets[0].ID.
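
For context, a minimal, hedged sketch (not CAPO's actual code; all names and IDs are placeholders, and authentication from the usual OS_* environment variables is assumed) of creating an Octavia load balancer via gophercloud. The point is that VIP placement is driven entirely by which of VipPortID, VipNetworkID, or VipSubnetID is supplied:

package main

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

func main() {
	// Authenticate from the usual OS_* environment variables.
	authOpts, err := openstack.AuthOptionsFromEnv()
	if err != nil {
		panic(err)
	}
	provider, err := openstack.AuthenticatedClient(authOpts)
	if err != nil {
		panic(err)
	}
	lbClient, err := openstack.NewLoadBalancerV2(provider, gophercloud.EndpointOpts{})
	if err != nil {
		panic(err)
	}

	// The VIP is allocated from whichever of VipPortID/VipNetworkID/VipSubnetID
	// is supplied. CAPO currently fills VipSubnetID from
	// openStackCluster.Status.Network.Subnets[0].ID, i.e. the machines' subnet.
	lb, err := loadbalancers.Create(lbClient, loadbalancers.CreateOpts{
		Name:        "k8s-clusterapi-cluster-example-kubeapi", // placeholder name
		VipSubnetID: "<dedicated-vip-subnet-id>",              // placeholder ID
	}).Extract()
	if err != nil {
		panic(err)
	}
	fmt.Println("created load balancer", lb.ID, "with VIP", lb.VipAddress)
}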

Unfortunately, this assumes that the load balancer is created on the same network/subnet as the VMs themselves, which is not always correct. In our case we have a separate network for the load balancer, from which the VIP should be allocated, and a separate network for the control plane and worker nodes (VMs). This is a setup with Octavia and our own specific cloud provider, so there are no amphoras and no floating IPs (the OpenStack deployment uses Calico as the network driver).

So OpenStackCluster and/or OpenStackMachineTemplate should allow specifying which network/subnet ID is used for the VIP/load balancer, separately from OpenStackMachineSpec.network or OpenStackMachineSpec.subnet. For example, something along these lines (not necessarily exactly this):

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
kind: OpenStackCluster
metadata:
  name: test2
  namespace: test2
spec:
  cloudName: ulab1-test-infra8
  controlPlaneAvailabilityZones:
  - az1
  - az2
  - az3
  identityRef:
    kind: Secret
    name: test2-cloud-config
  apiServerLoadBalancer:
    enabled: true
+    network: <network_id_or_tag_or_name>
  disableAPIServerFloatingIP: true
  managedSecurityGroups: true
  network:
    name: ulab1
  subnet:
    name: ulab1-ipv4-1

Anything else you would like to add: A partly similar issue is https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/1558, and there is a related thread at https://kubernetes.slack.com/archives/CFKJB65G9/p1702562005631379

cc @mdbooth

oblazek commented 8 months ago

Similarly to this feature request, this is already possible in OCCM (https://github.com/kubernetes/cloud-provider-openstack/blob/fdba36babb2c4b46e759c99cca50ac7eba2ee06f/pkg/openstack/openstack.go#L103) via configuration:

[LoadBalancer]
lb-provider=labrador
subnet-id=5b4e2c9b-1ab9-4510-87a4-6d2301e16e77
internal-lb=true
provider-requires-serial-api-calls=true

mdbooth commented 8 months ago

To be clear, you want the control plane machines to have 2 attached interfaces:

You want the API servers to listen on the LB VIP network. You want kubelet to listen on the control plane network.

Or do you mean that the control plane machines should only be connected to the control plane network and the LB on the VIP network should add the port on the control plane network, which is a different network, as a member? Does that work?

oblazek commented 8 months ago

Basically the VMs only need 1 interface as they are routable across the whole DC. The VIP network is only a subnet within the routable network.

To be clear, you want the control plane machines to have 2 attached interfaces:

No, they only need to be attached on the Control plane network

Or do you mean that the control plane machines should only be connected to the control plane network

yes

and the LB on the VIP network should add the port on the control plane network, which is a different network as a member

yes, because both networks are routable within the DC (we have a flat L3 network)

oblazek commented 8 months ago

the VIP network (in our case) is not meant to be used as VM interface, but is announced by our L4 balancer (standalone Cilium L4LB) and is BGP routable just like control plane network

oblazek commented 8 months ago

@mdbooth any chance you would accept support for this? If we agree on an implementation I can start right away.

mdbooth commented 8 months ago

I think the API impact will be adding a vipNetwork field taking a network filter under apiServerLoadBalancer. The VIP would be allocated from this network, but members would be added as they are now. That doesn't sound too invasive, or significant additional complexity for the loadbalancer api.

@dulek What are your thoughts on this?
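
For illustration only (the field name and placement are assumptions, not the merged API), such an addition might look roughly like this in the CAPO types, reusing the existing NetworkFilter type:

// Rough sketch of the proposed change; the Network field name is illustrative.
type APIServerLoadBalancer struct {
	// Enabled defines whether a load balancer should be created.
	Enabled bool `json:"enabled,omitempty"`

	// Network is the network from which the load balancer VIP is allocated.
	// If unset, the VIP is taken from the cluster (machines') subnet as today.
	Network *NetworkFilter `json:"network,omitempty"`

	// ... other existing APIServerLoadBalancer fields unchanged.
}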

dulek commented 8 months ago

I think the API impact will be adding a vipNetwork field taking a network filter under apiServerLoadBalancer. The VIP would be allocated from this network, but members would be added as they are now. That doesn't sound too invasive, or significant additional complexity for the loadbalancer api.

@dulek What are your thoughts on this?

This looks sane with two modifications I'd suggest:

  1. Ability to choose a subnet too, in case network has multiple subnets.
  2. We should most likely make both fields an array in order to support the dual-stack LB use case using the additional_vips functionality. @MaysaMacedo, you might have more thoughts here.

mdbooth commented 8 months ago

I think the API impact will be adding a vipNetwork field taking a network filter under apiServerLoadBalancer. The VIP would be allocated from this network, but members would be added as they are now. That doesn't sound too invasive, or significant additional complexity for the loadbalancer api. @dulek What are your thoughts on this?

This looks sane with two modifications I'd suggest:

  1. Ability to choose a subnet too, in case network has multiple subnets.
  2. We should most likely make both fields an array in order to support the dual-stack LB use case using the additional_vips functionality. @MaysaMacedo, you might have more thoughts here.

Good idea! I think only the subnet field needs to be a list though, as per the docs:

Additional VIP subnets must all belong to the same network as the primary VIP.
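
Putting those points together, the converged shape might look roughly like this (again, names are illustrative, not the merged API): a single network filter plus a list of subnet filters, where entries after the first would map to Octavia's additional_vips.

// Illustrative sketch only: the first subnet provides the primary VIP, any
// further subnets would become additional VIPs (dual-stack) on the same network.
type APIServerLoadBalancer struct {
	Enabled bool `json:"enabled,omitempty"`

	// Network the VIP (and any additional VIPs) are allocated from.
	Network *NetworkFilter `json:"network,omitempty"`

	// Subnets within Network: the first entry is the primary VIP subnet,
	// subsequent entries are additional VIP subnets.
	Subnets []SubnetFilter `json:"subnets,omitempty"`

	// ... other existing APIServerLoadBalancer fields unchanged.
}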

dulek commented 8 months ago

I think the API impact will be adding a vipNetwork field taking a network filter under apiServerLoadBalancer. The VIP would be allocated from this network, but members would be added as they are now. That doesn't sound too invasive, or significant additional complexity for the loadbalancer api. @dulek What are your thoughts on this?

This looks sane with two modifications I'd suggest:

  1. Ability to choose a subnet too, in case network has multiple subnets.
  2. We should most likely make both fields an array in order to support the dual-stack LB use case using the additional_vips functionality. @MaysaMacedo, you might have more thoughts here.

Good idea! I think only the subnet field needs to be a list though, as per the docs:

Additional VIP subnets must all belong to the same network as the primary VIP.

Oh good, it's purely for dual stack, so this simplifies it a lot.

oblazek commented 8 months ago

Great, thanks. I'll start with the implementation then.

oblazek commented 8 months ago

btw regarding:

Additional VIP subnets must all belong to the same network as the primary VIP.

This is not yet possible, as additional_vips is not yet part of the gophercloud load balancer interface (see https://github.com/gophercloud/gophercloud/pull/2700)

is it ok to do 1x networkID and 1x subnetID for now?

dulek commented 8 months ago

btw regarding:

Additional VIP subnets must all belong to the same network as the primary VIP.

This is not yet possible, as additional_vips is not yet part of the gophercloud load balancer interface (see gophercloud/gophercloud#2700)

is it ok to do 1x networkID and 1x subnetID for now?

We can work to add the required stuff to gophercloud; a single field should be a fairly simple addition, and we should be able to get @EmilienM's help with merging it.

Nevertheless, if you'd like to pursue the simpler case first, I'd still insist that the API have the subnet field as an array, even if it's documented that secondary values are ignored.

oblazek commented 8 months ago

I'd still insist that the API have the subnet field as an array, even if it's documented that secondary values are ignored

yeah, totally agree
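
A minimal sketch of how the implementation could honour the array-shaped API while only the first subnet is actually used (hypothetical helper, not from the codebase):

// resolveVIPSubnetID is a hypothetical helper: it returns the subnet to
// allocate the VIP from, ignoring secondary entries until additional_vips
// support lands in gophercloud (gophercloud/gophercloud#2700).
func resolveVIPSubnetID(subnetIDs []string, defaultSubnetID string) string {
	if len(subnetIDs) == 0 {
		// No explicit VIP subnet configured: fall back to the cluster subnet,
		// i.e. openStackCluster.Status.Network.Subnets[0].ID, as today.
		return defaultSubnetID
	}
	// Only the first entry is honoured for now; later entries are reserved
	// for dual-stack additional VIPs.
	return subnetIDs[0]
}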