allowedCIDRs breaks when not using the reference Neutron implementation because the API is accessed over the public IP

huxcrux commented 7 months ago

/kind bug

What steps did you take and what happened: I am trying to implement allowedCIDRs for my clusters by setting allowedCIDRs in the apiServerLoadBalancer section of the OpenStackCluster spec.

Example:

spec:
  allowAllInClusterTraffic: true
  apiServerLoadBalancer:
    allowedCidrs:
    - <listofips-redacted>
    enabled: true
  cloudName: elastx
  controlPlaneEndpoint:
    host: <redacted>
    port: 6443
  disableAPIServerFloatingIP: false
  disableExternalNetwork: false

The problem is that we do not use the standard Neutron implementation, in which all nodes use the router's IP for SNAT. As a result, when bootstrapping the first node, the kubelet fails to start because its connections to the API are blocked by the load balancer.
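To illustrate why the bootstrap fails, here is a minimal Python sketch of source-address filtering against allowedCIDRs. It assumes the load balancer simply checks whether the client's post-SNAT source IP falls inside any allowed CIDR; `is_allowed` is a hypothetical helper, and all addresses are made-up values from the RFC 5737 documentation ranges.

```python
import ipaddress


def is_allowed(source_ip: str, allowed_cidrs: list[str]) -> bool:
    """Hypothetical helper: return True if source_ip is inside any allowed CIDR.

    This is a simplified model of a load balancer listener's source filtering,
    not the actual Octavia implementation.
    """
    addr = ipaddress.ip_address(source_ip)
    # A mismatched IP version (v4 address vs v6 network) is simply "not contained".
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Assumed example values (RFC 5737 documentation ranges).
allowed_cidrs = ["198.51.100.0/24"]  # assume this covers the router's IP
router_snat_ip = "198.51.100.10"     # reference Neutron: nodes egress via the router IP
pool_snat_ip = "203.0.113.77"        # non-reference setup: IP from a shared SNAT pool

print(is_allowed(router_snat_ip, allowed_cidrs))  # True
print(is_allowed(pool_snat_ip, allowed_cidrs))    # False
```

With the reference implementation, node traffic leaves through the router's IP, so a single entry covers it. With a shared SNAT pool, a node may egress from a pool address that is not in the list unless it is added by hand.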

If I manually add all the IPs in our SNAT pools, everything works. The problem is that the SNAT IPs are shared between multiple customers. Even if I use the new IPAM code that is currently under review, I would still need to add those IPs to the allowedCIDRs list manually.

What did you expect to happen: I expected all cluster nodes to use the internal LB endpoint for API traffic. It seems odd to use the external IP for in-cluster traffic when an internal endpoint exists.

Anything else you would like to add:

Another alternative is to use the IPAM IPPool. The problem with this is that we would need to watch another object and trigger an LB reconcile whenever it changes. On the other hand, this object contains a list of valid IPs that could simply be appended to allowedCIDRs. Still, I think it would make more sense to switch to using the internal endpoint for in-cluster API traffic.
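The "simply appended" step of the IPPool alternative could look roughly like the Python sketch below. It is not CAPO's reconcile code: `merge_pool_addresses` is a hypothetical helper, and it assumes the pool has already been resolved to a flat list of individual IP address strings (the real IPPool schema is not shown here). Each address becomes a single-host prefix, and addresses already covered by an existing entry are skipped.

```python
import ipaddress


def merge_pool_addresses(allowed_cidrs: list[str], pool_addresses: list[str]) -> list[str]:
    """Hypothetical helper: append pool IPs to allowed_cidrs as host prefixes.

    IPv4 addresses become /32 and IPv6 addresses become /128. Addresses already
    covered by an existing entry (including earlier pool addresses) are skipped,
    and the original order is preserved.
    """
    networks = [ipaddress.ip_network(c) for c in allowed_cidrs]
    merged = list(allowed_cidrs)
    for raw in pool_addresses:
        addr = ipaddress.ip_address(raw)
        if any(addr in net for net in networks):
            continue  # already allowed, nothing to add
        host = ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}")
        networks.append(host)
        merged.append(str(host))
    return merged


# Assumed example values (documentation ranges).
print(merge_pool_addresses(["198.51.100.0/24"], ["198.51.100.5", "203.0.113.7"]))
```

Even with such a helper, the controller would still have to watch the IPPool and re-reconcile the load balancer when it changes, which is the extra complexity mentioned above.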

Environment:

Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): latest master (commit: 5cc483bfc6eae8a8b8a67b32e9b7af0bafa473ca)
Cluster-API version: 1.6.1
OpenStack version: Ussuri
Minikube/KIND version: kind 0.20.0
Kubernetes version (use kubectl version): 1.29.1
OS (e.g. from /etc/os-release): ubuntu 22.04

mdbooth commented 7 months ago

This is a CAPI limitation (possibly based in turn on a kubeadm limitation?): only a single control plane endpoint can be configured, so it has to be the public one. I completely agree that it would be ideal to have separate internal and external endpoints, but I don't think there is currently anywhere to configure them.

It's tracked here: https://github.com/kubernetes-sigs/cluster-api/issues/5295

Reading through the comments on that issue and also https://github.com/kubernetes-sigs/cluster-api/pull/8500, it sounds like some other providers have workarounds or hacks of varying completeness for this issue, which might be worth investigating until we can implement it properly.

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

huxcrux commented 4 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 month ago

/lifecycle stale

k8s-triage-robot commented 2 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

kubernetes-sigs / cluster-api-provider-openstack

allowedCIDRs breaks when not using the reference neutron implementation due to the api being accessed over public IP #1851