[openstack-cloud-controller-manager] Shared tenancy OpenStack project service type LoadBalancer collision

scrothers commented 1 year ago

/kind bug

What happened: Two Kubernetes clusters in the same project were created. The following steps were taken along with their timeline.

Two different Kubernetes clusters were created using RKE2 in the same OpenStack project.
Both Kubernetes clusters came online and were fully functional.
Kubernetes cluster A had the default:kubernetes service changed from ClusterIP to LoadBalancer.
A load balancer was created and the Kubernetes service was made public successfully.
Kubernetes cluster B had the default:kubernetes service changed from ClusterIP to LoadBalancer.
OCCM identified the first load balancer in Octavia as it's own because the name was kube_service_kubernetes_default_kubernetes and rebuilt it to point to Kubernetes cluster B.
Kubernetes cluster A is now desynchronized and has a LoadBalancer entry which no longer exists.

What you expected to happen: Ideally, I would like to see the load balancer be created with a random suffix in OpenStack (IE: kube_service_kubernetes_default_kubernetes_vxwmx). We use suffixes in other places in Kubernetes, makes sense to use them here too. Further, if a load balancer collision were to exist, instead of deleting/recreating a load balancer of the same name, perhaps an error event should be fired instead if the suffix option is unavailable?

Also, if the occm is going to delete an existing load balancer, there should be a more obvious log entry for this. Currently it just transitions from ACTIVE to PENDING_CREATE in the logs. It's very ambiguous and should have more detailed output.

How to reproduce it:

Create two Kubernetes clusters in OpenStack in the same project, change the default:kubernetes service on each one from ClusterIP to LoadBalancer with a 5 minute gap between the first cluster being fully provisioned.
Check the Octavia API, only one load balancer will exist pointed to the second cluster.

Anything else we need to know?:

Kubernetes cluster A and Kubernetes cluster B have two different Octavia UUIDs, so the LoadBalancer was deleted/created, not simply modified.
Container image: docker.io/k8scloudprovider/openstack-cloud-controller-manager:latest

Environment:

openstack-cloud-controller-manager version: latest
OpenStack version: Zed
Others:

jichenjc commented 1 year ago

Ideally, I would like to see the load balancer be created with a random suffix in OpenStack (IE: kube_service_kubernetes_default_kubernetes_vxwmx). We use suffixes in other places in Kubernetes, makes sense to use them here too. Further,

make sense and I think we should tolerate multiple cluster creation, create a suffix also make sense

if a load balancer collision were to exist, instead of deleting/recreating a load balancer of the same name, perhaps an error event should be fired instead if the suffix option is unavailable?

you mean, we should set a option to enable suffix or not ? this might be complicated ..

@dulek @zetaab FYI

dulek commented 1 year ago

@scrothers: I think our assumption was to use clusterName instead of a random prefix/suffix to distinguish LBs from different clusters. See^1.

This would mean both of your clusters are named "kubernetes" and that's the root of the issue.

The problem is I just checked some OpenShift test runs and for some reason clusterName is "kubernetes" there too. So I'm not sure how to set that clusterName and if it's even possible.

zetaab commented 1 year ago

the problem is that quite many cluster still uses default clusterName which is kubernetes. It is also difficult to migrate away from it.

dulek commented 1 year ago

the problem is that quite many cluster still uses default clusterName which is kubernetes. It is also difficult to migrate away from it.

Okay. And also problem will happen only if you have two services named the same in your clusters, so some folks might be unaware of that. I would think that putting some cluster identifier into the tags could help too. And that is an additive change, so fairly easy to implement for the clusters already running too.

@zetaab: Do you know where do I set that clusterName thing?

zetaab commented 1 year ago

Afaik its startup flag for the occm binary

jichenjc commented 1 year ago

https://github.com/kubernetes/cloud-provider-openstack/blob/master/manifests/controller-manager/openstack-cloud-controller-manager-ds.yaml#L45

https://github.com/kubernetes/cloud-provider-openstack/blob/master/manifests/controller-manager/openstack-cloud-controller-manager-ds.yaml#L66

I think @zetaab is right, seems we should ask for input instead of default one ? or create a suffix and overtime we can handle this kind of issue ?

zetaab commented 1 year ago

I think we should add warning at least if using default value kubernetes

armagankaratosun commented 1 year ago

Hello all,

I also had the same issue, using RKE2 and Openstack CCM within the same project that I have in Openstack. I think the issue here is with the rancher-provided charts, especially rke2-ingress-nginx because if you don't overwrite the name, it will always use the same values for the resources that it creates, ending up with the same service names in all clusters.

rke2-ingress-nginx-controller
rke2-ingress-nginx-admission
rke2-ingress-defaultbackend
.
.
.

And as far as I understood from the behavior of CCM, it uses the notation kube_service_<clusterName>_<serviceName> which in this case will end up with a load balancer (assuming that you are exposing your ingress via an Octavia load balancer);

kube_service_kubernetes_kube-system_rke2-ingress-nginx-controller

and if you have two clusters in the same project, and you want to do the same thing on both of them, then the load balancers will overwrite each other, because they both have the same service name, triggering the reported bug.

One workaround might be overriding the name of the chart, hence changing the service name (serviceName) instead of the cluster (clusterName), because it is simpler, and can be done at any given time.

rke2-ingress-nginx:
    nameOverride: my-different-cluster-ope
    controller:
#     ## Set dnsPolicy to ClusterFirstWithHostNet if you want to set hostNetwork: true and hostPort.enabled: true
      dnsPolicy: ClusterFirstWithHostNet 
      hostNetwork: false
#     ## Use host ports 80 and 443
      hostPort:
        enabled: false
#       ports:
#         http: 80
#         https: 443

which will create (a service in Kubernetes and) a load balancer named kube_service_kubernetes_kube-system_rke2-ingress-nginx-my-different-cluster-ope-controller in Openstack, which is unique.

You might argue, "But Armagan, what about the other services that I have with type: Loadbalancer?". You are right, but you can always set your service names unique, to avoid overwriting your load balancers in Openstack, for example;

apiVersion: v1
kind: Service
metadata:
  name: unique-service-name
spec:
  selector:
    active 'true'
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
  sessionAffinity: None
  type: LoadBalancer

will create a load balancer named kube_service_kubernetes_unique-service-name in Openstack. I know this is not a fix, but a workaround, but it lets things rolling at least.

TL, DR; Just make sure that your service names are unique and you will be fine. Hope it helps!

scrothers commented 1 year ago

TL, DR; Just make sure that your service names are unique and you will be fine. Hope it helps!

@armagankaratosun, absolutely the worst piece of advice. In a world where upstream deploys via helm, and where not all helm charts have that ability, your solution is entirely unfeasible.

Further, if you install something like RKE2 as you imply, greater than 2 times in a single OpenStack project, by default it'll be broken until you remediate by hand afterwards.

I don't think you understand the point of Kubernetes if you believe that manual remediation is an appropriate suggestion to even add to the discussion.

armagankaratosun commented 1 year ago

I don't think you understand the point of Kubernetes if you believe that manual remediation is an appropriate suggestion to even add to the discussion

@scrothers Let's take a step back before judging my understanding of Kubernetes shall we?

Fist of all, where did you read that I suggest manual remediation? As far as I can see, you are jumping into conclusions.

@armagankaratosun, absolutely the worst piece of advice. In a world where upstream deploys via helm, and where not all helm charts have that ability, your solution is entirely unfeasible.

You say that in a world we deploy everything via helm etc, how come I suggest something that not all of the helm charts have. But the point is, both nameOverride and fullnameOverride is something you have almost in all helm charts ( if not all of them, it is in the template after all). So what I suggest is applicable for almost all charts that you can deploy, therefore, it is feasible.

Further, if you install something like RKE2 as you imply, greater than 2 times in a single OpenStack project, by default it'll be broken until you remediate by hand afterwards. .

Again, you are jumping into conclusions. Yes, I do have in fact more than 2 clusters in a single Openstack project. I do not make any manual changes and the clusters are running just fine. You know how? Via changing the chart names with nameOverride like I suggest initially.

Ok this is more about Rancher than OCCM now but you know the rke2 provided add-ons are helm charts and you can overwrite the chart values for those charts via providing yaml manifests, right? And you can customize this before you deploy your cluster, therefore, you will not end up with broken clusters that you need to remediate by hand.

So again, what I am suggesting here is very straightforward. Just ensure that you will have unique service names so that OCCM can create the load balancers without any issue. And I do this via customizing the nameOverride value in the charts (especially the rke2 charts), giving them unique names, adding suffixes like you initially suggested, but not as part of OCCM. Of course it might be better if OCCM can handle this without any intervention, but hey, this is life.

I also said that this is not a solution but a workaround. I hope I explained myself more clearly this time.

dulek commented 1 year ago

This is still a very valid issue though, we cannot expect users to differentiate service names, user might not even know that there are multiple Kubernetes instances running in the same OpenStack project.

The fact that fixing it is non-trivial is a different story.

jichenjc commented 1 year ago

do we know any other Cloud provider such as AWS handle such situation?

even though there are migration issue I Think we can focus on new clusters

how about we add clustername and suffix both so even someone didn't pay attention to the cluster name but we still handle this in a reasonable way?

zetaab commented 1 year ago

AWS uses tags in loadbalancers, so in case of openstack we could use metadata/tags to separate these with clustername.

jichenjc commented 1 year ago

so in case of openstack we could use metadata/tags to separate these with clustername.

but this still need user input .. as we need distinguish multiple k8s in one openstack.. so maybe go with suffix as this makes cluster name trasparent?

armagankaratosun commented 1 year ago

but this still need user input .. as we need distinguish multiple k8s in one openstack.. so maybe go with suffix as this makes cluster name trasparent?

I agree, I think adding a suffix might be better because it will hide this process from the user and will require no user input. With my simple understanding of the code, I think the change needs to be done here; https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/loadbalancer.go#L61

if that is the way forward, I can give it a try and submit a PR.

zetaab commented 1 year ago

https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/loadbalancer.go#L576 clusterName is already part of the loadbalancer name nowadays. However, the biggest problem is when people do use default clusterName which is kubernetes (like in this issue first post). https://github.com/kubernetes/cloud-provider-openstack/blob/master/manifests/controller-manager/openstack-cloud-controller-manager-ds.yaml#L66-L67

In my opinion this is not a bug. Instead this is just weakness that we do not make this "feature" visible. We should even print warning/error to logs that please do not use default value and document it that people can see it. Could we ban that default value? What will happen to old loadbalancers? Will managing those still work? Anyways, minimum step here is to add warning / error level logging when starting occm with kubernetes clusterName

tl;dr RKE2 should populate OCCM --cluster-name parameter correctly. And this project (openstack cloudprovider) should make this issue more visible and perhaps worth of document / add error log if someone really uses default values.

jichenjc commented 1 year ago

ok, with concern above, I agree we should log something first to give some warning but again, still need think about if CPO can help mitigate the problem as (I need check more code about following: whether multiple k8s on one openstack inside multiple projects is reasonable use case, they don't know each other at all ...), so even if we add warning/provide param to help,still the nosiy neighbor impact other running k8s seems a critical thing to handle ..

armagankaratosun commented 1 year ago

In my opinion this is not a bug. Instead this is just weakness that we do not make this "feature" visible. We should even print warning/error to logs that please do not use default value and document it that people can see it. Could we ban that default value? What will happen to old loadbalancers? Will managing those still work? Anyways, minimum step here is to add warning / error level logging when starting occm with kubernetes clusterName

Sorry for jumping in and out of the discussion but from the user's perspective banning the default value kubernetes might be problematic, because it is the default value for most of the Kubernetes distributions out there (e.g, Openshift/OKD, RKE2 etc.). So I am sure people will have some problems if it is blocked by OCCM.

Printing debug/warning messages might be a better fit, and users might be pointed out to changing their service name to end up with something unique (which will guarantee that the LBs will not be overwritten). Like I pointed out in my previous comments.

I think adding a prefix to the name of the load balancer itself might be a valid solution, and as far as I understood from the code, the correct place might be here https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/loadbalancer.go#L445 to end up a load balancer with unique-id_kube_service_<clusterName>_<serviceName>instead of kube_service_<clusterName>_<serviceName>.

zetaab commented 1 year ago

@armagankaratosun one possible problem with unique-id is that there is maximum length for loadbalancer name. I do not have access to openstack environment now and I cannot check is it 64 chars or 255. If its 64, we cannot add unique id

one possibility is to use truncate for instance https://github.com/kubernetes/kops/blob/master/pkg/truncate/truncate.go#L42 if the name is longer than maximum it will use hash instead example: https://github.com/kubernetes/kops/blob/master/pkg/truncate/truncate_test.go#L59-L61

zetaab commented 1 year ago

whether multiple k8s on one openstack inside multiple projects is reasonable use case

afaik original issue is using same openstack project with two different clusters. As I see it is reasonable use case.

armagankaratosun commented 1 year ago

one possible problem with unique-id is that there is maximum length for loadbalancer name. I do not have access to openstack environment now and I cannot check is it 64 chars or 255. If its 64, we cannot add unique id

Hello again @zetaab , fortunately for us, I have an OpenStack environment where I can test if it is 64 or 255 chars.

$ openstack loadbalancer create --name "cpr5iicNRmt8Ny4iLLLCRi30UD3AUxde0cnqnLTMNBdWxKhjN1Q2ZbLtEOuNfNKDNaMsJh4CU7ih2szPMNCghmqZcfaqLeOWqowAfieTpiIjN0t5jfVRpspkGhOkDBRqNR9NJR37EUpDYWcCTUuyJPcdZ3oR5134itArAI2j23vIwQ7jwfHehjCZqN4zm9kzQRRhqlWTwci8RV5sxbxbD4qc3t0l7GEPl62GaT8VupxSRXGon1b0RRQJAnWQXrd" --vip-subnet-id <subnet-id>

+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field               | Value                                                                                                                                                                                                                                                           |
+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up      | True                                                                                                                                                                                                                                                            |
| availability_zone   |                                                                                                                                                                                                                                                                 |
| created_at          | 2023-07-04T20:22:09                                                                                                                                                                                                                                             |
| description         |                                                                                                                                                                                                                                                                 |
| flavor_id           | None                                                                                                                                                                                                                                                            |
| id                  |                                                                                                                                                                                                                           |
| listeners           |                                                                                                                                                                                                                                                                 |
| name                | cpr5iicNRmt8Ny4iLLLCRi30UD3AUxde0cnqnLTMNBdWxKhjN1Q2ZbLtEOuNfNKDNaMsJh4CU7ih2szPMNCghmqZcfaqLeOWqowAfieTpiIjN0t5jfVRpspkGhOkDBRqNR9NJR37EUpDYWcCTUuyJPcdZ3oR5134itArAI2j23vIwQ7jwfHehjCZqN4zm9kzQRRhqlWTwci8RV5sxbxbD4qc3t0l7GEPl62GaT8VupxSRXGon1b0RRQJAnWQXrd |
| operating_status    | OFFLINE                                                                                                                                                                                                                                                         |
| pools               |                                                                                                                                                                                                                                                                 |
| project_id          |                                                                                                                                                                                                                                |
| provider            | amphora                                                                                                                                                                                                                                                         |
| provisioning_status | PENDING_CREATE                                                                                                                                                                                                                                                  |
| updated_at          | None                                                                                                                                                                                                                                                            |
| vip_address         |                                                                                                                                                                                                                                                      |
| vip_network_id      |                                                                                                                                                                                                                  |
| vip_port_id         |                                                                                                                                                                                                                             |
| vip_qos_policy_id   | None                                                                                                                                                                                                                                                            |
| vip_subnet_id       |                                                                                                                                                                                                                          |
| tags                |                                                                                                                                                                                                                                                                 |
| additional_vips     |                                                                                                                                                                                                                                                                 |
+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

So looks like it is possible to create a load balancer with 255 chars. I actually tried to create a load balancer more than 255 chars and it throw an error as expected;

Invalid input for field/attribute name. Value:
'cpr5iicNRmt8Ny4iLLLCRi30UD3AUxde0cnqnLTMNBdWxKhjN1Q2ZbLtEOuNfNKDNaMsJh4CU7ih2szPMNCghmqZcfaqLeOWqowAfieTpiIjN0t5jfVRpspkGhOkDBRqNR9NJR37EUpDYWcCTUuyJPcdZ3oR5134itArAI2j23vIwQ7jwfHehjCZqN4zm9kzQRRhqlWTwci8RV5sxbxbD4qc3t0l7GEPl62GaT8VupxSRXGon1b0RRQJAnWQXrdasdasdasdasdasdasdasa'. 

Value should have a maximum character requirement of 255 (HTTP 400) (Request-ID: req-b00a4edd-b8b0-4577-86de-cb746a72e5f7)

Hope this helps, please let me know if you want me to try anything else. I think I can hack here https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/loadbalancer.go#L445 and include the unique_id, build the image locally and test it in my OpenStack env., but this might not be the most elegant solution.

mdbooth commented 1 year ago

Something we have considered doing (and may yet do) in CAPO is tagging OpenStack resources with the uid of the k8s object which created them. As these are UUIDs there should be no possibility of conflict, even across different generations of resources with the same name.

dulek commented 1 year ago

We've successfully used LB tags to distinguish clusters with OpenShift + Kuryr. The only problem is that Octavia supports tags starting from OpenStack Stein (API v2.5).

I think tags give use the best flexibility, we can easily filter using them and IMO upgrade story is simpler than in case of names (though renaming an LB seems to be possible too).

k8s-triage-robot commented 10 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

dulek commented 8 months ago

/remove-lifecycle rotten

dulek commented 7 months ago

Let me sum up the developments here. I still insist that the correct fix is not in OCCM, but in deployment tools to actually set the --cluster-name to an unique identifier. This obviously raises questions about upgradability, so to aid here I wrote #2552. We have this issue in OpenShift as well and that piece of code should help us to safely switch to pass --cluster-name=<uniq-id> to CCM in the future.

kubernetes / cloud-provider-openstack

[openstack-cloud-controller-manager] Shared tenancy OpenStack project service type LoadBalancer collision #2241