kubernetes / cloud-provider-openstack

Apache License 2.0
599 stars 597 forks

[occm] Multi region support #1924

Open sergelogvinov opened 2 years ago

sergelogvinov commented 2 years ago

/kind feature

I noticed that openstack-client supports a multi-region config: https://github.com/gophercloud/utils/blob/master/openstack/clientconfig/testing/clouds.yaml#L160-L171

What do you think about adding multi-region support to a single OCCM, based on the config file, after #1900? If the config file has a regions tree, OCCM would check it at boot time and watch the nodes in those regions.
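For reference, the linked gophercloud test fixture describes per-cloud regions roughly along these lines (the field names and values here are illustrative, not an exact copy of the fixture):

```yaml
clouds:
  multi-region:
    regions:
      # a plain region name...
      - RegionOne
      # ...or a name plus per-region overrides
      - name: RegionTwo
        values:
          auth:
            auth_url: https://region-two.example.com:5000
```

OCCM could read such a tree at startup and build one authenticated client per listed region.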

jichenjc commented 2 years ago

I think it's valid.. but I'm not sure about one such use case:

basically, OCCM is responsible for creating LBs. Let's assume that in a multi-region env we might want the LB created in a desired region?

sergelogvinov commented 2 years ago

Yep, you are right. I forgot about LBs; I was speaking about node/node-lifecycle only.

For LBs, it could be done via a service annotation property...

So we can introduce it in steps. Unfortunately, I do not have a regional OpenStack with LBs, so it will be hard to test.

mikejoh commented 2 years ago

We've just implemented our own cloud provider with a multi-region-aware cloud controller manager on OpenStack. At the moment it only covers the node and node-lifecycle controllers. Interesting about the multi-region configuration you can add to clouds.yaml; I wonder if it's possible to query for resources across multiple regions in one go, or if you still need to loop through the regions one by one.

sergelogvinov commented 2 years ago

If the node is uninitialized, we need to check all regions to find the node. If the node has a providerID, we can get the region name from it.

And if I'm right, we need a Go client connection for each region.
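A multi-region OCCM could encode the region into the providerID and parse it back when the node is already initialized. A minimal stdlib-only sketch; the `openstack://<region>/<instance-id>` scheme is an assumption here (classic OCCM emits `openstack:///<instance-id>` with an empty region, in which case all regions would still need to be scanned):

```go
package main

import (
	"fmt"
	"strings"
)

// parseProviderID splits a providerID of the assumed form
// "openstack://<region>/<instance-id>" into its region and instance ID.
// An empty region means the ID predates multi-region support.
func parseProviderID(providerID string) (region, instanceID string, err error) {
	rest, ok := strings.CutPrefix(providerID, "openstack://")
	if !ok {
		return "", "", fmt.Errorf("unexpected providerID %q", providerID)
	}
	region, instanceID, ok = strings.Cut(rest, "/")
	if !ok {
		return "", "", fmt.Errorf("no instance id in providerID %q", providerID)
	}
	return region, instanceID, nil
}

func main() {
	region, id, err := parseProviderID("openstack://eu-west-1/abc-123")
	if err != nil {
		panic(err)
	}
	fmt.Println(region, id) // prints "eu-west-1 abc-123"
}
```

With an empty region (`openstack:///abc-123`) the parse still succeeds, signalling that the controller should fall back to scanning every configured region.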

mikejoh commented 2 years ago

> If the node is uninitialized, we need to check all regions to find the node. If the node has a providerID, we can get the region name from it.
>
> And if I'm right, we need a Go client connection for each region.

Correct! One compute ServiceClient per region, basically. We're not completely done with our implementation, it's still in PoC mode, but so far everything looks good. We made sure to implement the k8s.io/cloud-provider InstancesV2 interface, which is slimmer and doesn't require as many queries against the underlying cloud APIs as the Instances interface.

As a side note: I know that there's a todo on implementing the V2 interface in the OCCM, which I think we could help out with!

jichenjc commented 2 years ago

> As a side note: I know that there's a todo on implementing the V2 interface in the OCCM, which I think we could help out with!

Right, I did some work but got distracted by too many other things... so if you can help, that would be terrific:

commit 5a5030e83fd72838cddb075bef19e46ba999676e
Author: ji chen <jichenjc@cn.ibm.com>
Date:   Fri Apr 15 19:15:10 2022 +0800

    refactory code to prepare instance V2 implementation (#1823)

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sergelogvinov commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

/lifecycle stale

k8s-triage-robot commented 1 year ago

/lifecycle rotten

sergelogvinov commented 1 year ago

/remove-lifecycle rotten

k8s-triage-robot commented 1 year ago

/lifecycle stale

sergelogvinov commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 12 months ago

/lifecycle stale

sergelogvinov commented 11 months ago

/remove-lifecycle stale

k8s-triage-robot commented 5 months ago

/lifecycle stale

Hybrid512 commented 5 months ago

What about this feature? It would clearly be needed in our use case, where we have 3 standalone OpenStack clusters and want Kubernetes clusters stretched on top of them. This can be done, but things get complicated since the CCM can't handle more than one OpenStack API endpoint, which makes everything related to storage or LBs a nightmare.

Is this somewhere in the pipeline, or are there alternatives? Any help would be gladly appreciated.

dulek commented 4 months ago

> What about this feature? It would clearly be needed in our use case, where we have 3 standalone OpenStack clusters and want Kubernetes clusters stretched on top of them. This can be done, but things get complicated since the CCM can't handle more than one OpenStack API endpoint, which makes everything related to storage or LBs a nightmare.
>
> Is this somewhere in the pipeline, or are there alternatives? Any help would be gladly appreciated.

I don't think CPO was designed with that use case in mind, and it might need a lot of work to get right. Happy to help with advice and reviews if you have development resources to throw at this.

Out of interest, what's the use case for stretching K8s like that?

k8s-triage-robot commented 3 months ago

/lifecycle rotten

Hybrid512 commented 3 months ago

> What about this feature? It would clearly be needed in our use case, where we have 3 standalone OpenStack clusters and want Kubernetes clusters stretched on top of them. This can be done, but things get complicated since the CCM can't handle more than one OpenStack API endpoint, which makes everything related to storage or LBs a nightmare. Is this somewhere in the pipeline, or are there alternatives? Any help would be gladly appreciated.

> I don't think CPO was designed with that use case in mind, and it might need a lot of work to get right. Happy to help with advice and reviews if you have development resources to throw at this.
>
> Out of interest, what's the use case for stretching K8s like that?

In our case, the use case is to have multi-AZ Kubernetes clusters on top of multiple standalone OpenStack clusters: basically, one K8s cluster stretched across 3 different OpenStack tenants, each in a standalone OpenStack cluster. Our K8s strategy is to have multiple K8s clusters for better resource and security segmentation with a high level of automation.

Our OpenStack clusters have routed networks, but they are not stretched clusters themselves, and each has its own dedicated storage that is not replicated to the others. The idea is that, since OpenStack is the base supporting every K8s cluster we have, we would rather secure OpenStack operations by not stretching OpenStack itself (and instead run multiple standalone clusters), so that a failure (whether hardware, a security issue, or a simple human error) would impact only one AZ at a time. Since our K8s clusters are stretched on top of these OpenStack clusters, losing an AZ would have nearly zero impact on the application layer.

However, I see there are PRs for this, and we are already trying to help as best we can, so fingers crossed!

/remove-lifecycle rotten

sergelogvinov commented 3 months ago

Thank you for your feedback.

I did some research and built a lab based on Proxmox VMs. CCM/CSI work fine, and one PVC-template resource works well across different AZs too: https://github.com/sergelogvinov/proxmox-csi-plugin

I still believe I will repeat the same idea in this project, but only for CCM/CSI. I do not have an LB in my OpenStack setup, and many load balancers/ingress controllers have no geo (region/zonal) balancing knowledge and serve traffic in only one zone/region.

Please vote 👍 - it helps contributors understand how important this feature is.

k8s-triage-robot commented 3 weeks ago

/lifecycle stale