kubernetes / cloud-provider-openstack

[occm] Creating a fully populated LB when Octavia uses ovn-provider may leave pools in PENDING_CREATE #1751

Closed mdbooth closed 2 years ago

mdbooth commented 2 years ago

/kind bug

What happened: This is primarily a placeholder and a link to https://bugzilla.redhat.com/show_bug.cgi?id=2042976

We think that OpenStack CCM may be affected by the same issue. While we will work to fix the problem in Octavia, it may be prudent to include a workaround in occm until it's widely deployed.

We haven't verified this at all, although it looks like occm would trigger the same issue: https://github.com/kubernetes/cloud-provider-openstack/blob/ee4ecca326219153805b09ccc95911efd93f58f0/pkg/openstack/loadbalancer.go#L569-L658

We'll try to update this issue when we have more information.

lingxiankong commented 2 years ago

Thanks for reporting this issue @mdbooth !

OCCM supports the Amphora Octavia driver by default, and the support policy for the OVN driver is "do the best but not guarantee", so we'd rather the issue be fixed on the Octavia side first.

mdbooth commented 2 years ago

I think this might be slightly higher priority for us as I think OVN might be the default in OSP now? Somebody will correct me if I'm wrong.

We're certainly going to fix this in Octavia. My concern is that OpenStack updates tend to roll out considerably slower than K8S updates, so we will inevitably have users trying to deploy with OCCM before the Octavia fix is widely deployed. If possible we'd also like to work around the issue in OCCM until that happens.

Incidentally we're just starting to ramp up our efforts to transition OpenShift on OpenStack to use OCCM by default in the near future, and we're obviously going to back that up with development and maintenance. My hope is that we can upgrade that 'do the best but not guarantee' ourselves by working on it. We're certainly intending to do the work for this issue specifically ourselves.

dulek commented 2 years ago

> I think this might be slightly higher priority for us as I think OVN might be the default in OSP now? Somebody will correct me if I'm wrong.

OCCM doesn't autodetect available providers, so by default it will use Amphora regardless. You need to configure OVN manually. That being said, the OVN provider is superior in terms of resource conservation and the time it takes to bring the LB up.

> We're certainly going to fix this in Octavia. My concern is OpenStack updates tend to roll out considerably slower than K8S updates, so we will inevitably have users trying to deploy with OCCM before the Octavia fix is widely deployed. If possible we'd also like to work around the issue in OCCM until that happens.

The fix is upstream now: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/826257/, but I agree with @mdbooth's view here: it'll be a while before we see it in the field.

> Incidentally we're just starting to ramp up our efforts to transition OpenShift on OpenStack to use OCCM by default in the near future, and we're obviously going to back that up with development and maintenance. My hope is that we can upgrade that 'do the best but not guarantee' ourselves by working on it. We're certainly intending to do the work for this issue specifically ourselves.

jichenjc commented 2 years ago

> If possible we'd also like to work around the issue in OCCM until that happens.

I know a lot of OpenStack vendors/users upgrade infrequently because of private patches and code, so I think a fix in the latest OpenStack is unlikely to help them much.

> My hope is that we can upgrade that 'do the best but not guarantee' ourselves by working on it.

+1 to this if OCCM can help fix or at least mitigate the problem

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

dulek commented 2 years ago

/remove-lifecycle stale

Shugotekfr commented 2 years ago

I have the same problem on OpenStack Xena with the Octavia amphora provider.

- NOK: Kubernetes 1.23, OCCM 1.23, OpenStack Xena
- OK: Kubernetes 1.23, OCCM 1.23, OpenStack Victoria

I'm currently checking the Octavia logs. If I create the LB manually from the OpenStack GUI, it creates the LB and then sends the creation requests for the pool, pool members, and listener afterwards. If I launch the LB creation from K8s, the logs show only the request for the LB.

k8s-triage-robot commented 2 years ago

/lifecycle stale

k8s-triage-robot commented 2 years ago

/lifecycle rotten

k8s-triage-robot commented 2 years ago

/close not-planned

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/cloud-provider-openstack/issues/1751#issuecomment-1272111066):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue with `/reopen`
>- Mark this issue as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close not-planned
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.