kubernetes / cloud-provider-openstack

Apache License 2.0

[occm] Octavia OVN Provider create listener error #1332

Closed bshephar closed 3 years ago

bshephar commented 3 years ago

/kind bug

What happened: Setting lb-provider=ovn works and allows the user to provision an Octavia OVN LoadBalancer, but creation of the Listener fails. The configuration looks like this:

[LoadBalancer]
lb-provider=ovn
use-octavia=true
floating-network-id=180941c5-9e82-41c7-b64d-6a57302ec211

However, it fails to create the Listener with the following error:

I1205 11:33:24.169540       1 event.go:291] "Event occurred" object="awx/awx-web-lb" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E1205 11:33:24.517822       1 controller.go:275] error processing service awx/awx-web-lb (will retry): failed to ensure load balancer: failed to create listener for loadbalancer 6c670e04-7c66-4f1a-9a01-9ad2ba7e2a92: Expected HTTP response code [] when accessing [POST https://openstack.bne-home.net:13876/v2.0/lbaas/listeners], but got 501 instead
{"faultcode": "Server", "faultstring": "Provider 'ovn' does not support a requested option: OVN provider does not support allowed_cidrs option", "debuginfo": null}
I1205 11:33:24.518078       1 event.go:291] "Event occurred" object="awx/awx-web-lb" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to create listener for loadbalancer 6c670e04-7c66-4f1a-9a01-9ad2ba7e2a92: Expected HTTP response code [] when accessing [POST https://openstack.bne-home.net:13876/v2.0/lbaas/listeners], but got 501 instead\n{\"faultcode\": \"Server\", \"faultstring\": \"Provider 'ovn' does not support a requested option: OVN provider does not support allowed_cidrs option\", \"debuginfo\": null}"

What you expected to happen: When an operator sets OVN as the provider, OCCM should adjust the API request sent to Octavia so that the request is valid for that provider.

How to reproduce it:

  1. Ensure you have the Octavia OVN provider installed on OpenStack;
  2. Add the above-mentioned options to the [LoadBalancer] section in cloud.conf;
  3. Create a Service with type: LoadBalancer;
  4. Observe that the LoadBalancer itself is created, but listener creation fails, because OCCM sends the allowed_cidrs option, which is not supported by the OVN provider.
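For step 3, a minimal Service manifest that triggers the load balancer creation might look like the following (the name and namespace match the awx/awx-web-lb object in the logs above; the selector and ports are illustrative, not taken from the issue):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: awx-web-lb
  namespace: awx
spec:
  type: LoadBalancer       # this is what makes OCCM provision an Octavia LB
  selector:
    app: awx-web           # illustrative selector
  ports:
    - port: 80
      targetPort: 8080     # illustrative port mapping
```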

Anything else we need to know?: It looks like there are efforts to support this in the ovn_octavia_provider: https://opendev.org/openstack/ovn-octavia-provider/src/branch/master/ovn_octavia_provider/driver.py#L61-L69

But this currently breaks for Kubernetes users attempting to use it. So I believe we either need to block lb-provider=ovn, or we need to create the listener without allowed_cidrs.
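The second option above could be sketched roughly like this. This is a minimal, self-contained illustration, not the actual OCCM code: ListenerCreateOpts is a simplified stand-in for the gophercloud listeners create options, and the field names are assumptions.

```go
package main

import "fmt"

// ListenerCreateOpts is a simplified stand-in for the real gophercloud
// listener create options struct; field names here are illustrative.
type ListenerCreateOpts struct {
	Name         string
	AllowedCIDRs []string
}

// buildListenerOpts omits AllowedCIDRs when the OVN provider is in use,
// since ovn-octavia-provider rejects the allowed_cidrs option with a 501.
func buildListenerOpts(provider, name string, cidrs []string) ListenerCreateOpts {
	opts := ListenerCreateOpts{Name: name}
	if provider != "ovn" {
		opts.AllowedCIDRs = cidrs
	}
	return opts
}

func main() {
	opts := buildListenerOpts("ovn", "awx-web-lb", []string{"10.0.0.0/24"})
	fmt.Printf("allowed_cidrs sent for ovn: %d\n", len(opts.AllowedCIDRs))
}
```

The same request for the amphora provider would keep the CIDR list; only the ovn path drops it.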

We can consider this a feature request if you prefer, but since it breaks for users, I raised it as a bug. Let me know if you need any additional logs.

Environment:

Server Version: 4.6.0-0.okd-2020-11-27-200126
Kubernetes Version: v1.19.0-rc.2.1077+43983cda8af930-dirty
lingxiankong commented 3 years ago

I agree we could check lb-provider to only allow octavia (for backward compatibility) and amphora; amphorav2 is still not production-ready though.

bshephar commented 3 years ago

I think it is probably best if we check lb-provider: if it's NOT octavia or amphora, then we default to octavia or amphora? That way we aren't breaking anything if a user does specify something else, and we still give ourselves a chance of working.

The worst-case scenario I can think of is that they didn't deploy the Amphora image (like me, just using OVN), and Octavia will fail to create the load balancer. But at least there won't be an active load balancer in OpenStack that appears to have worked?

lingxiankong commented 3 years ago

> if it's NOT octavia or amphora, then we default to octavia or amphora?

We only need to check when occm is starting up, if lb-provider is not supported, fail and exit.
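A fail-fast startup check along those lines could look like the sketch below. This is a self-contained illustration of the allow-list idea, not the actual OCCM implementation; the function name and the exact provider set are assumptions based on this discussion.

```go
package main

import "fmt"

// supportedLBProviders is the allow-list suggested in this thread;
// the exact contents are an assumption, not the shipped set.
var supportedLBProviders = map[string]bool{
	"octavia": true, // kept for backward compatibility
	"amphora": true,
}

// validateLBProvider is meant to run once while occm is starting up:
// if lb-provider is set to something unsupported, return an error so
// the controller manager can fail and exit instead of half-creating LBs.
func validateLBProvider(provider string) error {
	if provider == "" || supportedLBProviders[provider] {
		return nil // unset falls back to the default provider
	}
	return fmt.Errorf("Unsupported LoadBalancer Provider: %s", provider)
}

func main() {
	// With lb-provider=ovn this yields a non-nil error at startup,
	// matching the "failed to read config" log shown later in the thread.
	fmt.Println(validateLBProvider("ovn"))
}
```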

bshephar commented 3 years ago

Hey,

Sorry, I might have needed to include something extra in my commit message to have it linked to this Issue. Please advise if I'm missing something there. The relevant PR is:

https://github.com/kubernetes/cloud-provider-openstack/pull/1344

And some artifacts from my change:

I1209 03:51:37.075114       1 controllermanager.go:127] Version: v0.0.0-master+$Format:%h$
W1209 03:51:37.075575       1 openstack.go:306] failed to read config: Unsupported LoadBalancer Provider: ovn
F1209 03:51:37.075641       1 controllermanager.go:131] Cloud provider could not be initialized: could not init cloud provider "openstack": Unsupported LoadBalancer Provider: ovn
goroutine 1 [running]: 
bshephar commented 3 years ago

Hmm, not too sure why that test is failing. Is it possible to get more info from this check?

https://logs.openlabtesting.org/logs/44/1344/0b75c75f8b258edf131dea106ed5e4f53c96c7c2/cloud-provider-openstack-acceptance-test-lb-octavia/cloud-provider-openstack-acceptance-test-lb-octavia/7fd0fa1/job-output.txt.gz

2020-12-09 04:27:44.815992 | TASK [Wait for openstack-cloud-controller-manager up and running]
2020-12-09 04:32:37.756204 | ubuntu-bionic | ERROR
2020-12-09 04:32:37.756822 | ubuntu-bionic | {
2020-12-09 04:32:37.756913 | ubuntu-bionic |   "attempts": 24,
2020-12-09 04:32:37.756992 | ubuntu-bionic |   "delta": "0:00:06.699446",
2020-12-09 04:32:37.757115 | ubuntu-bionic |   "end": "2020-12-09 04:32:37.701453",
2020-12-09 04:32:37.757191 | ubuntu-bionic |   "msg": "non-zero return code",
2020-12-09 04:32:37.757264 | ubuntu-bionic |   "rc": 1,
2020-12-09 04:32:37.757373 | ubuntu-bionic |   "start": "2020-12-09 04:32:31.002007"
2020-12-09 04:32:37.757447 | ubuntu-bionic | }

Little more info:

"hosts": {
                        "ubuntu-bionic": {
                            "action": "command",
                            "attempts": 24,
                            "changed": true,
                            "cmd": "set -o pipefail\nsleep 5\nexport KUBECONFIG=/home/zuul/.kube/config\nkubectl -n kube-system get po | grep openstack-cloud-controller-manager | grep Running\n",
                            "delta": "0:00:06.699446",
                            "end": "2020-12-09 04:32:37.701453",
                            "failed": true,
                            "invocation": {
                                "module_args": {
                                    "_raw_params": "set -o pipefail\nsleep 5\nexport KUBECONFIG=/home/zuul/.kube/config\nkubectl -n kube-system get po | grep openstack-cloud-controller-manager | grep Running\n",
                                    "_uses_shell": true,
                                    "argv": null,
                                    "chdir": null,
                                    "creates": null,
                                    "executable": "/bin/bash",
                                    "removes": null,
                                    "stdin": null,
                                    "warn": false,
                                    "zuul_log_id": "fa163ef5-08f0-9bd5-2c7a-00000000001f-ubuntubionic"
                                }
                            },
                            "msg": "non-zero return code",
                            "rc": 1,
                            "start": "2020-12-09 04:32:31.002007",
                            "stderr": "",
                            "stderr_lines": [],
                            "stdout": "",
                            "stdout_lines": [],
                            "zuul_log_id": "fa163ef5-08f0-9bd5-2c7a-00000000001f-ubuntubionic"

Do we collect logs from these pods somewhere?