kubernetes / cloud-provider-openstack

Apache License 2.0
618 stars · 609 forks

[occm] Cannot access loadbalancer from non kubernetes nodes #1877

Closed: sykim-etri closed this issue 2 years ago

sykim-etri commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

What happened: First, I set up OpenStack (Xena) with Octavia on the openstack-server machine and verified that a load balancer was working correctly.

Then I installed a Kubernetes cluster (v1.22.8) with Kubespray on two VMs created on that OpenStack.

NAME            STATUS   ROLES                  AGE   VERSION
octavia-k8s-1   Ready    control-plane,master   18h   v1.22.8
octavia-k8s-2   Ready    <none>                 18h   v1.22.8

I configured Kubernetes for OCCM (v1.22.1) and applied several YAML files from ~/cloud-provider-openstack/manifests/controller-manager (cloud-controller-manager-roles.yaml, openstack-cloud-controller-manager-ds.yaml, cloud-controller-manager-role-bindings.yaml, kubeadm.conf, openstack-cloud-controller-manager-pod.yaml).
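For reference, the apply sequence above can be sketched as follows (a minimal recap, assuming the manifests are applied with kubectl from the repo checkout; the cloud-config secret setup is omitted):

```shell
# Sketch of applying the OCCM manifests (file names from the paragraph above;
# this assumes cloud.conf / the cloud-config secret is already in place).
cd ~/cloud-provider-openstack/manifests/controller-manager
kubectl apply -f cloud-controller-manager-roles.yaml
kubectl apply -f cloud-controller-manager-role-bindings.yaml
kubectl apply -f openstack-cloud-controller-manager-ds.yaml
```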

Finally, I applied ~/cloud-provider-openstack/examples/loadbalancers/external-http-nginx.yaml and successfully got an external IP for external-http-nginx-service.

# kubectl get svc
NAME                          TYPE           CLUSTER-IP    EXTERNAL-IP       PORT(S)        AGE
external-http-nginx-service   LoadBalancer   10.233.45.4   1.2.1.242         80:31206/TCP   55s
kubernetes                    ClusterIP      10.233.0.1    <none>            443/TCP        19h

From the Kubernetes nodes, I can access external-http-nginx-service at 1.2.1.242 without any problem.

root@octavia-k8s-1:~# curl 1.2.1.242
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

But from other VMs on the same subnet, and from the physical machine (openstack-server), I cannot get a response from 1.2.1.242.

ubuntu@openstack-server:~$ curl 1.2.1.242
curl: (52) Empty reply from server
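One thing worth checking when curl works from inside the cluster but not externally is the Octavia side itself: the listener state and, in particular, the security group attached to the load balancer's VIP port. A diagnostic sketch (the `<...>` IDs are placeholders, not values from this thread):

```shell
# Inspect the Octavia load balancer created for the Service (fill in
# <lb-id>, <vip-port-id>, <sg-id> from the output of the list commands).
openstack loadbalancer list
openstack loadbalancer show <lb-id>                 # provisioning/operating status
openstack loadbalancer listener list --loadbalancer <lb-id>

# The security group on the VIP port must allow TCP/80 from external sources:
openstack port show <vip-port-id> -c security_group_ids
openstack security group rule list <sg-id>
```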

I guess maybe I'm missing some configuration.

What you expected to happen: The load balancer IP 1.2.1.242 should also be reachable from other VMs on the same subnet and from openstack-server.

How to reproduce it:

Anything else we need to know?: I installed OpenStack with kolla-ansible (Xena release).

The output of ip a on the Kubernetes master is as follows:

root@octavia-k8s-1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:92:a7:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.64.0.7/24 brd 10.64.0.255 scope global dynamic ens3
       valid_lft 15733sec preferred_lft 15733sec
    inet6 fe80::f816:3eff:fe92:a7e6/64 scope link
       valid_lft forever preferred_lft forever
3: kube-ipvs0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether d6:c8:83:d0:eb:d7 brd ff:ff:ff:ff:ff:ff
    inet 10.233.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.233.0.3/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.233.45.4/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 1.2.1.242/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet6 fe80::d4c8:83ff:fed0:ebd7/64 scope link
       valid_lft forever preferred_lft forever
6: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1430 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.233.79.0/32 scope global tunl0
       valid_lft forever preferred_lft forever
7: calida9c852d892@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
8: cali15fc62ac7b1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
9: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether be:93:fd:e0:a3:a8 brd ff:ff:ff:ff:ff:ff
    inet 169.254.25.10/32 scope global nodelocaldns
       valid_lft forever preferred_lft forever
11: cali7b3550237a5@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever

The ip route output on openstack-server is as follows:

ubuntu@openstack-server:~$ ip route
default via 1.2.1.1 dev eno3 onlink
10.64.0.0/24 via 1.2.1.244 dev eno3
10.254.0.0/24 via 1.2.1.244 dev eno3
1.2.1.0/24 dev eno3 proto kernel scope link src 1.2.1.235
1.2.1.240/28 via 1.2.1.244 dev eno3
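Given this table, traffic from openstack-server to the VIP falls under the 1.2.1.240/28 route via 1.2.1.244. It may be worth confirming which route is actually chosen and that the next hop resolves (a quick check using the addresses from the output above):

```shell
# Which route does openstack-server pick for the LB VIP?
ip route get 1.2.1.242

# Does the next hop 1.2.1.244 resolve on eno3?
ip neigh show dev eno3
```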

Environment:

jichenjc commented 2 years ago

This is weird. The LB is a VM with pre-installed haproxy by default, and the IP 1.2.1.242 should be the IP of the LB. I know we had a security group fix recently (https://github.com/kubernetes/cloud-provider-openstack/issues/1830), but I'm not sure it's related. Since you can curl from one machine but not from the other, it seems related to a firewall.

Are you able to check the OCCM logs and see anything suspicious?

sykim-etri commented 2 years ago

@jichenjc Thanks for your comment.

This is the full OCCM log. I'm using the latest OCCM version with log level 4.

occm-latest.log

In this log, I think this warning is suspicious, but I don't know what it means:

W0520 20:30:23.777141       1 openstack.go:325] Failed to create an OpenStack Secret client: unable to initialize keymanager client for region RegionOne: No suitable endpoint could be found in the service catalog.
root@octavia-k8s-1:~/cloud-provider-openstack/examples/loadbalancers# k get svc
NAME                          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
external-http-nginx-service   LoadBalancer   10.233.10.134   1.2.1.249     80:31217/TCP   7m28s
kubernetes                    ClusterIP      10.233.0.1      <none>        443/TCP        32h

sykim-etri commented 2 years ago

@jichenjc This is a more detailed log (level 7).

occm-log.txt

jichenjc commented 2 years ago

Failed to create an OpenStack Secret client: unable to initialize keymanager client for region RegionOne: No suitable endpoint could be found in the service catalog.

This is OK; it only tells us that it can't find the Barbican service in your catalog, which is optional.

jichenjc commented 2 years ago

The log provided seems truncated, and I see nothing special up to I0522 13:46:34.568918, which is the last log line I can see.

Have you run tcpdump on the VM to see whether anything is wrong there? This is beyond my expertise now... not sure whether someone else has the background?
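For what it's worth, that tcpdump check could look something like this (interface names taken from the outputs earlier in the thread; run each capture while curling from openstack-server):

```shell
# On openstack-server: do the SYNs leave, and does anything come back?
sudo tcpdump -ni eno3 host 1.2.1.242 and tcp port 80

# On a Kubernetes node: does the traffic ever arrive from the amphora?
sudo tcpdump -ni ens3 tcp port 80
```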

sykim-etri commented 2 years ago

@jichenjc Thanks for your comment.

Do you know of any step-by-step installation documents for OpenStack with Octavia? I'll try a clean install.

I guess my network configuration may be wrong.

jichenjc commented 2 years ago

I think https://docs.openstack.org/devstack/latest/guides/devstack-with-lbaas-v2.html might be the easiest way to go.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/cloud-provider-openstack/issues/1877#issuecomment-1285609369):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.