submariner-io / submariner

Networking component for interconnecting Pods and Services across Kubernetes clusters.
https://submariner.io
Apache License 2.0

Failed to run Submariner on ROKS #968

Closed by zhiweiyin318 3 years ago

zhiweiyin318 commented 3 years ago

What happened: The submariner-routeagent pods are in CrashLoopBackOff when I run Submariner on ROKS (OpenShift on IBM Cloud).

What you expected to happen: Please help check:

  1. Whether the errors are caused by UDP ports 4500, 5800, and 5000 not having been added to the firewall.
  2. What else needs to be configured for ROKS.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

$ oc version
Client Version: 4.4.4
Server Version: 4.4.29
Kubernetes Version: v1.17.1+45f8ddb
[centos@yzw ~]$ oc get nodes -o wide
NAME            STATUS   ROLES           AGE   VERSION           INTERNAL-IP     EXTERNAL-IP     OS-IMAGE   KERNEL-VERSION               CONTAINER-RUNTIME
10.188.58.211   Ready    master,worker   72m   v1.17.1+99078c8   10.188.58.211   169.60.100.23   Red Hat    3.10.0-1160.6.1.el7.x86_64   cri-o://1.17.5-11.rhaos4.4.git7f979af.el7
10.211.78.120   Ready    master,worker   72m   v1.17.1+99078c8   10.211.78.120   169.47.173.18   Red Hat    3.10.0-1160.6.1.el7.x86_64   cri-o://1.17.5-11.rhaos4.4.git7f979af.el7
10.39.7.228     Ready    master,worker   72m   v1.17.1+99078c8   10.39.7.228     169.62.37.213   Red Hat    3.10.0-1160.6.1.el7.x86_64   cri-o://1.17.5-11.rhaos4.4.git7f979af.el7
[centos@yzw ~]$ oc get pods -n submariner-operator
NAME                                             READY   STATUS             RESTARTS   AGE
submariner-gateway-jllfq                         1/1     Running            0          14m
submariner-lighthouse-agent-854869797-gd9sd      1/1     Running            6          14m
submariner-lighthouse-coredns-599ccb9b5b-47jg4   1/1     Running            0          14m
submariner-lighthouse-coredns-599ccb9b5b-l4w6l   1/1     Running            0          14m
submariner-operator-59b8555b4d-dr8lf             1/1     Running            0          16m
submariner-routeagent-5c66t                      0/1     CrashLoopBackOff   7          14m
submariner-routeagent-lbn9l                      0/1     CrashLoopBackOff   7          14m
submariner-routeagent-whkth                      0/1     CrashLoopBackOff   7          14m
[centos@yzw ~]$ oc logs -n submariner-operator submariner-routeagent-5c66t
+ trap 'exit 1' SIGTERM SIGINT
+ SUBMARINER_VERBOSITY=2
+ '[' false == true ']'
+ DEBUG=-v=2
+ for f in iptables-save iptables
++ find_iptables_on_host iptables-save
++ chroot /host test -x /usr/sbin/iptables-save
++ echo /usr/sbin
++ return
+ location=/usr/sbin
+ '[' /usr/sbin '!=' unknown ']'
+ echo 'iptables-save is present on the host at /usr/sbin/iptables-save'
iptables-save is present on the host at /usr/sbin/iptables-save
+ sed 's!@@PATH@@!/usr/sbin!' /usr/sbin/iptables-wrapper.in
+ for f in iptables-save iptables
++ find_iptables_on_host iptables
++ chroot /host test -x /usr/sbin/iptables
++ echo /usr/sbin
++ return
+ location=/usr/sbin
+ '[' /usr/sbin '!=' unknown ']'
iptables is present on the host at /usr/sbin/iptables
+ echo 'iptables is present on the host at /usr/sbin/iptables'
+ sed 's!@@PATH@@!/usr/sbin!' /usr/sbin/iptables-wrapper.in
+ exec submariner-route-agent -v=2 -alsologtostderr
I1125 14:05:28.928518       1 main.go:47] Starting submariner-route-agent
W1125 14:05:28.928717       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1125 14:05:28.936363       1 route.go:195] Starting Route Controller. ClusterID: roks, localClusterCIDR: [172.30.0.0/16], localServiceCIDR: [172.21.0.0/16]
I1125 14:05:28.936427       1 route.go:199] Waiting for Endpoint informer caches to sync.
I1125 14:05:28.956448       1 route.go:736] Enqueueing endpoint for route controller {"kind":"Endpoint","apiVersion":"submariner.io/v1","metadata":{"name":"roks-submariner-cable-roks-169-60-100-23","namespace":"submariner-operator","selfLink":"/apis/submariner.io/v1/namespaces/submariner-operator/endpoints/roks-submariner-cable-roks-169-60-100-23","uid":"ee58e5aa-cd2d-4b8b-aa73-5e7803647f69","resourceVersion":"61833","generation":1,"creationTimestamp":"2020-11-25T14:01:10Z"},"spec":{"cluster_id":"roks","cable_name":"submariner-cable-roks-169-60-100-23","hostname":"kube-buv54iew0b0jvg5gvfqg-yzwtest-default-000002a6.iks.ibm","subnets":["172.21.0.0/16","172.30.0.0/16"],"private_ip":"169.60.100.23","public_ip":"169.60.100.23","nat_enabled":true,"backend":"strongswan"}}
I1125 14:05:28.973619       1 route.go:751] Enqueueing sm-route-agent-pod event, ip: 10.39.7.228
I1125 14:05:28.973655       1 route.go:751] Enqueueing sm-route-agent-pod event, ip: 10.211.78.120
I1125 14:05:28.973678       1 route.go:751] Enqueueing sm-route-agent-pod event, ip: 10.188.58.211
I1125 14:05:29.037653       1 driver.go:37] Interface "lo" has "127.0.0.1" address
I1125 14:05:29.037696       1 driver.go:37] Interface "lo" has "172.20.0.1" address
I1125 14:05:29.037923       1 driver.go:37] Interface "eth0" has "10.188.58.211" address
I1125 14:05:29.038079       1 driver.go:37] Interface "eth1" has "169.60.100.23" address
I1125 14:05:29.038099       1 driver.go:37] Interface "eth1" has "169.60.85.246" address
I1125 14:05:29.038108       1 driver.go:37] Interface "eth1" has "169.60.85.245" address
I1125 14:05:29.038245       1 driver.go:37] Interface "vethlocal" has "127.0.0.10" address
I1125 14:05:29.038419       1 driver.go:37] Interface "tunl0" has "172.30.45.0" address
I1125 14:05:29.038439       1 driver.go:42] Found CNI Interface "tunl0" that has IP "172.30.45.0" from ClusterCIDR "172.30.0.0/16"
I1125 14:05:29.038610       1 driver.go:60] Successfully configured rp_filter to loose mode(2) on cniInterface "tunl0"
I1125 14:05:29.046457       1 iptables.go:22] Install/ensure SUBMARINER-POSTROUTING chain exists
I1125 14:05:29.062345       1 iptables.go:28] Insert SUBMARINER-POSTROUTING rule that has rules for inter-cluster traffic
I1125 14:05:29.071140       1 util.go:221] In nat table, iptables rule "-j SUBMARINER-POSTROUTING", exists at index 2.
I1125 14:05:29.088072       1 iptables.go:34] Install/ensure SUBMARINER-INPUT chain exists
I1125 14:05:29.113098       1 iptables.go:45] Allow VxLAN incoming traffic in SUBMARINER-INPUT Chain
I1125 14:05:29.121920       1 iptables.go:53] Insert rule to allow traffic over vx-submariner interface in FORWARDing Chain
I1125 14:05:29.130726       1 util.go:221] In filter table, iptables rule "-o vx-submariner -j ACCEPT", exists at index 2.
I1125 14:05:29.151189       1 iptables.go:65] Installing rule for host network to remote cluster communication: -s 240.0.0.0/8 -o vx-submariner -j SNAT --to 172.30.45.0
I1125 14:05:29.181045       1 route.go:257] In roks, podIP of submariner-route-agent[0] is 10.188.58.211
I1125 14:05:29.181146       1 route.go:257] In roks, podIP of submariner-route-agent[1] is 10.211.78.120
I1125 14:05:29.181198       1 route.go:257] In roks, podIP of submariner-route-agent[2] is 10.39.7.228
I1125 14:05:29.181245       1 route.go:271] Hostname is "kube-buv54iew0b0jvg5gvfqg-yzwtest-default-000002a6.iks.ibm" and routeAgentNodeName is ""
F1125 14:05:29.181298       1 main.go:98] Error running route controller: could not get the nodeName on host "kube-buv54iew0b0jvg5gvfqg-yzwtest-default-000002a6.iks.ibm"

zhiweiyin318 commented 3 years ago

I switched to the devel image, and the submariner-routeagent pods show the same errors. Image info:

    image: quay.io/submariner/submariner-route-agent:devel
    imageID: quay.io/submariner/submariner-route-agent@sha256:fe8556c3c288557ffbc8a191ed23fc362550a15747a54c6a62739ed61e82491f

Logs:

I1126 00:35:29.091699       1 main.go:47] Starting submariner-route-agent
W1126 00:35:29.092122       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1126 00:35:29.103507       1 route.go:192] Starting Route Controller. ClusterID: roks, localClusterCIDR: [172.30.0.0/16], localServiceCIDR: [172.21.0.0/16]
I1126 00:35:29.103659       1 route.go:196] Waiting for Endpoint informer caches to sync.
I1126 00:35:29.142451       1 route.go:656] Enqueueing sm-route-agent-pod event, ip: 10.188.58.211
I1126 00:35:29.142501       1 route.go:656] Enqueueing sm-route-agent-pod event, ip: 10.39.7.228
I1126 00:35:29.142546       1 route.go:656] Enqueueing sm-route-agent-pod event, ip: 10.211.78.120
I1126 00:35:29.205551       1 driver.go:37] Interface "lo" has "127.0.0.1" address
I1126 00:35:29.205713       1 driver.go:37] Interface "lo" has "172.20.0.1" address
I1126 00:35:29.205995       1 driver.go:37] Interface "eth0" has "10.188.58.211" address
I1126 00:35:29.206199       1 driver.go:37] Interface "eth1" has "169.60.100.23" address
I1126 00:35:29.206285       1 driver.go:37] Interface "eth1" has "169.60.85.246" address
I1126 00:35:29.206334       1 driver.go:37] Interface "eth1" has "169.60.85.245" address
I1126 00:35:29.206613       1 driver.go:37] Interface "vethlocal" has "127.0.0.10" address
I1126 00:35:29.206857       1 driver.go:37] Interface "tunl0" has "172.30.45.0" address
I1126 00:35:29.206955       1 driver.go:42] Found CNI Interface "tunl0" that has IP "172.30.45.0" from ClusterCIDR "172.30.0.0/16"
I1126 00:35:29.207161       1 driver.go:60] Successfully configured rp_filter to loose mode(2) on cniInterface "tunl0"
I1126 00:35:29.215226       1 iptables.go:22] Install/ensure SUBMARINER-POSTROUTING chain exists
I1126 00:35:29.232408       1 iptables.go:28] Insert SUBMARINER-POSTROUTING rule that has rules for inter-cluster traffic
I1126 00:35:29.243949       1 util.go:225] In nat table, iptables rule "-j SUBMARINER-POSTROUTING", exists at index 2.
I1126 00:35:29.267722       1 iptables.go:34] Install/ensure SUBMARINER-INPUT chain exists
I1126 00:35:29.293085       1 iptables.go:45] Allow VxLAN incoming traffic in SUBMARINER-INPUT Chain
I1126 00:35:29.305155       1 iptables.go:53] Insert rule to allow traffic over vx-submariner interface in FORWARDing Chain
I1126 00:35:29.315651       1 util.go:225] In filter table, iptables rule "-o vx-submariner -j ACCEPT", exists at index 2.
I1126 00:35:29.336858       1 iptables.go:65] Installing rule for host network to remote cluster communication: -s 240.0.0.0/8 -o vx-submariner -j SNAT --to 172.30.45.0
I1126 00:35:29.368211       1 route.go:254] In roks, podIP of submariner-route-agent[0] is 10.211.78.120
I1126 00:35:29.368250       1 route.go:254] In roks, podIP of submariner-route-agent[1] is 10.188.58.211
I1126 00:35:29.368261       1 route.go:254] In roks, podIP of submariner-route-agent[2] is 10.39.7.228
I1126 00:35:29.368270       1 route.go:268] Hostname is "kube-buv54iew0b0jvg5gvfqg-yzwtest-default-000002a6.iks.ibm" and routeAgentNodeName is ""
F1126 00:35:29.368286       1 main.go:98] Error running route controller: could not get the nodeName on host "kube-buv54iew0b0jvg5gvfqg-yzwtest-default-000002a6.iks.ibm"

sridhargaddam commented 3 years ago

@zhiweiyin318 thanks for reporting the issue. As discussed over Slack, let's retry this with the new route-agent-driver, which uses a different mechanism to annotate a node.

zhiweiyin318 commented 3 years ago

It works on ROKS using Submariner devel. Thanks to the team for the support.

sridhargaddam commented 3 years ago

For the record, the route-agent problem described in this issue was resolved by the following PR: https://github.com/submariner-io/submariner/pull/946/commits/9a384f7f80fed26ab855260b710acf72e3ba87b6
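
For anyone reading the logs later: the fatal line reports the hostname "kube-buv54iew0b0jvg5gvfqg-yzwtest-default-000002a6.iks.ibm" with an empty routeAgentNodeName, while `oc get nodes` lists the ROKS nodes by IP address. One plausible reading, which the thread does not confirm, is that the old route agent looked for a node whose name equals the OS hostname. The sketch below only illustrates that assumed mismatch. `find_node_name` is a hypothetical helper, not Submariner code, and the matching rule is an assumption.

```python
# Illustrative only: this is NOT Submariner's implementation.
# Assumption: the node is resolved by comparing the OS hostname
# with the Kubernetes node names.

def find_node_name(hostname, node_names):
    """Return the node name equal to hostname, or "" when nothing matches."""
    for name in node_names:
        if name == hostname:
            return name
    return ""

# Node names as listed by `oc get nodes` in this issue (named by IP on ROKS).
node_names = ["10.188.58.211", "10.211.78.120", "10.39.7.228"]

# Hostname reported by the route agent just before the fatal error.
hostname = "kube-buv54iew0b0jvg5gvfqg-yzwtest-default-000002a6.iks.ibm"

route_agent_node_name = find_node_name(hostname, node_names)
print(repr(route_agent_node_name))  # prints '' because no node name matches
```

Under that assumption, any cluster whose node names differ from the host's hostname would hit the same error, which would be consistent with the problem going away once the node was identified through a different mechanism.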