submariner-io / submariner

Networking component for interconnecting Pods and Services across Kubernetes clusters.
https://submariner.io
Apache License 2.0

OVN support doesn't work for ocp 4.8.0-0.nightly-2021-04-20-231007 #1276

Closed mangelajo closed 3 years ago

mangelajo commented 3 years ago

Depends on #1282

What happened:

route-agent on the gateway nodes fails to start:

[majopela@bluehat subm-demo]$ kubectl logs -p submariner-routeagent-jcs6x -n submariner-operator
+ trap 'exit 1' SIGTERM SIGINT
+ SUBMARINER_VERBOSITY=2
+ '[' false == true ']'
+ DEBUG=-v=2
+ exec submariner-route-agent -v=2 -alsologtostderr
I0421 11:49:49.379949       1 main.go:49] Starting submariner-route-agent using the event framework
W0421 11:49:49.380569       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0421 11:49:49.382587       1 cni_iface.go:71] Interface "lo" has "127.0.0.1" address
I0421 11:49:49.382746       1 cni_iface.go:71] Interface "br-ex" has "10.0.52.206" address
I0421 11:49:49.382883       1 cni_iface.go:71] Interface "ovn-k8s-mp0" has "10.129.2.2" address
I0421 11:49:49.382899       1 cni_iface.go:76] Found CNI Interface "ovn-k8s-mp0" that has IP "10.129.2.2" from ClusterCIDR "10.128.0.0/14"
I0421 11:49:49.405019       1 cni_iface.go:113] Successfully annotated node "ip-10-0-52-206.us-east-2.compute.internal" with cniIfaceIP "10.129.2.2"
I0421 11:49:49.405041       1 registry.go:65] Event handler "logger" added to registry "routeagent_driver".
I0421 11:49:49.405049       1 registry.go:67] Event handler "kubeproxy-iptables-handler" ignored for registry "routeagent_driver".
I0421 11:49:49.423337       1 iptables.go:29] Install/ensure SUBMARINER-POSTROUTING chain exists
I0421 11:49:49.426033       1 iptables.go:35] Insert SUBMARINER-POSTROUTING rule that has rules for inter-cluster traffic
I0421 11:49:49.427869       1 util.go:207] In nat table, iptables rule "-j SUBMARINER-POSTROUTING", exists at index 1.
I0421 11:49:49.427882       1 util.go:228] In nat table, iptables rule "-j SUBMARINER-POSTROUTING", already exists.
I0421 11:49:49.427888       1 registry.go:65] Event handler "ovn-hostroutes-handler" added to registry "routeagent_driver".
W0421 11:49:49.427954       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0421 11:50:05.034811       1 controller.go:139] Starting the Event controller...
I0421 11:50:05.135406       1 handler.go:69] A new Endpoint for remote cluster "majopela-b--1618995408" has been created: v1.EndpointSpec{ClusterID:"majopela-b--1618995408", CableName:"submariner-cable-majopela-b--1618995408-10-0-58-214", HealthCheckIP:"10.133.2.2", Hostname:"ip-10-0-58-214", Subnets:[]string{"172.31.0.0/16", "10.132.0.0/14"}, PrivateIP:"10.0.58.214", PublicIP:"3.141.11.148", NATEnabled:true, Backend:"libreswan", BackendConfig:map[string]string{"natt-discovery-port":"4490", "preferred-server":"false", "udp-port":"4501"}}
E0421 11:50:05.135629       1 endpoint_created.go:37] Error handling created endpoint: RemoteEndpointCreated failed: "ovn-hostroutes-handler" returned error: updateHostNetworkDataplane returned error: getNextHopOnK8sMgmtIntf returned error: missing localEndpoint info
I0421 11:50:05.135803       1 handler.go:54] A new Endpoint for the local cluster has been created: v1.EndpointSpec{ClusterID:"majopela-a--1618995408", CableName:"submariner-cable-majopela-a--1618995408-10-0-52-206", HealthCheckIP:"10.129.2.2", Hostname:"ip-10-0-52-206", Subnets:[]string{"172.30.0.0/16", "10.128.0.0/14"}, PrivateIP:"10.0.52.206", PublicIP:"3.143.116.42", NATEnabled:true, Backend:"libreswan", BackendConfig:map[string]string{"natt-discovery-port":"4490", "preferred-server":"false", "udp-port":"4501"}}
I0421 11:50:05.136015       1 handler.go:49] The current node has become a Gateway
F0421 11:50:05.136324       1 gateway_dataplane.go:239] error looking for "ovn-k8s-gw0" interface, trying to detect the submariner_router upstream IP: route ip+net: no such network interface

What you expected to happen:

route-agent should work, connectivity should work.

How to reproduce it (as minimally and precisely as possible):

Install submariner on top of OCP 4.8

Anything else we need to know?:

The node_local_switch is not created in OCP 4.8; this is the switch where ovn-k8s-gw0 is created.

We could create a node_local_switch for Submariner and attach subm-k8s-gw0 on each node.
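
A quick way to confirm this condition on a node is to check whether the interface named in the fatal log line is present. The following is a minimal sketch, not part of Submariner: `iface_exists` and the `SYSFS_NET` override are hypothetical names introduced here, and the check assumes a Linux host where network interfaces appear under /sys/class/net.

```shell
# Hypothetical helper: succeed if a network interface with the given name
# exists. SYSFS_NET is an assumed override, only useful for testing against
# a fake directory tree; it defaults to the real Linux sysfs path.
iface_exists() {
    [ -e "${SYSFS_NET:-/sys/class/net}/$1" ]
}

# Example use on a gateway node (commented out so the block has no side effects):
# if iface_exists ovn-k8s-gw0; then
#     echo "ovn-k8s-gw0 present"
# else
#     echo "ovn-k8s-gw0 missing: route-agent will fail as shown in the log above"
# fi
```

Running the commented example on each gateway node should show whether the cluster is affected before the route-agent is deployed.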

Environment:

submariner-20210421114355.tar.gz

mangelajo commented 3 years ago

OK, so when OVN is not in shared gateway mode, ovn-k8s-gw0 and the node_local_switch don't exist.

I have tested by creating those resources manually on the nodes, and it's back to working. To avoid collisions and issues with the different implementations, the proposed solution would be to use an independent switch & localnet switch on the nodes, so that we become much more independent of the ovn-kubernetes mode:

# see go-controller/pkg/node/gateway_shared_intf_linux.go

ovs-vsctl --may-exist add-br br-submariner
ovs-vsctl br-set-external-id br-submariner bridge-id br-submariner # probably not necessary
ovs-vsctl --if-exists get Open_vSwitch . external_ids:ovn-bridge-mappings
ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet:br-ex,submariner:br-submariner
ovs-vsctl --may-exist add-port br-submariner ovn-k8s-sub0 -- set interface ovn-k8s-sub0 type=internal mtu_request=$MTU mac="0a\\:58\\:11\\:22\\:33\\:44"
ip l set ovn-k8s-sub0 up
ip addr add dev ovn-k8s-sub0 169.254.254.9/29 # the submariner upstream leg moves to have 169.254.254.8/29
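
The commands above read the current ovn-bridge-mappings value and then overwrite it with a hard-coded string. If a node already carries other mappings, one option is to append the submariner entry instead. The sketch below is an assumption about how that could be done, not what Submariner implements; `merge_bridge_mapping` is a hypothetical helper name.

```shell
# Hypothetical helper: append a "network:bridge" entry to a comma-separated
# ovn-bridge-mappings value unless that network is already mapped.
#   $1 = current value (may be empty, may be wrapped in double quotes)
#   $2 = entry to add, e.g. submariner:br-submariner
merge_bridge_mapping() {
    # ovs-vsctl prints string values wrapped in double quotes; strip them.
    current=$(printf '%s' "$1" | tr -d '"')
    entry=$2
    network=${entry%%:*}
    case ",$current," in
        *",$network:"*) printf '%s\n' "$current" ;;        # already mapped
        ",,")           printf '%s\n' "$entry" ;;          # nothing set yet
        *)              printf '%s\n' "$current,$entry" ;; # append
    esac
}

# Example use on a node (commented out so the block has no side effects):
# cur=$(ovs-vsctl --if-exists get Open_vSwitch . external_ids:ovn-bridge-mappings)
# new=$(merge_bridge_mapping "$cur" submariner:br-submariner)
# ovs-vsctl set Open_vSwitch . "external_ids:ovn-bridge-mappings=$new"
```

Calling the helper twice with the same entry leaves the value unchanged, so the step can be repeated safely.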

Then on the OVN side:

ovn-nbctl ls-add submariner_gateway
ovn-nbctl lsp-add submariner_gateway submariner-localnet
ovn-nbctl lsp-set-type submariner-localnet localnet
ovn-nbctl lsp-set-addresses submariner-localnet unknown
ovn-nbctl lsp-set-options submariner-localnet network_name=submariner

The submariner_router upstream leg goes into "submariner_gateway" switch.
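
The OVN-side steps could be wrapped so they are safe to re-run, for example when the route-agent restarts. This is only a sketch under stated assumptions: `setup_submariner_localnet` and the `NBCTL` variable are hypothetical names, and the wrapper prints the commands by default rather than running them. The `--may-exist` option on `ls-add` and `lsp-add` is standard ovn-nbctl behaviour.

```shell
# Dry-run by default: the commands are echoed, not executed.
# Set NBCTL=ovn-nbctl in the environment to apply them for real.
NBCTL=${NBCTL:-echo ovn-nbctl}

# Hypothetical wrapper around the ovn-nbctl steps from the comment above.
#   $1 = logical switch, $2 = localnet port, $3 = bridge-mapping network name
setup_submariner_localnet() {
    switch=$1
    port=$2
    network=$3
    # --may-exist makes the two creation steps no-ops if already done.
    $NBCTL --may-exist ls-add "$switch"
    $NBCTL --may-exist lsp-add "$switch" "$port"
    $NBCTL lsp-set-type "$port" localnet
    $NBCTL lsp-set-addresses "$port" unknown
    $NBCTL lsp-set-options "$port" "network_name=$network"
}

# Example (prints the five commands in dry-run mode):
# setup_submariner_localnet submariner_gateway submariner-localnet submariner
```

The network name passed as the third argument has to match the name used in ovn-bridge-mappings on the node, which is `submariner` in the commands above.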

github-actions[bot] commented 3 years ago

:tada: Great news! Looks like all the dependencies have been resolved:
