kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] unable to reach kubernetes default API endpoint from underlay pods #4786

Open patriziobassi opened 2 days ago

patriziobassi commented 2 days ago

Kube-OVN Version

v1.12.8

Kubernetes Version

v1.30

Operation-system/Kernel Version

"Ubuntu 22.04.3 LTS" kernel 5.15.0-83-generic

Description

Underlay pods cannot reach the internal Kubernetes ClusterIP over the internal Geneve networks.

Instead, they try to use the external network, turning east-west encapsulated traffic into north-south traffic that passes through a firewall.

Our setup: a K8s cluster with an underlay subnet in the default ovn-cluster VPC, with U2O enabled. When trying to reach the internal K8s endpoint

# curl -v https://kubernetes.default.svc.cluster.local

the traffic is observed leaving through the physical (underlay) interface instead of being rerouted to the ovn0 logical interface to reach the API server, which runs on TCP port 6433 on the control-plane (master) nodes.

Unfortunately, these nodes are in the u2o_exclude_ip list, so we hit the rule with priority 31000 and the traffic is rerouted to the physical gateway.

We tried to edit the default VPC ovn-cluster by adding the following to its spec:

policyRoutes:

but the traffic gets lost, even though the routing policy is applied.

At the moment, the ko diagnose command gives us:

Routing Policies
    31000 ip4.dst == $pod.subnet.u2o_exclude_ip.ip4 && ip4.src == 10.53.128.0/21 reroute 10.53.128.1
    31000 ip4.dst == 10.52.0.0/17 allow
    31000 ip4.dst == 100.64.0.0/16 allow
    30000 ip4.dst == 10.246.17.118 reroute 100.64.0.7
    30000 ip4.dst == 10.246.17.170 reroute 100.64.0.5
    30000 ip4.dst == 10.246.17.181 reroute 100.64.0.4
    30000 ip4.dst == 10.246.17.205 reroute 100.64.0.2
    30000 ip4.dst == 10.246.17.231 reroute 100.64.0.3
    30000 ip4.dst == 10.246.17.239 reroute 100.64.0.6
    29400 ip4.dst == 10.53.128.0/21 allow
    29000 ip4.src == $ovn.default.juju.f667b5.k8s.1_ip4 reroute 100.64.0.7
    29000 ip4.src == $ovn.default.juju.f667b5.k8s.2_ip4 reroute 100.64.0.2
    29000 ip4.src == $ovn.default.juju.f667b5.k8s.3_ip4 reroute 100.64.0.3
    29000 ip4.src == $ovn.default.juju.f667b5.k8s.4_ip4 reroute 100.64.0.5
    29000 ip4.src == $ovn.default.juju.f667b5.k8s.5_ip4 reroute 100.64.0.6
    29000 ip4.src == $ovn.default.juju.f667b5.k8s.6_ip4 reroute 100.64.0.4
    29000 ip4.src == 10.53.128.0/21 reroute 10.53.128.1
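To make the behaviour easier to reason about, here is a rough Python sketch of how the routing policies listed above are evaluated: the matching policy with the highest priority wins. It is not Kube-OVN or OVN code, only a model of a subset of the rules. The contents of the u2o_exclude_ip address set are not shown in this issue, so `U2O_EXCLUDE_IPS` is an assumption (three of the node IPs from the priority-30000 rules), and the pod address 10.53.128.10 is hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional

# ASSUMPTION: the real contents of $pod.subnet.u2o_exclude_ip.ip4 are not shown
# in the issue; a few node IPs from the priority-30000 rules stand in for them.
U2O_EXCLUDE_IPS = {"10.246.17.118", "10.246.17.170", "10.246.17.181"}
UNDERLAY = ipaddress.ip_network("10.53.128.0/21")
POD_CIDR = ipaddress.ip_network("10.52.0.0/17")
JOIN_CIDR = ipaddress.ip_network("100.64.0.0/16")


@dataclass
class Policy:
    priority: int
    action: str
    next_hop: Optional[str]
    match: Callable  # (src, dst) -> bool


# A subset of the policies from the diagnose output, in the same order.
POLICIES = [
    Policy(31000, "reroute", "10.53.128.1",
           lambda s, d: str(d) in U2O_EXCLUDE_IPS and s in UNDERLAY),
    Policy(31000, "allow", None, lambda s, d: d in POD_CIDR),
    Policy(31000, "allow", None, lambda s, d: d in JOIN_CIDR),
    Policy(30000, "reroute", "100.64.0.7",
           lambda s, d: str(d) == "10.246.17.118"),
    Policy(29400, "allow", None, lambda s, d: d in UNDERLAY),
    Policy(29000, "reroute", "10.53.128.1", lambda s, d: s in UNDERLAY),
]


def first_match(src: str, dst: str) -> Optional[Policy]:
    """Return the highest-priority policy matching the packet, or None."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    # Ties between equal priorities are not modelled; the sample inputs
    # below never match two policies of the same priority.
    for policy in sorted(POLICIES, key=lambda p: p.priority, reverse=True):
        if policy.match(s, d):
            return policy
    return None


# Hypothetical underlay pod talking to a node in the exclude list:
hit = first_match("10.53.128.10", "10.246.17.118")
print(hit.priority, hit.action, hit.next_hop)  # 31000 reroute 10.53.128.1

# Same destination from a source outside the underlay subnet:
hit = first_match("10.52.0.5", "10.246.17.118")
print(hit.priority, hit.action, hit.next_hop)  # 30000 reroute 100.64.0.7
```

Under these assumptions, the packet from the underlay pod never reaches the priority-30000 rule that would send it to the node over the join network, because the priority-31000 exclude rule matches first. A custom policy would need a priority above 31000 to take precedence in this model.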

So why can't the underlay pod traffic stay internal to the cluster? Is there a way to force it? It looks like policy2 is the one being matched: https://github.com/kubeovn/kube-ovn/blob/master/pkg/controller/subnet.go#L2753

Thank you

Steps To Reproduce

.

Current Behavior

traffic going to external gateway (north-south)

Expected Behavior

traffic is kept internally to the k8s cluster (east-west)

dosubot[bot] commented 2 days ago

I found a similar closed issue titled "Interconnection between overlay and underlay subnet" that was resolved by a pull request implementing the u2o feature [1]. This might be relevant to your problem with underlay pods not reaching the Kubernetes API endpoint internally.

To address your issue, ensure that the U2O (Underlay to Overlay) interconnection is correctly configured. You mentioned that the nodes are in the u2o_exclude_ip list, which might be affecting the routing. The u2o_exclude_ip list is likely used to exclude certain IPs from U2O routing, which could be why your traffic is being routed externally.

You attempted to modify the routing policy by adding a new rule with a higher priority, but it seems the traffic is still not being routed as expected. Double-check the configuration of your routing policies and ensure that the new rule is correctly applied and prioritized over existing rules.

Additionally, verify that the U2O interconnection is enabled for the relevant subnets and that the U2OInterconnectionIP is correctly set. If the interconnection is not functioning as expected, it might be necessary to review the configuration and ensure that all related settings are correctly applied.


patriziobassi commented 1 day ago

I think @changluyi wrote most of that code, so he may have a clue about it.