kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

How to configure EIP / SNAT with the following Node NIC configuration? #3812

Open Smithx10 opened 6 months ago

Smithx10 commented 6 months ago

Good day,

My Compute Nodes in the DC have the following network configuration:

We have two bonds, "admin" and "external".
The admin bond has a non-routable address, 172.16.0.1, untagged (VLAN 0). The external bond is carrier-only, with a routable VLAN interface on top of it: 10.91.230.10/24 on VLAN 2073.

All of our control plane and compute nodes have very similar, if not identical, configurations.

What are our options in configuring EIP in this scenario?

Bond members (admin and external):

  4 enp175s0f0     ether    enslaved    unmanaged  admin
  5 enp175s0f1     ether    enslaved    unmanaged external
  6 ens2f0         ether    enslaved    unmanaged admin 
  7 ens2f1         ether    enslaved    unmanaged external

All Interfaces

root@bdfv2-us-east-1-headnode-01:~# networkctl
IDX LINK           TYPE     OPERATIONAL SETUP
  1 lo             loopback carrier     unmanaged
  2 eno1           ether    off         unmanaged
  3 eno2           ether    off         unmanaged
  4 enp175s0f0     ether    enslaved    unmanaged
  5 enp175s0f1     ether    enslaved    unmanaged
  6 ens2f0         ether    enslaved    unmanaged
  7 ens2f1         ether    enslaved    unmanaged
  8 admin          bond     routable    configured
  9 external       bond     carrier     configured
 10 external.2073  vlan     routable    configured
 11 ovs-system     ether    off         unmanaged
 12 br-int         ether    off         unmanaged
 13 mirror0        ether    degraded    unmanaged
 14 ovn0           ether    routable    unmanaged
 16 9e7b62b19fad_h ether    enslaved    unmanaged
 18 4ba2e6f4e423_h ether    enslaved    unmanaged
 20 25cd24b35f4d_h ether    enslaved    unmanaged
 22 5cd92c4839f2_h ether    enslaved    unmanaged
 24 9c870b2299d4_h ether    enslaved    unmanaged
 26 189f19cd74b8_h ether    enslaved    unmanaged
 30 706691b3a900_h ether    enslaved    unmanaged
 32 66113afcc98b_h ether    enslaved    unmanaged
 34 e03ae08375e2_h ether    enslaved    unmanaged
 36 6ceb42989f1e_h ether    enslaved    unmanaged
 38 b0545599b48b_h ether    enslaved    unmanaged
 40 4292bd6ad8be_h ether    enslaved    unmanaged
 42 8326233203b6_h ether    enslaved    unmanaged
 44 ce6f1b99d075_h ether    enslaved    unmanaged

28 links listed.
bobz965 commented 6 months ago

Use the external bond NIC to construct the br-external OVS bridge.
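
For the default-VPC EIP/SNAT path, that usually means pointing ovn-external-gw-config at that NIC. A minimal sketch, assuming the bond named external from the listing above; the gateway address is an assumption based on the routable 10.91.230.0/24 VLAN and must match your environment:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ovn-external-gw-config
      namespace: kube-system
    data:
      enable-external-gw: "true"
      type: "decentralized"
      external-gw-nic: "external"           # the bond NIC used to build the external OVS bridge
      external-gw-addr: "10.91.230.254/24"  # assumed physical gateway on the external VLAN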

Smithx10 commented 6 months ago

@bobz965 ,

Since the external bond is a carrier-only interface, for my EIP to work I'd need to be able to configure a VLAN ID for it to use. We only have a /24 available per routable network / VLAN.

I saw the configuration option --external-gateway-vlanid ("Physical network VLAN tag number, default is 0, i.e. no VLAN is used"), but it looks like a global setting, one per cluster?

But looking through the code I don't see that option being used anymore. It looks like only these options exist:

        argExternalGatewayConfigNS   = pflag.String("external-gateway-config-ns", "kube-system", "The namespace of configmap external-gateway-config, default: kube-system")
        argExternalGatewaySwitch     = pflag.String("external-gateway-switch", "external", "The name of the external gateway switch which is a ovs bridge to provide external network, default: external")

If I wanted to specify which EIP network VLAN folks should use, how would I do that?

I have already configured a provider network, VLAN, and underlay subnet for putting pods directly onto underlay fabrics (roughly as sketched below), but I would like to explore the use case where someone only needs EIP / SNAT.
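
For context, the underlay pieces already in place probably look something like this (a sketch reconstructed from the addresses above, not the exact manifests; resource names are assumptions):

    apiVersion: kubeovn.io/v1
    kind: ProviderNetwork
    metadata:
      name: external
    spec:
      defaultInterface: external        # the external bond on each node
    ---
    apiVersion: kubeovn.io/v1
    kind: Vlan
    metadata:
      name: vlan2073
    spec:
      id: 2073
      provider: external
    ---
    apiVersion: kubeovn.io/v1
    kind: Subnet
    metadata:
      name: underlay-2073
    spec:
      protocol: IPv4
      cidrBlock: 10.91.230.0/24
      gateway: 10.91.230.254            # assumed VLAN gateway
      vlan: vlan2073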

bobz965 commented 6 months ago

(quoting the previous comment about --external-gateway-vlanid)

[screenshot: code reference showing --external-gateway-vlanid is still used]

It looks like it is still in use.

Smithx10 commented 5 months ago

Thank you for clarifying where that is in use.

I think I'm a bit confused about which features to use to solve which problems; I hope you can clarify.

What I am trying to provide to my users: the ability to deploy a pod and use EIP / FIP / SNAT. We currently only have /24s available, so it wouldn't be unimaginable for every user to get their own VPC, VPC gateway, and a designated /24 to use for EIP / FIP / SNAT. I am trying to figure out which feature set or approach I should use to provide this behavior using ovn-external-gw-config and external underlay subnets.

Assumption 1) The "Default VPC Enable EIP_SNAT" feature is used to configure one external CIDR that the default VPC subnet can use. This configuration is defined via "ovn-external-gw-config". In the --external-gateway-switch=external204 example we are using the provider-network bridge name as our gateway interface. Pods consume this feature via the ovn.kubernetes.io/snat and ovn.kubernetes.io/eip annotations when using the default VPC subnet (see the sketch below).
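
For example, a pod in the default VPC subnet would request an EIP with something like the following (a rough sketch; the pod name, image, and address are assumptions, with the address chosen from the external CIDR):

    apiVersion: v1
    kind: Pod
    metadata:
      name: eip-pod
      annotations:
        ovn.kubernetes.io/eip: "10.91.64.200"   # an address in the external CIDR (assumed)
    spec:
      containers:
      - name: app
        image: docker.io/library/nginx:alpine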

Assumption 2) There is another way to provide EIP/FIP/SNAT: provision an external underlay that users consume via the OvnEip and OvnFip CRs.

I've read over this documentation but am still unsure of the usage scenarios and available options. Once some of my assumptions are clarified I'd definitely be up for putting in some PRs to help improve the documentation for new users who lack context around deploying: https://kubeovn.github.io/docs/v1.13.x/en/advance/ovn-eip-fip-snat/

Thanks again for a great project.

bobz965 commented 5 months ago

Assumption 1): only subnets in the default VPC can use EIP/SNAT via the annotations. Assumption 2): subnets in any VPC can use EIP/SNAT/DNAT via the CRDs.

But for both cases you also need to set ovn-external-gw-config and --external-gateway-switch=external204 first to initialize the gateway nodes (see the sketch below for where the flag is set).
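
For reference, a sketch of where that flag lives, assuming the standard kube-ovn-controller deployment; the value external204 is just the example name from the docs:

    kubectl -n kube-system edit deployment kube-ovn-controller
    # then add to the container args, e.g.
    #   - --external-gateway-switch=external204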

Smithx10 commented 5 months ago

With that understanding, will we be limited to one CIDR on one VLAN for the entire cluster for EIP/FIP/SNAT?

bobz965 commented 5 months ago

With that understanding, will we be limited to one CIDR on one VLAN for the entire cluster for EIP/FIP/SNAT?

There is no such limit in the master branch; you can use more than one VLAN external subnet per VPC.

Smithx10 commented 5 months ago

OK, I see this in the v1.13 documentation: https://kubeovn.github.io/docs/v1.13.x/en/advance/ovn-eip-fip-snat/#142-custom-vpc-configuration

I did notice the mention of v1.12-mc vs v1.12 vs master. Does v1.12-mc have the "more than one external subnet per VPC" feature, or is that only on master?

I assume the Helm repo is not built from the 1.12-mc branch and doesn't support installing 1.13.0?

From the scripts I see:

    wget https://raw.githubusercontent.com/kubeovn/kube-ovn/master/dist/images/install.sh
    VERSION=v1.13.0

    wget https://raw.githubusercontent.com/kubeovn/kube-ovn/release-1.12-mc/dist/images/install.sh
    VERSION=v1.12.4-mc

Helm only shows:

    root@bdfv2-us-east-1-headnode-01:~# helm search repo kubeovn --versions --devel
    NAME              CHART VERSION  APP VERSION  DESCRIPTION
    kubeovn/kube-ovn  v1.12.9        1.12.9       Helm chart for Kube-OVN
    kubeovn/kube-ovn  v1.12.8        1.12.8       Helm chart for Kube-OVN
    kubeovn/kube-ovn  v1.12.7        1.12.7       Helm chart for Kube-OVN
    kubeovn/kube-ovn  v1.12.6        1.12.4       Helm chart for Kube-OVN
    kubeovn/kube-ovn  1.12.4         1.12.4       Helm chart for Kube-OVN
    kubeovn/kube-ovn  0.1.0          1.12.0       Helm chart for Kube-OVN

bobz965 commented 5 months ago

v1.12-mc has the "more than one external subnet per VPC" feature.

v1.12-mc has the Helm charts, but the Helm package is not published; you can build it for your own use (roughly as sketched below).

You can also use the install.sh.
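
Building and installing the chart from that branch would look roughly like this (a sketch; the chart path and required values may differ on release-1.12-mc, so check the repo and its values.yaml first):

    git clone -b release-1.12-mc https://github.com/kubeovn/kube-ovn.git
    cd kube-ovn
    helm package ./charts                        # package the in-tree chart (path may differ)
    helm install kube-ovn ./kube-ovn-*.tgz -n kube-system
    # set any required values (e.g. master node IPs) per the chart's values.yaml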

Smithx10 commented 5 months ago

I was able to get an EIP created, but ran into a snag: https://github.com/kubeovn/kube-ovn/issues/3909

Using the annotation works as expected.

When using the EIP / FIP CRDs, ping and HTTP replies don't route back from outside the cluster; the reply arrives from the pod IP rather than the EIP:

64 bytes from 172.16.128.9: icmp_seq=1 ttl=61 time=2.13 ms (DIFFERENT ADDRESS!)

How do I get the EIP / FIP CRDs to work properly?

kind: OvnEip
apiVersion: kubeovn.io/v1
metadata:
  name: eip-static
spec:
  externalSubnet: external
  type: nat
---
kind: OvnFip
apiVersion: kubeovn.io/v1
metadata:
  name: eip-static
spec:
  ovnEip: eip-static
  ipName: t4.default # the name of the ip crd, which is unique

[ use1 ] root@headnode-01:~/yamls/eip$ k ko nbctl lr-route-list ovn-cluster
IPv4 Routes
Route Table <main>:
             172.16.128.5              10.91.95.254 src-ip
           172.16.128.102              10.91.95.254 src-ip
                0.0.0.0/0                100.64.0.1 dst-ip

[ use1 ] root@headnode-01:~/yamls/eip$ k get ofip
NAME         VPC           V4EIP        V4IP             READY   IPTYPE   IPNAME
eip-static   ovn-cluster   10.91.64.6   172.16.128.103   true             t4.default
[ use1 ] root@headnode-01:~/yamls/eip$ k get oeip
NAME                   V4IP         V6IP   MAC                 TYPE   NAT   READY
eip-static             10.91.64.6          00:00:00:8A:12:D2   nat    fip   true
ovn-cluster-external   10.91.64.1          00:00:00:79:9F:27   lrp          true
bobz965 commented 5 months ago

Hi, can you run traceroute -n <fip> from outside the cluster, and tcpdump the FIP on the gw node?

I need more details.

Smithx10 commented 5 months ago

Is there an easier way to discover which gw node is servicing the EIP?

GW

[ use1 ] root@nsc-08:~/post-boot$ tcpdump -n  -i any dst 10.91.64.7 or src 172.16.128.103
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
02:13:10.498126 enp1s0f1 P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 301, seq 255, length 64
02:13:10.498126 external P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 301, seq 255, length 64
02:13:10.498164 c81b05868bb7_h P   IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 255, length 64
02:13:10.498172 ovn0  In  IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 255, length 64
02:13:10.498187 nodeip Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 255, length 64
02:13:10.498188 external Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 255, length 64
02:13:10.498192 enp1s0f1 Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 255, length 64
02:13:10.723484 enp1s0f1 P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 303, seq 88, length 64
02:13:10.723484 external P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 303, seq 88, length 64
02:13:10.723518 c81b05868bb7_h P   IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 88, length 64
02:13:10.723524 ovn0  In  IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 88, length 64
02:13:10.723538 nodeip Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 88, length 64
02:13:10.723539 external Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 88, length 64
02:13:10.723542 enp1s0f1 Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 88, length 64
02:13:11.523443 enp1s0f1 P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 301, seq 256, length 64
02:13:11.523443 external P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 301, seq 256, length 64
02:13:11.523477 c81b05868bb7_h P   IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 256, length 64
02:13:11.523483 ovn0  In  IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 256, length 64
02:13:11.523498 nodeip Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 256, length 64
02:13:11.523499 external Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 256, length 64
02:13:11.523501 enp1s0f1 Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 256, length 64
02:13:11.736843 enp1s0f1 P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 303, seq 89, length 64
02:13:11.736843 external P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 303, seq 89, length 64
02:13:11.736876 c81b05868bb7_h P   IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 89, length 64
02:13:11.736882 ovn0  In  IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 89, length 64
02:13:11.736896 nodeip Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 89, length 64
02:13:11.736898 external Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 89, length 64
02:13:11.736900 enp1s0f1 Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 303, seq 89, length 64
02:13:12.536930 enp1s0f1 P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 301, seq 257, length 64
02:13:12.536930 external P   IP 10.91.192.164 > 10.91.64.7: ICMP echo request, id 301, seq 257, length 64
02:13:12.536984 c81b05868bb7_h P   IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 257, length 64
02:13:12.537000 ovn0  In  IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 257, length 64
02:13:12.537014 nodeip Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 257, length 64
02:13:12.537016 external Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 257, length 64
02:13:12.537018 enp1s0f1 Out IP 172.16.128.103 > 10.91.192.164: ICMP echo reply, id 301, seq 257, length 64

Client

~ ❯❯❯ traceroute -n 10.91.64.7
traceroute to 10.91.64.7 (10.91.64.7), 30 hops max, 60 byte packets
 1  10.91.192.254  0.734 ms  0.419 ms  0.642 ms
 2  10.91.64.7  3.049 ms  2.833 ms  2.828 ms
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

64 bytes from 172.16.128.103: icmp_seq=104 ttl=61 time=0.409 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=105 ttl=61 time=0.230 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=106 ttl=61 time=0.286 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=107 ttl=61 time=0.318 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=108 ttl=61 time=0.405 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=109 ttl=61 time=0.271 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=110 ttl=61 time=0.357 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=111 ttl=61 time=0.285 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=112 ttl=61 time=0.265 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=113 ttl=61 time=0.414 ms (DIFFERENT ADDRESS!)
64 bytes from 172.16.128.103: icmp_seq=114 ttl=61 time=0.294 ms (DIFFERENT ADDRESS!)
Smithx10 commented 5 months ago

Here is an example using the annotation:

GW

02:17:50.035406 enp23s0f1np1 P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 90, length 64
02:17:50.035406 external P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 90, length 64
02:17:51.036516 enp23s0f1np1 P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 91, length 64
02:17:51.036516 external P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 91, length 64
02:17:52.037832 enp23s0f1np1 P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 92, length 64
02:17:52.037832 external P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 92, length 64
02:17:53.038756 enp23s0f1np1 P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 93, length 64
02:17:53.038756 external P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 93, length 64
02:17:54.040157 enp23s0f1np1 P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 94, length 64
02:17:54.040157 external P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 94, length 64
02:17:55.042251 enp23s0f1np1 P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 95, length 64
02:17:55.042251 external P   IP 10.91.192.164 > 10.91.64.200: ICMP echo request, id 305, seq 95, length 64

Client:

64 bytes from 10.91.64.200: icmp_seq=93 ttl=62 time=0.908 ms
64 bytes from 10.91.64.200: icmp_seq=94 ttl=62 time=0.700 ms
64 bytes from 10.91.64.200: icmp_seq=95 ttl=62 time=0.714 ms
64 bytes from 10.91.64.200: icmp_seq=96 ttl=62 time=0.742 ms
64 bytes from 10.91.64.200: icmp_seq=97 ttl=62 time=0.806 ms
64 bytes from 10.91.64.200: icmp_seq=98 ttl=62 time=0.766 ms
64 bytes from 10.91.64.200: icmp_seq=99 ttl=62 time=0.679 ms
64 bytes from 10.91.64.200: icmp_seq=100 ttl=62 time=0.685 ms
64 bytes from 10.91.64.200: icmp_seq=101 ttl=62 time=0.653 ms
64 bytes from 10.91.64.200: icmp_seq=102 ttl=62 time=0.849 ms
64 bytes from 10.91.64.200: icmp_seq=103 ttl=62 time=0.683 ms
^C
bobz965 commented 5 months ago

Is there an easier way to discover which gw node is servicing the EIP?

Run kubectl ko nbctl show and you will see:

[screenshots: kubectl ko nbctl show output highlighting the gateway chassis for the external router port]
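
If it helps, the chassis bound to the external router port can also be listed directly (a sketch; the LRP name ovn-cluster-external is taken from the k get oeip output above and may differ in your setup):

    kubectl ko nbctl lrp-get-gateway-chassis ovn-cluster-external
    kubectl ko sbctl show    # maps chassis to node hostnames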

bobz965 commented 5 months ago

In your case of the ofip 10.91.64.6:

    [ use1 ] root@headnode-01:~/yamls/eip$ k get ofip
    NAME         VPC           V4EIP        V4IP             READY   IPTYPE   IPNAME
    eip-static   ovn-cluster   10.91.64.6   172.16.128.103   true             t4.default

  1. Please exec into the pod 172.16.128.103 and run ip a, route -n, and traceroute -n <dest ip>.

  2. tcpdump on the gw node: tcpdump -i any host <dest IP> -netvv

zhangzujian commented 5 months ago

Are you using VLAN 2074 in kube-ovn? There seems to be a VLAN interface named external.2073 on your machine.

Smithx10 commented 5 months ago

@zhangzujian This is our typical network configuration; all the interfaces prefixed with "br-" are created by kube-ovn.

[ use1 ] root@nvme-01:~/post-boot$ networkctl
IDX LINK           TYPE     OPERATIONAL SETUP
  1 lo             loopback carrier     unmanaged
  2 enp23s0f0np0   ether    enslaved    unmanaged
  3 enp23s0f1np1   ether    enslaved    unmanaged
  4 ens1f4         ether    enslaved    unmanaged
  5 ens1f4d1       ether    enslaved    unmanaged
  6 storage        bond     degraded    configured
  7 external       bond     degraded    configured
  8 nodeip         vlan     routable    configured
  9 underlay       vlan     routable    configured
 10 ovs-system     ether    off         unmanaged
 11 br-int         ether    off         unmanaged
 12 mirror0        ether    degraded    unmanaged
 13 genev_sys_6081 geneve   enslaved    unmanaged
 14 ovn0           ether    routable    unmanaged
 15 br-external    ether    routable    unmanaged
 17 823376cd3c44_h ether    enslaved    unmanaged
 18 br-storage     ether    routable    unmanaged
 22 6c9669b8efbb_h ether    enslaved    unmanaged
 24 76e5d5695967_h ether    enslaved    unmanaged
 26 a47e9bc4e0f2_h ether    enslaved    unmanaged
 28 f6928359c70c_h ether    enslaved    unmanaged
 30 760bda7734b1_h ether    enslaved    unmanaged
 32 32702617f47c_h ether    enslaved    unmanaged
 34 323277ccc08f_h ether    enslaved    unmanaged
 41 baad3e4bd9c9_h ether    enslaved    unmanaged

For reference, the GW config:

[ use1 ] root@headnode-01:~/yamls/eip$ cat external-gw.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ovn-external-gw-config
  namespace: kube-system
data:
  enable-external-gw: "true"
  type: "decentralized"
  external-gw-nic: "br-external"
  external-gw-addr: "10.91.95.254/19"

@bobz965 Pod route table and interfaces:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.128.1    0.0.0.0         UG    0      0        0 eth0
172.16.128.0    0.0.0.0         255.255.128.0   U     0      0        0 eth0

/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
49: eth0@if50: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 8900 qdisc noqueue state UP
    link/ether 00:00:00:9f:ba:ea brd ff:ff:ff:ff:ff:ff
    inet 172.16.128.105/17 brd 172.16.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::200:ff:fe9f:baea/64 scope link
       valid_lft forever preferred_lft forever

64 bytes from 10.91.192.164: seq=34 ttl=61 time=0.598 ms
64 bytes from 10.91.192.164: seq=35 ttl=61 time=0.658 ms
^C
--- 10.91.192.164 ping statistics ---
36 packets transmitted, 36 packets received, 0% packet loss
round-trip min/avg/max = 0.469/0.682/2.893 ms
/ # ping 10.91.192.164^C

/ # traceroute -n 10.91.192.164
traceroute to 10.91.192.164 (10.91.192.164), 30 hops max, 46 byte packets
 1  172.16.128.1  4.324 ms  3.036 ms  2.614 ms
 2  172.16.0.22  2.263 ms  0.007 ms  0.004 ms
 3  10.91.230.254  1.962 ms  0.536 ms  0.472 ms
 4  10.91.192.164  2.109 ms  0.530 ms  0.334 ms
/ # traceroute -n 10.91.192.164
traceroute to 10.91.192.164 (10.91.192.164), 30 hops max, 46 byte packets
 1  172.16.128.1  3.175 ms  2.441 ms  2.418 ms
 2  172.16.0.22  0.004 ms  0.003 ms  0.002 ms
 3  10.91.230.254  0.731 ms  0.579 ms  0.482 ms
 4  10.91.192.164  0.361 ms  0.151 ms  0.131 ms

gateway

13:45:39.328084 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33435: UDP, length 18
13:45:39.331534 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33436: UDP, length 18
13:45:39.334119 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33437: UDP, length 18
13:45:39.336620 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33438: UDP, length 18
13:45:39.336652 ovn0  In  IP 172.16.128.105.38110 > 10.91.192.164.33438: UDP, length 18
13:45:39.336836 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33439: UDP, length 18
13:45:39.336842 ovn0  In  IP 172.16.128.105.38110 > 10.91.192.164.33439: UDP, length 18
13:45:39.336932 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33440: UDP, length 18
13:45:39.336937 ovn0  In  IP 172.16.128.105.38110 > 10.91.192.164.33440: UDP, length 18
13:45:39.336964 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33441: UDP, length 18
13:45:39.336969 ovn0  In  IP 172.16.128.105.38110 > 10.91.192.164.33441: UDP, length 18
13:45:39.337860 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33442: UDP, length 18
13:45:39.337881 ovn0  In  IP 172.16.128.105.38110 > 10.91.192.164.33442: UDP, length 18
13:45:39.338627 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33443: UDP, length 18
13:45:39.338649 ovn0  In  IP 172.16.128.105.38110 > 10.91.192.164.33443: UDP, length 18
13:45:39.339208 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33444: UDP, length 18
13:45:39.339216 ovn0  In  IP 172.16.128.105.38110 > 10.91.192.164.33444: UDP, length 18
13:45:39.339677 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33445: UDP, length 18
13:45:39.339682 ovn0  In  IP 172.16.128.105.38110 > 10.91.192.164.33445: UDP, length 18
13:45:39.339914 579beefd7ca2_h P   IP 172.16.128.105.38110 > 10.91.192.164.33446: UDP, length 18
13:45:39.339919 ovn0  In  IP 172.16.128.105.38110 > 10.91.192.164.33446: UDP, length 18
13:45:39.748168 579beefd7ca2_h P   ARP, Request who-has 172.16.128.1 tell 172.16.128.105, length 28
Smithx10 commented 5 months ago

Here is the pod with the annotation (working):

[ use1 ] root@headnode-01:~/yamls/eip$ k exec -it t3 -- /bin/sh
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
42: eth0@if43: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 8900 qdisc noqueue state UP
    link/ether 00:00:00:9d:a2:d5 brd ff:ff:ff:ff:ff:ff
    inet 172.16.128.106/17 brd 172.16.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::200:ff:fe9d:a2d5/64 scope link
       valid_lft forever preferred_lft forever

/ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         172.16.128.1    0.0.0.0         UG        0 0          0 eth0
172.16.128.0    0.0.0.0         255.255.128.0   U         0 0          0 eth0

/ # traceroute 10.91.192.164 -n
traceroute to 10.91.192.164 (10.91.192.164), 30 hops max, 46 byte packets
 1  172.16.128.1  2.863 ms  2.393 ms  2.694 ms
 2  10.91.95.254  4.934 ms  0.888 ms  0.732 ms
 3  10.91.192.164  2.061 ms  0.621 ms  0.620 ms
Smithx10 commented 5 months ago

Looks like the CRD path has an additional hop to the gateway node. Are the implementations the same?

[ use1 ] root@headnode-01:~/yamls/eip$ k get node -o wide | grep 172.16.0.22
nvme-03   Ready   15h   v1.29.0+rke2r1   172.16.0.22   10.91.230.22   Debian GNU/Linux 12 (bookworm)   6.1.0-17-amd64   containerd://1.7.11-k3s2

Smithx10 commented 5 months ago

Looks like the annotation path is creating a static route; looking at the code, I don't think the EIP / FIP CRD path does this.

I am wondering if because I am using "distributed" if this is the problem.

I0412 16:44:38.940925 1 pod.go:824] sync pod default/t3 routed
I0412 16:44:38.940962 1 vpc.go:721] vpc ovn-cluster add static route: &{Policy:policySrc CIDR:172.16.128.130 NextHopIP:10.91.95.254 ECMPMode: BfdID: RouteTable:}
I0412 16:44:38.943079 1 ovn-nb-logical_router_route.go:103] logical router ovn-cluster del static routes: []
I0412 16:44:38.962986 1 pod.go:372] enqueue update pod default/t3
I0412 16:44:38.963098 1 pod.go:428] take 61 ms to handle sync pod default/t3

Not sure what "stateless" does in the option.
if c.ovnFipChangeEip(fip, cachedEip) {
    klog.Infof("fip change ip, old ip '%s', new ip %s", fip.Status.V4Ip, cachedEip.Status.V4Ip)
    if err = c.OVNNbClient.DeleteNat(vpcName, ovnnb.NATTypeDNATAndSNAT, fip.Status.V4Ip, internalV4Ip); err != nil {
        klog.Errorf("failed to create fip, %v", err)
        return err
    }
    // ovn add fip
    options := map[string]string{"staleless": strconv.FormatBool(c.ExternalGatewayType == kubeovnv1.GWDistributedType)}
    if err = c.OVNNbClient.AddNat(vpcName, ovnnb.NATTypeDNATAndSNAT, cachedEip.Status.V4Ip, internalV4Ip, mac, cachedFip.Spec.IPName, options); err != nil {
        klog.Errorf("failed to create fip, %v", err)
        return err
    }
    if err = c.natLabelAndAnnoOvnEip(eipName, fip.Name, vpcName); err != nil {
        klog.Errorf("failed to label fip '%s' in eip %s, %v", fip.Name, eipName, err)
        return err
    }
    if err = c.patchOvnFipAnnotations(key, eipName); err != nil {
        klog.Errorf("failed to update label for fip %s, %v", key, err)
        return err
    }
    if err = c.patchOvnFipStatus(key, vpcName, cachedEip.Status.V4Ip, internalV4Ip, true); err != nil {
        klog.Errorf("failed to patch status for fip '%s', %v", key, err)
        return err
    }
    return nil
}

Going to try centralized and see if that changes things.
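
For reference, switching the mode in the ConfigMap shown earlier would be something like this (a sketch):

    kubectl -n kube-system patch configmap ovn-external-gw-config \
      --type merge -p '{"data":{"type":"centralized"}}'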

Smithx10 commented 5 months ago

@bobz965 I posted the latest logs, any idea why we have that additional hop?

bobz965 commented 5 months ago

in your info:


[ use1 ] root@headnode-01:~/yamls/eip$ k ko nbctl lr-route-list ovn-cluster
IPv4 Routes
Route Table <main>:
             172.16.128.5              10.91.95.254 src-ip
           172.16.128.102              10.91.95.254 src-ip
                0.0.0.0/0                100.64.0.1 dst-ip

[ use1 ] root@headnode-01:~/yamls/eip$ k get ofip
NAME         VPC           V4EIP        V4IP             READY   IPTYPE   IPNAME
eip-static   ovn-cluster   10.91.64.6   172.16.128.103   true             t4.default

The reply comes from 172.16.128.103, which is the internal IP behind the ofip, so it seems the dnat_and_snat is not being triggered.

Please show the output of kubectl ko nbctl lr-policy-list ovn-cluster.

Probably the only difference between the annotation EIP and the ofip is in the lr-policy-list.

github-actions[bot] commented 3 months ago

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

github-actions[bot] commented 3 weeks ago

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.