kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

v1.13.0: in a custom VPC, after configuring SNAT on a subnet, pods in the subnet cannot reach most IP ranges #3801

Closed geniusxiong closed 5 months ago

geniusxiong commented 8 months ago

Bug Report

In v1.13.0, with a custom VPC, after SNAT is configured on a subnet, pods in the subnet can only reach IPs in the same segment as the SNAT address; IPs in other segments are unreachable.

Expected Behavior

Pods in the subnet should be able to reach any IP that is otherwise reachable from the SNAT network.

Actual Behavior

Steps to Reproduce the Problem

1. Prepare the underlay public network following https://kubeovn.github.io/docs/v1.13.x/advance/ovn-eip-fip-snat/#31-ovn-snat-subnet-cidr:

```yaml
[root@master-0 ovn]# cat provider-network.yaml
apiVersion: kubeovn.io/v1
kind: ProviderNetwork
metadata:
  name: external204
spec:
  defaultInterface: ens33
```

```yaml
[root@master-0 ovn]# cat vlan.yaml
apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  name: vlan0
spec:
  id: 0
  provider: external204
```

```yaml
[root@master-0 ovn]# cat vlan-subnet.yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: external204
spec:
  protocol: IPv4
  cidrBlock: 172.18.164.0/24
  gateway: 172.18.164.254
  vlan: vlan0
  excludeIps:
```

```yaml
[root@master-0 ovn]# cat ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: vpc1-ns
```

```yaml
[root@master-0 ovn]# cat vpc.yaml
kind: Vpc
apiVersion: kubeovn.io/v1
metadata:
  annotations:
    bs.network.io/cidrBlock: 10.50.0.0/16
    bs.network.io/workspace: custom-vm
    kubesphere.io/alias-name: vpc1
    kubesphere.io/creator: sean
  name: vpc1
spec:
  namespaces:
  - vpc1-ns
  enableExternal: true
```

```yaml
[root@master-0 ovn]# cat vpc-subnet.yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: vpc1-subnet1
spec:
  cidrBlock: 10.50.1.0/24
  default: false
  disableGatewayCheck: false
  disableInterConnection: true
  enableEcmp: true
  gatewayNode: ""
  gatewayType: distributed
  natOutgoing: false
  private: false
  protocol: IPv4
  provider: ovn
  vpc: vpc1
  namespaces:
  - vpc1-ns
```
2. Configure SNAT for the subnet:

```yaml
[root@master-0 ovn]# cat ovneip.yaml
kind: OvnEip
apiVersion: kubeovn.io/v1
metadata:
  name: snat-for-subnet-in-vpc
spec:
  externalSubnet: external204
  type: nat
```

```yaml
[root@master-0 ovn]# cat ovn-subnet-snat.yaml
kind: OvnSnatRule
apiVersion: kubeovn.io/v1
metadata:
  name: snat-for-subnet-in-vpc
spec:
  ovnEip: snat-for-subnet-in-vpc
  vpcSubnet: vpc1-subnet1 # the EIP applies to the subnet's whole CIDR
```

3. Confirm the configuration has taken effect:

```bash
[root@master-0 ovn]# kubectl ko nbctl show vpc1
router c87bf6bc-6dd4-4836-bc14-10b224a37dbe (vpc1)
    port vpc1-vpc1-subnet1
        mac: "00:00:00:2C:8E:D1"
        networks: ["10.50.1.1/24"]
    port vpc1-external204
        mac: "00:00:00:2F:90:BA"
        networks: ["172.18.164.60/24"]
        gateway chassis: [5449d348-3f4b-4c5e-ba0d-eb0fa987c3d0 b76a3e7e-9c42-41f3-ac4b-7d2235247a41]
    nat 54ed768f-963e-43c2-b85b-b9bc397a40e9
        external ip: "172.18.164.58"
        logical ip: "10.50.1.0/24"
        type: "snat"
```

4. Create a pod in vpc1-subnet1:

![image](https://github.com/kubeovn/kube-ovn/assets/25982171/0d7076a4-1449-411d-972d-c87a6b79fb8f)

Inside the pod, ping 172.18.164.12, an IP in the same segment as the provider network; it is reachable:

```bash
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
83: eth0@if84: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1400 qdisc noqueue state UP
    link/ether 00:00:00:1b:ab:b9 brd ff:ff:ff:ff:ff:ff
    inet 10.50.1.4/24 brd 10.50.1.255 scope global eth0
       valid_lft forever preferred_lft forever
```

```bash
/ # ping -s 100 172.18.164.12
PING 172.18.164.12 (172.18.164.12): 100 data bytes
108 bytes from 172.18.164.12: seq=0 ttl=127 time=1.744 ms
108 bytes from 172.18.164.12: seq=1 ttl=127 time=1.440 ms
108 bytes from 172.18.164.12: seq=2 ttl=127 time=0.573 ms
108 bytes from 172.18.164.12: seq=3 ttl=127 time=0.438 ms
108 bytes from 172.18.164.12: seq=4 ttl=127 time=2.867 ms
108 bytes from 172.18.164.12: seq=5 ttl=127 time=0.655 ms
108 bytes from 172.18.164.12: seq=6 ttl=127 time=2.517 ms
108 bytes from 172.18.164.12: seq=7 ttl=127 time=1.195 ms
108 bytes from 172.18.164.12: seq=8 ttl=127 time=0.913 ms
108 bytes from 172.18.164.12: seq=9 ttl=127 time=0.558 ms
108 bytes from 172.18.164.12: seq=10 ttl=127 time=0.352 ms
108 bytes from 172.18.164.12: seq=11 ttl=127 time=0.359 ms
108 bytes from 172.18.164.12: seq=12 ttl=127 time=0.691 ms
108 bytes from 172.18.164.12: seq=13 ttl=127 time=0.440 ms
108 bytes from 172.18.164.12: seq=14 ttl=127 time=0.800 ms
108 bytes from 172.18.164.12: seq=15 ttl=127 time=0.421 ms
^C
--- 172.18.164.12 ping statistics ---
16 packets transmitted, 16 packets received, 0% packet loss
round-trip min/avg/max = 0.352/0.997/2.867 ms
```

Capture packets on the gateway node:

![image](https://github.com/kubeovn/kube-ovn/assets/25982171/ca03791d-77b3-4a1d-bcbc-c490d34e155d)

The capture shows the SNAT operation being applied.

Ping 172.18.120.173, an IP in a different segment from the provider network; it is unreachable:

```bash
/ # ping -s 100 172.18.120.173
PING 172.18.120.173 (172.18.120.173): 100 data bytes
^C
--- 172.18.120.173 ping statistics ---
9 packets transmitted, 0 packets received, 100% packet loss
```

Capturing on the gateway shows that no SNAT is applied:

![image](https://github.com/kubeovn/kube-ovn/assets/25982171/2c4a9647-3b84-4c52-a284-009c3dc44463)

However, 172.18.120.173 is reachable from hosts in the 172.18.164.0/24 segment.
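For anyone reproducing this, the gateway-side capture can be done along these lines (`ens33` is the provider interface from the ProviderNetwork above; adjust to your environment):

```shell
# On the gateway node: watch ICMP traffic toward the unreachable IP.
# If SNAT were applied, packets would leave with the EIP 172.18.164.58
# as source; in the failing case they keep the pod source address
# (or never appear on the provider interface at all).
tcpdump -i ens33 -nn icmp and host 172.18.120.173
```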

Could someone help take a look at what the cause is?

## Additional Info

- Kubernetes version:

  **Output of `kubectl version`:**

  ```bash
  1.18.6
  ```

- kube-ovn version:

  ```bash
  1.13.0-x86
  ```
zcq98 commented 8 months ago

With the current configuration, OVN does not know where to route 172.18.120.173. Try adding a static route on vpc1:

```yaml
kind: Vpc
apiVersion: kubeovn.io/v1
metadata:
  annotations:
    bs.network.io/cidrBlock: 10.50.0.0/16
    bs.network.io/workspace: custom-vm
    kubesphere.io/alias-name: vpc1
    kubesphere.io/creator: sean
  name: vpc1
spec:
  namespaces:
  - vpc1-ns
  staticRoutes:
  - cidr: 10.50.1.0/24
    nextHopIP: 172.18.164.254
    policy: policySrc
  enableExternal: true
```

This routes traffic whose source address is in 10.50.1.0/24 to the gateway; only traffic that reaches the gateway node is SNATed.
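To double-check that the route landed on the VPC's logical router, something like the following should work (`lr-route-list` is a standard ovn-nbctl subcommand exposed through the kube-ovn plugin; the exact output layout may differ by version):

```shell
# List static routes on the vpc1 logical router; the new entry should
# show the subnet CIDR with a src-ip policy pointing at the gateway, e.g.:
#   10.50.1.0/24    172.18.164.254    src-ip
kubectl ko nbctl lr-route-list vpc1
```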

zcq98 commented 8 months ago

I think this area can be optimized: the reroute should be configured dynamically based on the FIP, SNAT, and DNAT rules and managed with an address_set. I hadn't considered this case before. @bobz965

geniusxiong commented 8 months ago

> With the current configuration, OVN does not know where to route 172.18.120.173. Try adding a static route on vpc1 (see the manifest in the previous comment). This routes traffic whose source address is in 10.50.1.0/24 to the gateway; only traffic that reaches the gateway node is SNATed.

I tried it, and this configuration works. But it seems that previously, using OVN SNAT did not require a static route and traffic still got through; configured this way, it ends up the same as the "static routes under a custom VPC" chapter. We had assumed that OVN EIP/FIP/SNAT would simplify the configuration. Could this be optimized?

bobz965 commented 8 months ago

Discussed with @zcq98: with a single external network, keep the previous behavior and automatically configure a single static default route; with multiple external networks, use policy routing instead.
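For context, the two mechanisms mentioned here can be inspected with standard ovn-nbctl subcommands via the kube-ovn plugin (a sketch; output depends on the deployment):

```shell
# Static routes on the VPC router (the auto-configured default route
# in the single-external-network case)
kubectl ko nbctl lr-route-list vpc1

# Policy routes (the mechanism proposed for multiple external networks)
kubectl ko nbctl lr-policy-list vpc1
```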

github-actions[bot] commented 6 months ago

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.