kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

External gateway configured, but the EIP cannot be pinged #4207

Open yeshl opened 5 months ago

yeshl commented 5 months ago

Kube-OVN Version

1.12.16

Kubernetes Version

1.30

Operation-system/Kernel Version

debian12

Description

After configuring the external gateway, the EIP cannot be pinged. I followed the steps in the documentation: https://kubeovn.github.io/docs/stable/advance/ovn-eip-fip-snat/#31-ovn-snat-subnet-cidr

Steps To Reproduce

# 1. kube-ovn-controller startup arguments that need to be set:
          - --external-gateway-vlanid=204
          - --external-gateway-switch=external204

# 2. kube-ovn-cni startup arguments that need to be set:
          - --external-gateway-switch=external204 

root@master20:~# kubectl -n kube-system get deploy/kube-ovn-controller -oyaml|grep gateway 
        - --default-gateway=10.16.0.1
        - --default-gateway-check=true
        - --default-logical-gateway=false
        - --external-gateway-vlanid=204
        - --external-gateway-switch=external204

root@master20:~# kubectl -n kube-system get ds/kube-ovn-cni -oyaml|grep external
        - --external-gateway-switch=external204

# Prepare the provider-network, vlan, and subnet
# cat 01-provider-network.yaml
apiVersion: kubeovn.io/v1
kind: ProviderNetwork
metadata:
  name: external204
spec:
  defaultInterface: vlan
---
# cat 02-vlan.yaml
apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  name: vlan204
spec:
  id: 204
  provider: external204
---
# cat 03-vlan-subnet.yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: external204
spec:
  protocol: IPv4
  cidrBlock: 10.5.204.0/24
  gateway: 10.5.204.254
  vlan: vlan204
  excludeIps:
    - 10.5.204.1..10.5.204.100
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ovn-external-gw-config
  namespace: kube-system
data:
  enable-external-gw: "true"
  external-gw-nodes: "node51.host"
  type: "centralized"
  external-gw-nic: "eno4" # NIC attached to the OVS external (public) bridge
  external-gw-addr: "10.5.204.254/24" # IP of the underlay physical gateway
---
# cat 00-ns.yml
apiVersion: v1
kind: Namespace
metadata:
  name: vpc1
---
# cat 01-vpc-ecmp-enable-external-bfd.yml
kind: Vpc
apiVersion: kubeovn.io/v1
metadata:
  name: vpc1
spec:
  namespaces:
    - vpc1
  enableExternal: true
# Enabling enableExternal on the VPC automatically creates an LRP attached to the external network specified above
---
# cat 02-subnet.yml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: vpc1-subnet1
spec:
  cidrBlock: 192.168.0.0/24
  default: false
  disableGatewayCheck: false
  disableInterConnection: true
  enableEcmp: true
  gatewayNode: ""
  gatewayType: distributed
  #gatewayType: centralized
  natOutgoing: false
  private: false
  protocol: IPv4
  provider: ovn
  vpc: vpc1
  namespaces:
    - vpc1

apiVersion: v1
kind: Pod
metadata:
  name: vpc-1-busybox01
  namespace: vpc1
spec:
  containers:
    - name: snat-pod
      imagePullPolicy: IfNotPresent
      image: docker.io/library/busybox:1.36.1
      command: [ "/bin/sh", "-c", "trap : TERM INT; sleep infinity & wait" ]

root@master20:~# kubectl get po -o wide -n vpc1 vpc-1-busybox01
NAME              READY   STATUS    RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
vpc-1-busybox01   1/1     Running   0          10s   192.168.0.2   node30.host   <none>           <none>

root@master20:~# kubectl get ip vpc-1-busybox01.vpc1
NAME                   V4IP          V6IP   MAC                 NODE          SUBNET
vpc-1-busybox01.vpc1   192.168.0.2          66:0b:61:23:bd:08   node30.host   vpc1-subnet1

---
kind: OvnEip
apiVersion: kubeovn.io/v1
metadata:
  name: eip-static
spec:
  externalSubnet: external204
  type: nat
---
kind: OvnFip
apiVersion: kubeovn.io/v1
metadata:
  name: eip-static
spec:
  ovnEip: eip-static
  ipName: vpc-1-busybox01.vpc1

root@master20:~# kubectl  get ofip
NAME         VPC    V4EIP          V4IP          READY   IPTYPE   IPNAME
eip-static   vpc1   10.5.204.103   192.168.0.2   true             vpc-1-busybox01.vpc1
root@master20:~# ping 10.5.204.101
PING 10.5.204.101 (10.5.204.101) 56(84) bytes of data.
In the documentation this ping succeeds, but in my test it does not.

root@master20:~# kubectl ko nbctl show vpc1
router 14c37150-b405-4d38-8264-f6f0f1de60ef (vpc1)
    port vpc1-external204
        mac: "3e:ca:57:0f:dd:56"
        networks: ["10.5.204.102/24"]
        gateway chassis: [cd1f1f72-69aa-4e92-85ef-d7b7b0e074e0]
    port vpc1-vpc1-subnet1
        mac: "de:e2:48:4c:2e:fa"
        networks: ["192.168.0.1/24"]
    nat 05089518-1a25-4693-bddb-dc8e494492a8
        external ip: "10.5.204.103"
        logical ip: "192.168.0.2"
        type: "dnat_and_snat"
root@master20:~# kubectl ko nbctl lr-route-list vpc1
IPv4 Routes
Route Table <main>:
                0.0.0.0/0              10.5.204.254 dst-ip
root@master20:~# kk s external204
switch 78d8b48f-244a-4022-bcdc-0d32134dd280 (external204)
    port external204-ovn-cluster
        type: router
        router-port: ovn-cluster-external204
    port localnet.external204
        type: localnet
        tag: 204
        addresses: ["unknown"]
    port external204-vpc1
        type: router
        router-port: vpc1-external204
root@master20:~# kk s ovn-cluster
router 58c4b73d-d310-480d-8b23-f838b864ebea (ovn-cluster)
    port ovn-cluster-ovn-default
        mac: "d2:be:5f:c4:28:f0"
        networks: ["10.16.0.1/16"]
    port ovn-cluster-join
        mac: "76:cb:3c:f6:92:aa"
        networks: ["100.64.0.1/16"]
    port ovn-cluster-external204
        mac: "fa:d9:13:cc:ff:d4"
        networks: ["10.5.204.101/24"]
        gateway chassis: [cd1f1f72-69aa-4e92-85ef-d7b7b0e074e0]
root@master20:~# kk s vpc1-subnet1
switch fda16ae0-93a5-4daf-9eff-8ea5f5a5f6f5 (vpc1-subnet1)
    port vpc-1-busybox01.vpc1
        addresses: ["66:0b:61:23:bd:08 192.168.0.2"]
    port vpc1-subnet1-vpc1
        type: router
        router-port: vpc1-vpc1-subnet1

Current Behavior

With the external gateway configured, the EIP cannot be pinged.

Expected Behavior

Pinging the pod's public IP should succeed.

bobz965 commented 5 months ago

The configuration all looks normal. Could you attach a screenshot of your packet capture?

yeshl commented 5 months ago

image

I noticed the extra route below. The IP on my bond0 is 10.0.3.20:

10.5.204.0/24 via 100.64.0.1 dev ovn0 src 10.0.3.20
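
The route quoted above suggests that, on this host, traffic for 10.5.204.0/24 is handed to ovn0 instead of leaving through the physical NIC. As a minimal sketch for confirming which device the kernel would pick for a given destination, the helper below only parses the text that ip route get prints. The name route_dev is hypothetical and is not part of Kube-OVN or iproute2.

```shell
# route_dev: read `ip route get <addr>` output on stdin and print the
# egress device, i.e. the token that follows "dev".
# Illustrative helper only; the name is an assumption.
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on the host that sends the ping (address taken from this thread):
#   ip route get 10.5.204.103 | route_dev
```

If this prints ovn0 for the EIP, the ping from that host is entering the overlay rather than the underlay VLAN, so it may not be a representative test of external reachability.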

bobz965 commented 5 months ago

Here is the information from my e2e run for you to use as a reference; see whether it helps resolve the problem.

e2e reference

image

e2e details

Follow this document to build the e2e environment: https://kubeovn.github.io/docs/stable/reference/dev-env/
My e2e runs on an Ubuntu 24 PC.

Run the e2e with the following commands:


make kind-init
make kind-install
make ovn-vpc-nat-gw-conformance-e2e

(v) root@ae86:~# k get cm -n kube-system          ovn-external-gw-config   -o yaml
apiVersion: v1
data:
  enable-external-gw: "true"
  external-gw-addr: 172.19.0.0/16
  external-gw-nic: eth1
  external-gw-nodes: kube-ovn-worker,kube-ovn-control-plane
  type: centralized
kind: ConfigMap
metadata:
  creationTimestamp: "2024-06-22T01:44:00Z"
  name: ovn-external-gw-config
  namespace: kube-system
  resourceVersion: "2218"
  uid: c23104d8-c045-42ae-a6fc-8ba1a510f2ca

(v) root@ae86:~# k get vpc
NAME                   ENABLEEXTERNAL   ENABLEBFD   STANDBY   SUBNETS                                                       EXTRAEXTERNALSUBNETS   NAMESPACES
no-bfd-vpc-103132165   true             false       true      ["no-bfd-subnet-199358593","no-bfd-extra-subnet-100037991"]   ["extra"]              
ovn-cluster            true             false       true      ["join","ovn-default","external","extra"]                                            
(v) root@ae86:~# 
(v) root@ae86:~# 
(v) root@ae86:~# k get subnet
NAME                            PROVIDER   VPC                    VLAN                   PROTOCOL   CIDR             PRIVATE   NAT     DEFAULT   GATEWAYTYPE   V4USED   V4AVAILABLE   V6USED   V6AVAILABLE   EXCLUDEIPS                                 U2OINTERCONNECTIONIP
external                        ovn        ovn-cluster            vlan-195999955         IPv4       172.19.0.0/16    false     false   false     distributed   3        65528         0        0             ["172.19.0.1","172.19.0.2","172.19.0.3"]   
extra                           ovn        ovn-cluster            vlan-extra-101608684   IPv4       172.20.0.0/16    false     false   false     distributed   2        65529         0        0             ["172.20.0.1","172.20.0.2","172.20.0.3"]   
join                            ovn        ovn-cluster                                   IPv4       100.64.0.0/16    false     false   false     distributed   2        65531         0        0             ["100.64.0.1"]                             
no-bfd-extra-subnet-100037991   ovn        no-bfd-vpc-103132165                          IPv4       192.168.3.0/24   false     false   false     distributed   3        250           0        0             ["192.168.3.1"]                            
no-bfd-subnet-199358593         ovn        no-bfd-vpc-103132165                          IPv4       192.168.0.0/24   false     false   false     distributed   4        249           0        0             ["192.168.0.1"]                            
ovn-default                     ovn        ovn-cluster                                   IPv4       10.16.0.0/16     false     true    true      distributed   4        65529         0        0             ["10.16.0.1"]                              
(v) root@ae86:~# 
(v) root@ae86:~# 
(v) root@ae86:~# k get vpc no-bfd-vpc-103132165 -o yaml
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  creationTimestamp: "2024-06-22T01:44:00Z"
  generation: 2
  name: no-bfd-vpc-103132165
  resourceVersion: "2836"
  uid: dce2a257-0929-40cb-b98a-136eb6f53c1f
spec:
  enableExternal: true
  extraExternalSubnets:
  - extra
  staticRoutes:
  - bfdId: ""
    cidr: 192.168.3.0/24
    ecmpMode: ""
    nextHopIP: 172.20.0.1
    policy: policySrc
    routeTable: ""
status:
  default: false
  defaultLogicalSwitch: ""
  enableBfd: false
  enableExternal: true
  extraExternalSubnets:
  - extra
  router: no-bfd-vpc-103132165
  sctpLoadBalancer: vpc-no-bfd-vpc-103132165-sctp-load
  sctpSessionLoadBalancer: vpc-no-bfd-vpc-103132165-sctp-sess-load
  standby: true
  subnets:
  - no-bfd-subnet-199358593
  - no-bfd-extra-subnet-100037991
  tcpLoadBalancer: vpc-no-bfd-vpc-103132165-tcp-load
  tcpSessionLoadBalancer: vpc-no-bfd-vpc-103132165-tcp-sess-load
  udpLoadBalancer: vpc-no-bfd-vpc-103132165-udp-load
  udpSessionLoadBalancer: vpc-no-bfd-vpc-103132165-udp-sess-load
(v) root@ae86:~# 
(v) root@ae86:~# k get ofip
NAME                                   VPC                    V4EIP        V6EIP   V4IP          V6IP   READY   IPTYPE   IPNAME
fip-extra-pod-174516449                no-bfd-vpc-103132165   172.20.0.5           192.168.3.4          true             fip-extra-pod-174516449.ovn-vpc-nat-gw-529
fip-pod-192408216                      no-bfd-vpc-103132165   172.19.0.7           192.168.0.4          true             fip-pod-192408216.ovn-vpc-nat-gw-529
shared-eip-fip-should-fail-107345702                                                                            vip      shared-vip-129997953
shared-eip-fip-should-ok-196720677     no-bfd-vpc-103132165   172.19.0.5           192.168.0.5          true    vip      shared-vip-129997953
(v) root@ae86:~# k get subnet no-bfd-subnet-199358593 -o yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  creationTimestamp: "2024-06-22T01:44:02Z"
  finalizers:
  - kubeovn.io/kube-ovn-controller
  generation: 2
  name: no-bfd-subnet-199358593
  resourceVersion: "2844"
  uid: ec41ad4a-1e07-4208-9c23-324a88413b99
spec:
  cidrBlock: 192.168.0.0/24
  default: false
  enableLb: true
  excludeIps:
  - 192.168.0.1
  gateway: 192.168.0.1
  gatewayNode: ""
  gatewayType: distributed
  natOutgoing: false
  private: false
  protocol: IPv4
  provider: ovn
  vpc: no-bfd-vpc-103132165
status:
  activateGateway: ""
  conditions:
  - lastTransitionTime: "2024-06-22T01:44:02Z"
    lastUpdateTime: "2024-06-22T01:45:53Z"
    reason: ResetLogicalSwitchAclSuccess
    status: "True"
    type: Validated
  - lastTransitionTime: "2024-06-22T01:44:04Z"
    lastUpdateTime: "2024-06-22T01:44:04Z"
    reason: ResetLogicalSwitchAclSuccess
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-06-22T01:44:04Z"
    lastUpdateTime: "2024-06-22T01:44:04Z"
    message: Not Observed
    reason: Init
    status: Unknown
    type: Error
  dhcpV4OptionsUUID: ""
  dhcpV6OptionsUUID: ""
  natOutgoingPolicyRules: []
  u2oInterconnectionIP: ""
  u2oInterconnectionMAC: ""
  u2oInterconnectionVPC: ""
  v4availableIPrange: 192.168.0.6-192.168.0.254
  v4availableIPs: 249
  v4usingIPrange: 192.168.0.2-192.168.0.5
  v4usingIPs: 4
  v6availableIPrange: ""
  v6availableIPs: 0
  v6usingIPrange: ""
  v6usingIPs: 0
(v) root@ae86:~# 

(v) root@ae86:~# k get provider-networks external -o yaml
apiVersion: kubeovn.io/v1
kind: ProviderNetwork
metadata:
  creationTimestamp: "2024-06-22T01:43:36Z"
  generation: 1
  name: external
  resourceVersion: "2175"
  uid: 7b834dfa-32b6-4665-8255-04ee844158b4
spec:
  defaultInterface: eth1
status:
  conditions:
  - lastTransitionTime: "2024-06-22T01:43:48Z"
    lastUpdateTime: "2024-06-22T01:43:48Z"
    node: kube-ovn-control-plane
    reason: InitOVSBridgeSucceeded
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-06-22T01:43:48Z"
    lastUpdateTime: "2024-06-22T01:43:48Z"
    node: kube-ovn-worker
    reason: InitOVSBridgeSucceeded
    status: "True"
    type: Ready
  ready: true
  readyNodes:
  - kube-ovn-control-plane
  - kube-ovn-worker
  vlans:
  - vlan-195999955
(v) root@ae86:~# k get subnet external -o yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  creationTimestamp: "2024-06-22T01:43:48Z"
  finalizers:
  - kubeovn.io/kube-ovn-controller
  generation: 2
  name: external
  resourceVersion: "4122"
  uid: c123bad5-05b5-488d-a441-a8ce3e3552e9
spec:
  cidrBlock: 172.19.0.0/16
  default: false
  enableLb: true
  excludeIps:
  - 172.19.0.1
  - 172.19.0.2
  - 172.19.0.3
  gateway: 172.19.0.1
  gatewayNode: ""
  gatewayType: distributed
  natOutgoing: false
  private: false
  protocol: IPv4
  provider: ovn
  vlan: vlan-195999955
  vpc: ovn-cluster
status:
  activateGateway: ""
  conditions:
  - lastTransitionTime: "2024-06-22T01:43:48Z"
    lastUpdateTime: "2024-06-22T01:46:02Z"
    reason: ResetLogicalSwitchAclSuccess
    status: "True"
    type: Validated
  - lastTransitionTime: "2024-06-22T01:43:49Z"
    lastUpdateTime: "2024-06-22T01:43:49Z"
    reason: ResetLogicalSwitchAclSuccess
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-06-22T01:43:49Z"
    lastUpdateTime: "2024-06-22T01:43:49Z"
    message: Not Observed
    reason: Init
    status: Unknown
    type: Error
  dhcpV4OptionsUUID: ""
  dhcpV6OptionsUUID: ""
  natOutgoingPolicyRules: []
  u2oInterconnectionIP: ""
  u2oInterconnectionMAC: ""
  u2oInterconnectionVPC: ""
  v4availableIPrange: 172.19.0.4,172.19.0.8-172.19.255.254
  v4availableIPs: 65528
  v4usingIPrange: 172.19.0.5-172.19.0.7
  v4usingIPs: 3
  v6availableIPrange: ""
  v6availableIPs: 0
  v6usingIPrange: ""
  v6usingIPs: 0

(v) root@ae86:~# k get vlan vlan-195999955 -o yaml
apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  creationTimestamp: "2024-06-22T01:43:48Z"
  generation: 1
  name: vlan-195999955
  resourceVersion: "2183"
  uid: 96dfca0e-0669-4f0d-8a81-da0843c0c796
spec:
  id: 0
  provider: external
status:
  subnets:
  - external
(v) root@ae86:~# 

(v) root@ae86:~# k get ofip
NAME                                   VPC                    V4EIP        V6EIP   V4IP          V6IP   READY   IPTYPE   IPNAME
fip-pod-192408216                      no-bfd-vpc-103132165   172.19.0.7           192.168.0.4          true             fip-pod-192408216.ovn-vpc-nat-gw-529
shared-eip-fip-should-fail-107345702                                                                            vip      shared-vip-129997953
shared-eip-fip-should-ok-196720677     no-bfd-vpc-103132165   172.19.0.5           192.168.0.5          true    vip      shared-vip-129997953
(v) root@ae86:~# k get ip | grep 192.168.0.4
fip-pod-192408216.ovn-vpc-nat-gw-529               192.168.0.4          66:44:ef:66:97:74   kube-ovn-worker          no-bfd-subnet-199358593
(v) root@ae86:~# k get po -A -o wide | grep 172.19.0.7
(v) root@ae86:~# k get po -A -o wide | grep 192.168.0.4
ovn-vpc-nat-gw-529   fip-pod-192408216                                1/1     Running   0          3m51s   192.168.0.4   kube-ovn-worker          <none>           <none>
(v) root@ae86:~# 
(v) root@ae86:~# ping 172.19.0.7
PING 172.19.0.7 (172.19.0.7) 56(84) bytes of data.
64 bytes from 172.19.0.7: icmp_seq=1 ttl=63 time=4.74 ms
64 bytes from 172.19.0.7: icmp_seq=2 ttl=63 time=0.714 ms
^C
--- 172.19.0.7 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.714/2.726/4.739/2.012 ms

(v) root@ae86:~# ip -br a | grep -C 2 172.19
br-09b2e217f827  UP             172.20.0.1/16 fc00:5645:6976:1737::1/64 fe80::42:bbff:fe83:71ef/64 fe80::1/64 
br-4d767ea66623  UP             172.18.0.1/16 fc00:f853:ccd:e793::1/64 fe80::42:34ff:fe81:6180/64 fe80::1/64 
br-7b69ce9ed697  UP             172.19.0.1/16 fc00:adb1:b29b:608d::1/64 fe80::42:4aff:fe87:7f5c/64 fe80::1/64 
docker0          DOWN           172.17.0.1/16 fe80::42:b3ff:fe1a:2bd4/64 
veth79dfb86@if21 UP             fe80::a824:3fff:fe15:c182/64 

bobz965 commented 5 months ago

Judging from the current e2e results, the ofip 172.19.0.7 can be pinged.

If you find that something in the documentation is wrong, please post it here.

yeshl commented 5 months ago

After I apply this ovn-external-gw-config with enable-external-gw: "true", I check the VPC:

root@master20:~# kubectl get vpc
NAME          ENABLEEXTERNAL   ENABLEBFD   STANDBY   SUBNETS                             EXTRAEXTERNALSUBNETS   NAMESPACES
ovn-cluster   false            false       true      ["external","join","ovn-default"]  
                       ^ Shouldn't this have become true? It is still false.

bobz965 commented 5 months ago

Check the kube-ovn-controller pod log for any ERR entries related to this VPC's name.
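
One way to narrow the controller log down to the relevant entries is sketched below. It assumes the log uses the klog layout seen elsewhere in this thread, where error lines start with E followed by the month and day; the helper name vpc_errors is hypothetical.

```shell
# vpc_errors: keep only error-level klog lines (those starting with "E"
# plus a four-digit MMDD date) that mention the VPC name given as $1.
# Illustrative helper only; the name is an assumption.
vpc_errors() {
  grep -E '^E[0-9]{4} ' | grep -F -- "$1"
}

# Usage (deployment name as used elsewhere in this thread):
#   kubectl -n kube-system logs deploy/kube-ovn-controller | vpc_errors vpc1
```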

yeshl commented 5 months ago

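After running kubectl -n kube-system rollout restart deploy/kube-ovn-controller it does become true. Perhaps I have tested too many times and left stale data behind from incomplete creates and deletes. Is there a reset-style command that restores the data to its freshly installed state? I often find that after I delete resource objects via YAML, the controller keeps logging errors about objects that have already been deleted.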

yeshl commented 5 months ago

E0622 17:30:42.776069 7 vpc.go:547] failed to add default external connection for vpc vpc-1, error no external gw nodes
I0622 17:30:42.776082 7 subnet.go:338] format subnet subnet-1, changed false
E0622 17:30:42.776095 7 vpc.go:995] error syncing 'vpc-1': no external gw nodes, requeuing

bobz965 commented 5 months ago

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ovn-external-gw-config
  namespace: kube-system
data:
  enable-external-gw: "true"
  external-gw-nodes: "node51.host"
  type: "centralized"
  external-gw-nic: "eno4" # NIC attached to the OVS external (public) bridge
  external-gw-addr: "10.5.204.254/24" # IP of the underlay physical gateway

# The external-gw-nodes: "node51.host" setting should cause a label to be added to the node:

Or you can refer to this section:

https://kubeovn.github.io/docs/v1.13.x/advance/ovn-eip-fip-snat/#31-ovn-snat-subnet-cidr

image


# First, designate the external-gw-nodes by adding a label
kubectl label nodes pc-node-1 pc-node-2 pc-node-3 ovn.kubernetes.io/external-gw=true

The ovn eip feature may not be well supported in 1.12; 1.13 changed it substantially. I recommend using 1.13 or 1.12-mc.

The changes cannot be backported.

yeshl commented 5 months ago

I have upgraded to 1.13.0. Problem: from a pod created in the default VPC, ping 1.1.1.1 goes out through the host NAT rather than through the oeip/ofip.

router efd15288-2896-4232-a33c-229c4fe53189 (ovn-cluster)
    port ovn-cluster-join
        mac: "22:46:0c:bb:64:fe"
        networks: ["100.64.0.1/16"]
    port ovn-cluster-external
        mac: "5a:b0:14:62:ca:24"
        networks: ["112.5.140.254/24"]
        gateway chassis: [70fb0cad-67d6-4beb-8f23-06abfe27c268]
    port ovn-cluster-ovn-default
        mac: "4e:2e:3d:92:aa:3d"
        networks: ["10.16.0.1/16"]
    nat 33f54c38-0aa9-4eaf-a7b8-93261f9fb469
        external ip: "112.5.140.40"
        logical ip: "10.16.0.10"
        type: "dnat_and_snat"
# kubectl ko nbctl show ovn-cluster
root@master20:~# kubectl get vpc
NAME          ENABLEEXTERNAL   ENABLEBFD   STANDBY   SUBNETS                             EXTRAEXTERNALSUBNETS   NAMESPACES
ovn-cluster   true             false       true      ["join","ovn-default","external"]                          
root@master20:~# kubectl get subnet
NAME          PROVIDER   VPC           VLAN   PROTOCOL   CIDR             PRIVATE   NAT     DEFAULT   GATEWAYTYPE   V4USED   V4AVAILABLE   V6USED   V6AVAILABLE   EXCLUDEIPS                                                     U2OINTERCONNECTIONIP
external      ovn        ovn-cluster          IPv4       112.5.140.0/24   false     false   false     distributed   0        169           0        0             ["112.5.140.1..112.5.140.30","112.5.140.200..112.5.140.255"]   
join          ovn        ovn-cluster          IPv4       100.64.0.0/16    false     false   false     distributed   11       65522         0        0             ["100.64.0.1"]                                                 
ovn-default   ovn        ovn-cluster          IPv4       10.16.0.0/16     false     true    true      distributed   159      65374         0        0             ["10.16.0.1"]                                                  
root@master20:~# kubectl get oeip  
NAME         V4IP           V6IP   MAC                 TYPE   NAT   READY   EXTERNALSUBNET
eip-static   112.5.140.40          1e:1c:f8:b3:f1:a4   nat    fip   true    external
root@master20:~# kubectl get ofip
NAME         VPC           V4EIP          V6EIP   V4IP         V6IP   READY   IPTYPE   IPNAME
fip-static   ovn-cluster   112.5.140.40           10.16.0.10          true             pod-ex.dev
root@master20:~# kubectl get po -A -owide |grep pod-
dev                 pod-ex                                                 1/1     Running   0               12m     10.16.0.10    master21.host   <none>           <none>
root@master20:~# ping 112.5.140.40
PING 112.5.140.40 (112.5.140.40) 56(84) bytes of data.
^C
--- 112.5.140.40 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4083ms
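
yeshl commented 5 months ago

Capturing packets on the peer of the external IP shows the following. In a custom VPC, when the pod has no oeip/ofip bound, its packets travel through genev_sys_6081 to the gateway node, then through br-external, and out the external NIC; because there is no SNAT at that point, the replies never come back. Once the oeip/ofip is added, however, the pod can no longer send packets out at all, and they do not go through genev_sys_6081.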

zcq98 commented 5 months ago

image

Isn't your FIP 10.5.204.103? Why are you pinging 10.5.204.101?
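
A mismatch like this can be avoided by reading the address out of the kubectl get ofip table instead of typing it by hand. The following is a sketch only: the helper name ofip_v4eip is hypothetical, and it assumes the NAME, VPC and V4EIP cells of the row in question are all non-empty, since it splits the table on whitespace.

```shell
# ofip_v4eip: read `kubectl get ofip` table output on stdin and print the
# V4EIP value of the row whose NAME equals $1. The column is located by
# its header text, so its position is not hard-coded.
# Illustrative helper only; rows with empty cells before V4EIP would be
# misread because fields are split on whitespace.
ofip_v4eip() {
  awk -v name="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "V4EIP") col = i; next }
    $1 == name && col { print $col }
  '
}

# Usage:
#   eip=$(kubectl get ofip | ofip_v4eip eip-static)
#   ping -c 3 "$eip"
```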

yeshl commented 5 months ago

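I really could not get ofip to work no matter what I tried. I then tested OvnSnatRule and OvnDnatRule, and those went smoothly! I found one error in the documentation. Another thing: many resources cannot be modified in place and have to be deleted and recreated, and sometimes restart deploy/kube-ovn-controller is needed as well.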

yeshl commented 5 months ago

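How I worked through the ofip problem:

1. ofip apparently can only reach the external network through a distributed gateway. With gatewayType: distributed in the custom VPC's subnet and an ofip bound to the pod, the network works when the pod is scheduled onto the node that has the external NIC and fails when the pod is on any other node. With ofip in use, the NAT entry carries additional EXTERNAL_MAC and LOGICAL_PORT values:

TYPE             GATEWAY_PORT          EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP          EXTERNAL_MAC         LOGICAL_PORT
dnat_and_snat                          192.168.40.32                       192.168.41.2        8e:75:e4:d8:e7:96    pod-1.dev

Note: setting LOGICAL_PORT and EXTERNAL_MAC enables distributed EIP. The relevant flows are installed on the compute node hosting LOGICAL_IP/LOGICAL_PORT, so traffic is sent and received locally without passing through the centralized gateway. Without them the NAT is centralized, and traffic exits to the public network on the node selected by lrp-set-gateway-chassis.

2. When the default VPC enables enable-external-gw through the ConfigMap with type: "centralized", does that conflict with gatewayType: distributed in the custom VPC's subnet? Both connect to the same external network, external.

3. When a custom VPC is created, the default route 0.0.0.0/0 is missing on the LR and has to be added manually: kubectl ko nbctl --may-exist lr-route-add vpc1 0.0.0.0/0 192.168.40.1 (the official documentation says it is added automatically, but in practice this route is missing).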

# Command reference:
kubectl ko nbctl --may-exist lr-route-add vpc1 0.0.0.0/0 192.168.40.1  # the docs say this is added automatically, but the route is actually missing
kubectl ko nbctl lr-route-del vpc1 0.0.0.0/0 192.168.40.1 
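
Before adding the route by hand, it may help to check whether the logical router already has one. The sketch below only inspects the text printed by lr-route-list; the helper name has_default_route is hypothetical, and the router name and next hop in the usage comment are simply the ones from the commands above.

```shell
# has_default_route: read `lr-route-list` output on stdin; exit 0 if any
# route line has 0.0.0.0/0 as its prefix, exit 1 otherwise.
# Illustrative helper only; the name is an assumption.
has_default_route() {
  awk '$1 == "0.0.0.0/0" { found = 1 } END { exit !found }'
}

# Usage:
#   kubectl ko nbctl lr-route-list vpc1 | has_default_route \
#     || kubectl ko nbctl --may-exist lr-route-add vpc1 0.0.0.0/0 192.168.40.1
```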

# snat can be added for a single IP or a CIDR; when deleting, specify the trailing LOGICAL_IP
kubectl ko nbctl lr-nat-add vpc1 snat 192.168.40.32 192.168.41.0/24
kubectl ko nbctl lr-nat-del vpc1 snat 192.168.41.0/24
# kubectl ko nbctl lr-nat-list vpc1
TYPE             GATEWAY_PORT          EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP          EXTERNAL_MAC         LOGICAL_PORT
snat                                   192.168.40.32                       192.168.41.0/24

# dnat_and_snat can be added for a single IP; when deleting, specify the leading EXTERNAL_IP
kubectl ko nbctl lr-nat-add vpc1 dnat_and_snat 192.168.40.34 192.168.41.2 pod-1.dev  26:8d:22:5a:5f:cb
kubectl ko nbctl lr-nat-add vpc1 dnat_and_snat 192.168.40.34 192.168.41.2
kubectl ko nbctl lr-nat-del vpc1 dnat_and_snat 192.168.40.34 
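
Following the observation in this comment that a dnat_and_snat entry with EXTERNAL_MAC and LOGICAL_PORT set behaves as a distributed EIP, the sketch below labels each such row of lr-nat-list output accordingly. It is a heuristic for IPv4 entries only: it treats any MAC-shaped token on the row as EXTERNAL_MAC. The helper name nat_modes is hypothetical.

```shell
# nat_modes: read `lr-nat-list` output on stdin and prefix each
# dnat_and_snat row with "distributed" if it contains a MAC-shaped token
# (EXTERNAL_MAC is set) or "centralized" if it does not.
# Illustrative helper only; other row types and the header are skipped.
nat_modes() {
  while read -r line; do
    case "$line" in
      dnat_and_snat*) ;;   # only FIP-style entries are of interest
      *) continue ;;
    esac
    if printf '%s\n' "$line" | grep -Eq '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}'; then
      printf 'distributed %s\n' "$line"
    else
      printf 'centralized %s\n' "$line"
    fi
  done
}

# Usage:
#   kubectl ko nbctl lr-nat-list vpc1 | nat_modes
```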


oilbeater commented 5 months ago

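I really could not get ofip to work no matter what I tried. I then tested OvnSnatRule and OvnDnatRule, and those went smoothly! I found one error in the documentation. Another thing: many resources cannot be modified in place and have to be deleted and recreated, and sometimes restart deploy/kube-ovn-controller is needed as well.

@yeshl Which resources specifically need to be recreated or need a restart? We will look at them together.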

yeshl commented 5 months ago

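For example:

1. When ovn-external-gw-config is configured, kube-ovn-controller has to be restarted before kubectl get vpc shows ENABLEEXTERNAL as true.
2. When ovn-external-gw-config is deleted, the br-external bridge on the gateway node is not removed and has to be deleted manually with ovs-vsctl del-br br-external. The ENABLEEXTERNAL status also only updates after kube-ovn-controller is restarted.

In other cases, when creating or deleting a subnet, kube-ovn-controller sometimes has to be restarted as well; otherwise the subnet cannot be deleted.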

github-actions[bot] commented 3 months ago

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

github-actions[bot] commented 1 month ago

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.