Open yeshl opened 5 months ago
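The configuration all looks normal. Could you attach a screenshot of your packet capture?
I see that the following extra route has appeared; the IP on my bond0 is 10.0.3.20:
10.5.204.0/24 via 100.64.0.1 dev ovn0 src 10.0.3.20
Let me share the details from the e2e run for you to use as a reference first, and see whether that resolves it.
How to use the e2e as a reference
e2e details
Build the e2e environment by following this document: https://kubeovn.github.io/docs/stable/reference/dev-env/
My e2e runs on an Ubuntu 24 PC.
Run the e2e with the following commands: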
make kind-init
make kind-install
make ovn-vpc-nat-gw-conformance-e2e
(v) root@ae86:~# k get cm -n kube-system ovn-external-gw-config -o yaml
apiVersion: v1
data:
  enable-external-gw: "true"
  external-gw-addr: 172.19.0.0/16
  external-gw-nic: eth1
  external-gw-nodes: kube-ovn-worker,kube-ovn-control-plane
  type: centralized
kind: ConfigMap
metadata:
  creationTimestamp: "2024-06-22T01:44:00Z"
  name: ovn-external-gw-config
  namespace: kube-system
  resourceVersion: "2218"
  uid: c23104d8-c045-42ae-a6fc-8ba1a510f2ca
(v) root@ae86:~# k get vpc
NAME                   ENABLEEXTERNAL   ENABLEBFD   STANDBY   SUBNETS                                                       EXTRAEXTERNALSUBNETS   NAMESPACES
no-bfd-vpc-103132165   true             false       true      ["no-bfd-subnet-199358593","no-bfd-extra-subnet-100037991"]   ["extra"]
ovn-cluster            true             false       true      ["join","ovn-default","external","extra"]
(v) root@ae86:~#
(v) root@ae86:~#
(v) root@ae86:~# k get subnet
NAME                            PROVIDER   VPC                    VLAN                   PROTOCOL   CIDR             PRIVATE   NAT     DEFAULT   GATEWAYTYPE   V4USED   V4AVAILABLE   V6USED   V6AVAILABLE   EXCLUDEIPS                                 U2OINTERCONNECTIONIP
external                        ovn        ovn-cluster            vlan-195999955         IPv4       172.19.0.0/16    false     false   false     distributed   3        65528         0        0             ["172.19.0.1","172.19.0.2","172.19.0.3"]
extra                           ovn        ovn-cluster            vlan-extra-101608684   IPv4       172.20.0.0/16    false     false   false     distributed   2        65529         0        0             ["172.20.0.1","172.20.0.2","172.20.0.3"]
join                            ovn        ovn-cluster                                   IPv4       100.64.0.0/16    false     false   false     distributed   2        65531         0        0             ["100.64.0.1"]
no-bfd-extra-subnet-100037991   ovn        no-bfd-vpc-103132165                          IPv4       192.168.3.0/24   false     false   false     distributed   3        250           0        0             ["192.168.3.1"]
no-bfd-subnet-199358593         ovn        no-bfd-vpc-103132165                          IPv4       192.168.0.0/24   false     false   false     distributed   4        249           0        0             ["192.168.0.1"]
ovn-default                     ovn        ovn-cluster                                   IPv4       10.16.0.0/16     false     true    true      distributed   4        65529         0        0             ["10.16.0.1"]
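The V4AVAILABLE values in the listing above are consistent with a simple count: usable host addresses in the CIDR, minus the excluded IPs, minus the IPs in use. The Python sketch below reproduces that arithmetic for two of the rows. The function name v4_available is hypothetical, and this is an observation about the numbers shown here, not a statement of how kube-ovn computes the column internally.

```python
import ipaddress


def v4_available(cidr, exclude_ips, used):
    """Count free IPv4 addresses: usable hosts minus excluded minus in-use.

    Assumes a prefix shorter than /31 and that the in-use addresses do not
    overlap the excluded ones. v4_available is an illustrative helper only.
    """
    net = ipaddress.ip_network(cidr)
    hosts = net.num_addresses - 2  # drop the network and broadcast addresses
    reserved = {net.network_address, net.broadcast_address}
    excluded = {
        ip
        for ip in map(ipaddress.ip_address, exclude_ips)
        if ip in net and ip not in reserved
    }
    return hosts - len(excluded) - used


if __name__ == "__main__":
    # Rows "external" and "no-bfd-subnet-199358593" from the listing above.
    print(v4_available("172.19.0.0/16", ["172.19.0.1", "172.19.0.2", "172.19.0.3"], 3))
    print(v4_available("192.168.0.0/24", ["192.168.0.1"], 4))
```

If the number reported for a subnet drifts away from this count, that may be a hint that stale IP objects are still allocated.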
(v) root@ae86:~#
(v) root@ae86:~#
(v) root@ae86:~# k get vpc no-bfd-vpc-103132165 -o yaml
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  creationTimestamp: "2024-06-22T01:44:00Z"
  generation: 2
  name: no-bfd-vpc-103132165
  resourceVersion: "2836"
  uid: dce2a257-0929-40cb-b98a-136eb6f53c1f
spec:
  enableExternal: true
  extraExternalSubnets:
  - extra
  staticRoutes:
  - bfdId: ""
    cidr: 192.168.3.0/24
    ecmpMode: ""
    nextHopIP: 172.20.0.1
    policy: policySrc
    routeTable: ""
status:
  default: false
  defaultLogicalSwitch: ""
  enableBfd: false
  enableExternal: true
  extraExternalSubnets:
  - extra
  router: no-bfd-vpc-103132165
  sctpLoadBalancer: vpc-no-bfd-vpc-103132165-sctp-load
  sctpSessionLoadBalancer: vpc-no-bfd-vpc-103132165-sctp-sess-load
  standby: true
  subnets:
  - no-bfd-subnet-199358593
  - no-bfd-extra-subnet-100037991
  tcpLoadBalancer: vpc-no-bfd-vpc-103132165-tcp-load
  tcpSessionLoadBalancer: vpc-no-bfd-vpc-103132165-tcp-sess-load
  udpLoadBalancer: vpc-no-bfd-vpc-103132165-udp-load
  udpSessionLoadBalancer: vpc-no-bfd-vpc-103132165-udp-sess-load
(v) root@ae86:~#
(v) root@ae86:~# k get ofip
NAME                                   VPC                    V4EIP        V6EIP   V4IP          V6IP   READY   IPTYPE   IPNAME
fip-extra-pod-174516449                no-bfd-vpc-103132165   172.20.0.5           192.168.3.4          true             fip-extra-pod-174516449.ovn-vpc-nat-gw-529
fip-pod-192408216                      no-bfd-vpc-103132165   172.19.0.7           192.168.0.4          true             fip-pod-192408216.ovn-vpc-nat-gw-529
shared-eip-fip-should-fail-107345702                                                                            vip      shared-vip-129997953
shared-eip-fip-should-ok-196720677     no-bfd-vpc-103132165   172.19.0.5           192.168.0.5          true    vip      shared-vip-129997953
(v) root@ae86:~# k get subnet no-bfd-subnet-199358593 -o yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  creationTimestamp: "2024-06-22T01:44:02Z"
  finalizers:
  - kubeovn.io/kube-ovn-controller
  generation: 2
  name: no-bfd-subnet-199358593
  resourceVersion: "2844"
  uid: ec41ad4a-1e07-4208-9c23-324a88413b99
spec:
  cidrBlock: 192.168.0.0/24
  default: false
  enableLb: true
  excludeIps:
  - 192.168.0.1
  gateway: 192.168.0.1
  gatewayNode: ""
  gatewayType: distributed
  natOutgoing: false
  private: false
  protocol: IPv4
  provider: ovn
  vpc: no-bfd-vpc-103132165
status:
  activateGateway: ""
  conditions:
  - lastTransitionTime: "2024-06-22T01:44:02Z"
    lastUpdateTime: "2024-06-22T01:45:53Z"
    reason: ResetLogicalSwitchAclSuccess
    status: "True"
    type: Validated
  - lastTransitionTime: "2024-06-22T01:44:04Z"
    lastUpdateTime: "2024-06-22T01:44:04Z"
    reason: ResetLogicalSwitchAclSuccess
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-06-22T01:44:04Z"
    lastUpdateTime: "2024-06-22T01:44:04Z"
    message: Not Observed
    reason: Init
    status: Unknown
    type: Error
  dhcpV4OptionsUUID: ""
  dhcpV6OptionsUUID: ""
  natOutgoingPolicyRules: []
  u2oInterconnectionIP: ""
  u2oInterconnectionMAC: ""
  u2oInterconnectionVPC: ""
  v4availableIPrange: 192.168.0.6-192.168.0.254
  v4availableIPs: 249
  v4usingIPrange: 192.168.0.2-192.168.0.5
  v4usingIPs: 4
  v6availableIPrange: ""
  v6availableIPs: 0
  v6usingIPrange: ""
  v6usingIPs: 0
(v) root@ae86:~#
(v) root@ae86:~# k get provider-networks external -o yaml
apiVersion: kubeovn.io/v1
kind: ProviderNetwork
metadata:
  creationTimestamp: "2024-06-22T01:43:36Z"
  generation: 1
  name: external
  resourceVersion: "2175"
  uid: 7b834dfa-32b6-4665-8255-04ee844158b4
spec:
  defaultInterface: eth1
status:
  conditions:
  - lastTransitionTime: "2024-06-22T01:43:48Z"
    lastUpdateTime: "2024-06-22T01:43:48Z"
    node: kube-ovn-control-plane
    reason: InitOVSBridgeSucceeded
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-06-22T01:43:48Z"
    lastUpdateTime: "2024-06-22T01:43:48Z"
    node: kube-ovn-worker
    reason: InitOVSBridgeSucceeded
    status: "True"
    type: Ready
  ready: true
  readyNodes:
  - kube-ovn-control-plane
  - kube-ovn-worker
  vlans:
  - vlan-195999955
(v) root@ae86:~# k get subnet external -o yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  creationTimestamp: "2024-06-22T01:43:48Z"
  finalizers:
  - kubeovn.io/kube-ovn-controller
  generation: 2
  name: external
  resourceVersion: "4122"
  uid: c123bad5-05b5-488d-a441-a8ce3e3552e9
spec:
  cidrBlock: 172.19.0.0/16
  default: false
  enableLb: true
  excludeIps:
  - 172.19.0.1
  - 172.19.0.2
  - 172.19.0.3
  gateway: 172.19.0.1
  gatewayNode: ""
  gatewayType: distributed
  natOutgoing: false
  private: false
  protocol: IPv4
  provider: ovn
  vlan: vlan-195999955
  vpc: ovn-cluster
status:
  activateGateway: ""
  conditions:
  - lastTransitionTime: "2024-06-22T01:43:48Z"
    lastUpdateTime: "2024-06-22T01:46:02Z"
    reason: ResetLogicalSwitchAclSuccess
    status: "True"
    type: Validated
  - lastTransitionTime: "2024-06-22T01:43:49Z"
    lastUpdateTime: "2024-06-22T01:43:49Z"
    reason: ResetLogicalSwitchAclSuccess
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-06-22T01:43:49Z"
    lastUpdateTime: "2024-06-22T01:43:49Z"
    message: Not Observed
    reason: Init
    status: Unknown
    type: Error
  dhcpV4OptionsUUID: ""
  dhcpV6OptionsUUID: ""
  natOutgoingPolicyRules: []
  u2oInterconnectionIP: ""
  u2oInterconnectionMAC: ""
  u2oInterconnectionVPC: ""
  v4availableIPrange: 172.19.0.4,172.19.0.8-172.19.255.254
  v4availableIPs: 65528
  v4usingIPrange: 172.19.0.5-172.19.0.7
  v4usingIPs: 3
  v6availableIPrange: ""
  v6availableIPs: 0
  v6usingIPrange: ""
  v6usingIPs: 0
(v) root@ae86:~# k get vlan vlan-195999955 -o yaml
apiVersion: kubeovn.io/v1
kind: Vlan
metadata:
  creationTimestamp: "2024-06-22T01:43:48Z"
  generation: 1
  name: vlan-195999955
  resourceVersion: "2183"
  uid: 96dfca0e-0669-4f0d-8a81-da0843c0c796
spec:
  id: 0
  provider: external
status:
  subnets:
  - external
(v) root@ae86:~#
(v) root@ae86:~# k get ofip
NAME                                   VPC                    V4EIP        V6EIP   V4IP          V6IP   READY   IPTYPE   IPNAME
fip-pod-192408216                      no-bfd-vpc-103132165   172.19.0.7           192.168.0.4          true             fip-pod-192408216.ovn-vpc-nat-gw-529
shared-eip-fip-should-fail-107345702                                                                            vip      shared-vip-129997953
shared-eip-fip-should-ok-196720677     no-bfd-vpc-103132165   172.19.0.5           192.168.0.5          true    vip      shared-vip-129997953
(v) root@ae86:~# k get ip | grep 192.168.0.4
fip-pod-192408216.ovn-vpc-nat-gw-529 192.168.0.4 66:44:ef:66:97:74 kube-ovn-worker no-bfd-subnet-199358593
(v) root@ae86:~# k get po -A -o wide | grep 172.19.0.7
(v) root@ae86:~# k get po -A -o wide | grep 192.168.0.4
ovn-vpc-nat-gw-529 fip-pod-192408216 1/1 Running 0 3m51s 192.168.0.4 kube-ovn-worker <none> <none>
(v) root@ae86:~#
(v) root@ae86:~# ping 172.19.0.7
PING 172.19.0.7 (172.19.0.7) 56(84) bytes of data.
64 bytes from 172.19.0.7: icmp_seq=1 ttl=63 time=4.74 ms
64 bytes from 172.19.0.7: icmp_seq=2 ttl=63 time=0.714 ms
^C
--- 172.19.0.7 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.714/2.726/4.739/2.012 ms
(v) root@ae86:~# ip -br a | grep -C 2 172.19
br-09b2e217f827 UP 172.20.0.1/16 fc00:5645:6976:1737::1/64 fe80::42:bbff:fe83:71ef/64 fe80::1/64
br-4d767ea66623 UP 172.18.0.1/16 fc00:f853:ccd:e793::1/64 fe80::42:34ff:fe81:6180/64 fe80::1/64
br-7b69ce9ed697 UP 172.19.0.1/16 fc00:adb1:b29b:608d::1/64 fe80::42:4aff:fe87:7f5c/64 fe80::1/64
docker0 DOWN 172.17.0.1/16 fe80::42:b3ff:fe1a:2bd4/64
veth79dfb86@if21 UP fe80::a824:3fff:fe15:c182/64
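Judging from the current e2e results, ofip 172.19.0.7 can be pinged.
If you find that something in the documentation is wrong, please post it here.
After I apply this ovn-external-gw-config with enable-external-gw: "true", I check the VPC: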
root@master20:~# kubectl get vpc
NAME          ENABLEEXTERNAL   ENABLEBFD   STANDBY   SUBNETS                             EXTRAEXTERNALSUBNETS   NAMESPACES
ovn-cluster   false            false       true      ["external","join","ovn-default"]
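^ Shouldn't this have become true? It is still false.
Check whether the kube-ovn-controller pod log contains any ERR lines related to this VPC's name.
Running kubectl -n kube-system rollout restart deploy/kube-ovn-controller makes it become true. Perhaps I have tested too many times and the data is in a bad state because resources were not added and removed cleanly. Is there a reset-like command that restores the data to its freshly installed state? I often find that after deleting a resource object through its YAML, the controller keeps logging errors about the already-deleted object:
E0622 17:30:42.776069 7 vpc.go:547] failed to add default external connection for vpc vpc-1, error no external gw nodes
I0622 17:30:42.776082 7 subnet.go:338] format subnet subnet-1, changed false
E0622 17:30:42.776095 7 vpc.go:995] error syncing 'vpc-1': no external gw nodes, requeuing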
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ovn-external-gw-config
  namespace: kube-system
data:
  enable-external-gw: "true"
  external-gw-nodes: "node51.host"
  type: "centralized"
  external-gw-nic: "eno4" # NIC used to attach to the OVS public-network bridge
  external-gw-addr: "10.5.204.254/24" # IP of the underlay physical gateway
The external-gw-nodes: "node51.host" setting should put a label on the node.
Alternatively, see this section:
https://kubeovn.github.io/docs/v1.13.x/advance/ovn-eip-fip-snat/#31-ovn-snat-subnet-cidr
# First, designate the external-gw-nodes by adding a label
kubectl label nodes pc-node-1 pc-node-2 pc-node-3 ovn.kubernetes.io/external-gw=true
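As a rough illustration of how the external-gw-nodes value in the ConfigMap relates to the label command above, the following Python sketch turns the comma-separated node list into the equivalent kubectl label invocation. The helper name label_command is hypothetical, and the script only builds the command string. Whether the controller applies this label by itself may depend on the Kube-OVN version, so treat this as a way to check nodes by hand rather than a description of the controller's behavior.

```python
def label_command(cm_data):
    """Build the kubectl label command for the nodes named in the ConfigMap data.

    label_command is a hypothetical helper for illustration; it is not part
    of Kube-OVN. The label key is the one quoted in the thread.
    """
    raw = cm_data.get("external-gw-nodes", "")
    nodes = [name.strip() for name in raw.split(",") if name.strip()]
    if not nodes:
        return ""  # no gateway nodes configured, nothing to label
    return "kubectl label nodes " + " ".join(nodes) + " ovn.kubernetes.io/external-gw=true"


if __name__ == "__main__":
    print(label_command({"external-gw-nodes": "node51.host"}))
```

An empty result for a given ConfigMap would match the "no external gw nodes" error seen in the controller log earlier in the thread.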
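The OVN EIP feature may not be well supported in 1.12, and 1.13 changed it substantially. I recommend using 1.13 or 1.12-mc.
The changes cannot be backported.
I have upgraded to 1.13.0. Problem: from a pod created in the default VPC, ping 1.1.1.1 goes out through the host NAT instead of through the oeip/ofip.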
router efd15288-2896-4232-a33c-229c4fe53189 (ovn-cluster)
    port ovn-cluster-join
        mac: "22:46:0c:bb:64:fe"
        networks: ["100.64.0.1/16"]
    port ovn-cluster-external
        mac: "5a:b0:14:62:ca:24"
        networks: ["112.5.140.254/24"]
        gateway chassis: [70fb0cad-67d6-4beb-8f23-06abfe27c268]
    port ovn-cluster-ovn-default
        mac: "4e:2e:3d:92:aa:3d"
        networks: ["10.16.0.1/16"]
    nat 33f54c38-0aa9-4eaf-a7b8-93261f9fb469
        external ip: "112.5.140.40"
        logical ip: "10.16.0.10"
        type: "dnat_and_snat"
# kubectl ko nbctl show ovn-cluster
root@master20:~# kubectl get vpc
NAME          ENABLEEXTERNAL   ENABLEBFD   STANDBY   SUBNETS                             EXTRAEXTERNALSUBNETS   NAMESPACES
ovn-cluster   true             false       true      ["join","ovn-default","external"]
root@master20:~# kubectl get subnet
NAME          PROVIDER   VPC           VLAN   PROTOCOL   CIDR             PRIVATE   NAT     DEFAULT   GATEWAYTYPE   V4USED   V4AVAILABLE   V6USED   V6AVAILABLE   EXCLUDEIPS                                                     U2OINTERCONNECTIONIP
external      ovn        ovn-cluster          IPv4       112.5.140.0/24   false     false   false     distributed   0        169           0        0             ["112.5.140.1..112.5.140.30","112.5.140.200..112.5.140.255"]
join          ovn        ovn-cluster          IPv4       100.64.0.0/16    false     false   false     distributed   11       65522         0        0             ["100.64.0.1"]
ovn-default   ovn        ovn-cluster          IPv4       10.16.0.0/16     false     true    true      distributed   159      65374         0        0             ["10.16.0.1"]
root@master20:~# kubectl get oeip
NAME         V4IP           V6IP   MAC                 TYPE   NAT   READY   EXTERNALSUBNET
eip-static   112.5.140.40          1e:1c:f8:b3:f1:a4   nat    fip   true    external
root@master20:~# kubectl get ofip
NAME         VPC           V4EIP          V6EIP   V4IP         V6IP   READY   IPTYPE   IPNAME
fip-static   ovn-cluster   112.5.140.40           10.16.0.10          true             pod-ex.dev
root@master20:~# kubectl get po -A -owide |grep pod-
dev pod-ex 1/1 Running 0 12m 10.16.0.10 master21.host <none> <none>
root@master20:~# ping 112.5.140.40
PING 112.5.140.40 (112.5.140.40) 56(84) bytes of data.
^C
--- 112.5.140.40 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4083ms
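Capturing packets on the peer at the external IP shows the following for a custom VPC. When the pod has no oeip/ofip bound, its packets travel through genev_sys_6081 to the gateway node, pass through br-external, and leave via the external NIC; because there is no SNAT at that point, the replies never come back. Once the oeip/ofip is added, however, the pod can no longer send packets out at all, and they do not go through genev_sys_6081.
Isn't your fip 103? Why are you pinging 101?
I really could not get ofip to work no matter what I tried. I then tested OvnSnatRule and OvnDnatRule, and those went smoothly! I found one error in the documentation. Another point: modifying resources in place often does not work; they have to be deleted and recreated, and sometimes restart deploy/kube-ovn-controller is needed as well.
1. ofip apparently can only reach the external network through a distributed gateway. In the custom VPC's subnet, gatewayType: distributed. When a pod is bound to an ofip, the network works if the pod is scheduled onto a node that has the external NIC, and does not work if the pod is on a node without it. With ofip, the NAT entry carries extra EXTERNAL_MAC and LOGICAL_PORT values:
TYPE            GATEWAY_PORT   EXTERNAL_IP     EXTERNAL_PORT   LOGICAL_IP     EXTERNAL_MAC        LOGICAL_PORT
dnat_and_snat                  192.168.40.32                   192.168.41.2   8e:75:e4:d8:e7:96   pod-1.dev
Note: setting LOGICAL_PORT and EXTERNAL_MAC enables the distributed EIP function. The relevant flows are installed on the compute node where the LOGICAL_IP/LOGICAL_PORT lives, so traffic is sent and received locally without going through the centralized gateway. Without them, the gateway is centralized, and traffic leaves for the public network on the node set by lrp-set-gateway-chassis.
2. When the default VPC enables enable-external-gw through the ConfigMap with type: "centralized", does that conflict with gatewayType: distributed in the custom VPC's subnet, given that both connect to the same external network?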
# Command reference:
kubectl ko nbctl --may-exist lr-route-add vpc1 0.0.0.0/0 192.168.40.1 # the docs say this is added automatically, but this route is actually missing
kubectl ko nbctl lr-route-del vpc1 0.0.0.0/0 192.168.40.1
# snat accepts a single IP or a CIDR; when deleting, specify the trailing LOGICAL_IP
kubectl ko nbctl lr-nat-add vpc1 snat 192.168.40.32 192.168.41.0/24
kubectl ko nbctl lr-nat-del vpc1 snat 192.168.41.0/24
# kubectl ko nbctl lr-nat-list vpc1
TYPE   GATEWAY_PORT   EXTERNAL_IP     EXTERNAL_PORT   LOGICAL_IP        EXTERNAL_MAC   LOGICAL_PORT
snat                  192.168.40.32                   192.168.41.0/24
# dnat_and_snat accepts a single IP; when deleting, specify the leading EXTERNAL_IP
kubectl ko nbctl lr-nat-add vpc1 dnat_and_snat 192.168.40.34 192.168.41.2 pod-1.dev 26:8d:22:5a:5f:cb
kubectl ko nbctl lr-nat-add vpc1 dnat_and_snat 192.168.40.34 192.168.41.2
kubectl ko nbctl lr-nat-del vpc1 dnat_and_snat 192.168.40.34
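The note above about EXTERNAL_MAC and LOGICAL_PORT can be restated as a small check. This Python sketch classifies one NAT entry, given as a dict whose keys mirror the lr-nat-list columns, as distributed or centralized. The function name nat_mode and the dict layout are assumptions made for illustration; the sketch does not call ovn-nbctl.

```python
def nat_mode(entry):
    """Classify a NAT entry following the note in this thread.

    A dnat_and_snat entry with both logical_port and external_mac set is
    handled on the node hosting the pod (distributed). Anything else is
    treated as going through the gateway chassis (centralized).
    nat_mode and the dict keys are illustrative, not an OVN API.
    """
    is_fip = entry.get("type") == "dnat_and_snat"
    has_port = bool(entry.get("logical_port"))
    has_mac = bool(entry.get("external_mac"))
    return "distributed" if (is_fip and has_port and has_mac) else "centralized"


if __name__ == "__main__":
    # Values taken from the lr-nat-add commands quoted above.
    with_port_and_mac = {
        "type": "dnat_and_snat",
        "external_ip": "192.168.40.34",
        "logical_ip": "192.168.41.2",
        "logical_port": "pod-1.dev",
        "external_mac": "26:8d:22:5a:5f:cb",
    }
    without_them = {
        "type": "dnat_and_snat",
        "external_ip": "192.168.40.34",
        "logical_ip": "192.168.41.2",
    }
    print(nat_mode(with_port_and_mac))
    print(nat_mode(without_them))
```

Under this reading, an ofip that only works when the pod runs on a node with the external NIC matches the distributed case, since the traffic leaves from the pod's own node.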
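@yeshl Which resources specifically need to be recreated, and which need a restart? We will look at them together.
For example:
1. After configuring ovn-external-gw-config, kube-ovn-controller has to be restarted before kubectl get vpc shows ENABLEEXTERNAL as true.
2. When ovn-external-gw-config is deleted, the br-external bridge on the gateway node is not removed and has to be deleted manually with ovs-vsctl del-br br-external. The ENABLEEXTERNAL status also only updates after kube-ovn-controller is restarted.
In other cases, when creating or deleting a subnet, kube-ovn-controller sometimes needs a restart as well; otherwise the subnet cannot be deleted.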
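The manual recovery steps listed above could be collected into a small helper so that they are applied in the same order each time. This is only a sketch: recovery_commands is a hypothetical name, the commands are the ones quoted in this thread, and ovs-vsctl has to be run on the gateway node, which the sketch does not handle. It prints the commands rather than executing them.

```python
import shlex


def recovery_commands(config_deleted):
    """Return the workaround commands from this thread as argument lists.

    config_deleted: True when ovn-external-gw-config was just deleted, in
    which case the leftover br-external bridge is removed first.
    recovery_commands is an illustrative helper, not a Kube-OVN tool.
    """
    cmds = []
    if config_deleted:
        # Must be run on the gateway node where br-external was created.
        cmds.append(["ovs-vsctl", "del-br", "br-external"])
    # Restarting the controller refreshes the ENABLEEXTERNAL status.
    cmds.append(
        ["kubectl", "-n", "kube-system", "rollout", "restart", "deploy/kube-ovn-controller"]
    )
    return cmds


if __name__ == "__main__":
    for cmd in recovery_commands(config_deleted=True):
        print(shlex.join(cmd))
```

Returning argument lists instead of shell strings keeps the helper easy to pass to subprocess.run later if automatic execution is wanted.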
Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.
Kube-OVN Version
1.12.16
Kubernetes Version
1.30
Operation-system/Kernel Version
debian12
Description
With the external gateway configured, the EIP cannot be pinged. I followed the steps in the documentation: https://kubeovn.github.io/docs/stable/advance/ovn-eip-fip-snat/#31-ovn-snat-subnet-cidr
Steps To Reproduce
Current Behavior
With the external gateway configured, the EIP cannot be pinged.
Expected Behavior
Pinging the pod's public IP should succeed.