kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

TCP connection failed in Rocky Linux 8.6 #1647

Closed gugulee closed 2 years ago

gugulee commented 2 years ago

Expected Behavior

TCP between Pods on the same host should work.

Actual Behavior

TCP between Pods on the same host does not work.

Steps to Reproduce the Problem

  1. In subnet ovn-default of VPC ovn-cluster, create two Pods with the following YAML:

         apiVersion: apps/v1
         kind: Deployment
         metadata:
           name: deploy
         spec:
           selector:
             matchLabels:
               app: deploy
           replicas: 2
           template:
             metadata:
               labels:
                 app: deploy
               annotations:
                 ovn.kubernetes.io/default_route: "true"
                 ovn.kubernetes.io/logical_switch: ovn-default
             spec:
               nodeSelector:
                 kubernetes.io/hostname: XXX
               containers:
               - name: centos
                 image: centos:7
                 command: ["bash","-c","sleep 365d"]
                 imagePullPolicy: Always
               tolerations:
               - key: key
                 value: value
                 effect: NoSchedule

  2. In one Pod (POD-1), start a TCP server with: nc -l -t 12345
  3. In the other Pod (POD-2), start a TCP client with: ncat 172.10.0.97 12345
  4. The TCP client and the TCP server cannot communicate.
  5. Capture the traffic on POD-2's veth on the host; the result is shown in this image: image

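According to the packet capture, the TCP three-way handshake completes normally, but after the TCP client sends a packet, the TCP server replies from a different port, 5511 (it should be 12345).

As a result, the TCP connection is broken.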
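
The symptom described above (a reply from the server's IP but from the wrong source port) can be checked mechanically once the capture has been reduced to address/port tuples. The following is only a minimal sketch: the `Segment` type, the `find_port_mismatch` helper, and the client address and port in the sample data are hypothetical and are not taken from the original capture; only the server address 172.10.0.97, the listening port 12345, and the unexpected port 5511 come from this report.

```python
from typing import Iterable, NamedTuple, Optional


class Segment(NamedTuple):
    """One captured TCP segment, reduced to its address/port 4-tuple."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def find_port_mismatch(
    segments: Iterable[Segment], server_ip: str, server_port: int
) -> Optional[Segment]:
    """Return the first segment sent by the server from an unexpected port.

    Returns None if every segment from server_ip uses server_port.
    """
    for seg in segments:
        if seg.src_ip == server_ip and seg.src_port != server_port:
            return seg
    return None


# Sample data: the client IP and client port are made up for illustration.
capture = [
    Segment("172.10.0.98", 40000, "172.10.0.97", 12345),  # SYN
    Segment("172.10.0.97", 12345, "172.10.0.98", 40000),  # SYN-ACK
    Segment("172.10.0.98", 40000, "172.10.0.97", 12345),  # ACK
    Segment("172.10.0.98", 40000, "172.10.0.97", 12345),  # client data
    Segment("172.10.0.97", 5511, "172.10.0.98", 40000),   # reply from wrong port
]

bad = find_port_mismatch(capture, "172.10.0.97", 12345)
if bad is not None:
    print(f"server replied from port {bad.src_port}, expected 12345")
```

The check deliberately looks only at the source port of segments coming from the server, since that is the one field reported as wrong here.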


After testing:

Additional Info

Ubbo-Sathla commented 2 years ago

kube-ovn 1.10.2 and 1.10.3 have the same problem. Please help resolve it.

oilbeater commented 2 years ago

Could you use kubectl ko trace to print the logical flow table so we can check whether something is wrong there?

gugulee commented 2 years ago

I don't have time to do that at the moment.

gugulee commented 2 years ago

However, this scenario is easy to reproduce: it always occurs on Rocky Linux 8.6 (Green Obsidian).

Ubbo-Sathla commented 2 years ago

image image

image

Ubbo-Sathla commented 2 years ago

image

Ubbo-Sathla commented 2 years ago

I could not find out how the rule table=8 (ls_in_acl_hint), priority=5, match=(!ct.trk), action=(reg0[8] = 1; reg0[9] = 1; next;) gets added. When I create a switch and add ports manually with OVN, this rule is not present, and TCP works without problems:

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-add sw1 sw1-port2
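
To compare the two setups, one option is to save the logical flows from each (for example the output of `ovn-sbctl lflow-list`) and search for the rule quoted above. The sketch below assumes each flow is printed on one line in the same `table=… (stage), priority=…, match=(…), action=(…)` shape as that quoted rule; the `parse_lflow` and `find_flows` helpers are hypothetical names, not part of OVN or Kube-OVN.

```python
import re
from typing import Dict, Iterable, List, Optional

# Pattern modeled on the rule quoted in this thread; other output layouts
# are not handled.
_LFLOW_RE = re.compile(
    r"table=(?P<table>\d+)\s*\((?P<stage>[^)]*?)\s*\)\s*,\s*"
    r"priority=(?P<priority>\d+)\s*,\s*"
    r"match=\((?P<match>.*)\)\s*,\s*"
    r"action=\((?P<action>.*)\)\s*$"
)


def parse_lflow(line: str) -> Optional[Dict[str, str]]:
    """Split one logical-flow line into its fields, or return None."""
    m = _LFLOW_RE.search(line)
    return m.groupdict() if m else None


def find_flows(lines: Iterable[str], stage: str, match_expr: str) -> List[Dict[str, str]]:
    """Return the parsed flows in the given stage whose match equals match_expr."""
    found = []
    for line in lines:
        flow = parse_lflow(line)
        if flow and flow["stage"] == stage and flow["match"] == match_expr:
            found.append(flow)
    return found


sample = [
    "  table=8 (ls_in_acl_hint ), priority=5 , match=(!ct.trk), "
    "action=(reg0[8] = 1; reg0[9] = 1; next;)",
]
hits = find_flows(sample, "ls_in_acl_hint", "!ct.trk")
for flow in hits:
    print(flow["table"], flow["priority"], flow["action"])
```

Running `find_flows` over the dump from the Kube-OVN cluster and over the dump from the manually created sw1 would show whether the rule exists in only one of them.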

zhangzujian commented 2 years ago

The issue has been fixed in kernel-4.18.0-372.13.1.el8_6.x86_64, please update the kernel.

Ubbo-Sathla commented 2 years ago

add kernel prerequisite for Rocky Linux 8.6 #1713

I tested kernel version kernel-5.14.0-70.13.1.el9_0.x86_64 and it does not work. Upgrading to kernel-5.14.0-70.17.1.el9_0.x86_64 resolves the problem.

[root@yaoshicheng-kubernetes-1 ~]# cat /etc/os-release
NAME="AlmaLinux"
VERSION="9.0 (Emerald Puma)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.0"
PLATFORM_ID="platform:el9"
PRETTY_NAME="AlmaLinux 9.0 (Emerald Puma)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:9::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-9"
ALMALINUX_MANTISBT_PROJECT_VERSION="9.0"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.0"
[root@yaoshicheng-kubernetes-1 ~]#

zhangzujian commented 2 years ago

Thanks for the information.
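
The kernel versions reported in this thread can be turned into a quick pre-flight check. This is a rough sketch under stated assumptions: the `REPORTED_GOOD` table, `parse_release`, and `meets_reported_minimum` are hypothetical names; the el8 entry is the version stated above as containing the fix, while the el9 entry is only the version reported to work after an upgrade (the first fixed el9 build is not stated here). Kernels from other release streams are reported as unknown rather than guessed.

```python
import re
from typing import Optional, Tuple

# Versions taken from this thread. The first four numbers identify the
# kernel stream (for example 4.18.0-372); the rest is the z-stream build.
REPORTED_GOOD = {
    "el8": (4, 18, 0, 372, 13, 1),  # stated as fixed
    "el9": (5, 14, 0, 70, 17, 1),   # reported working after upgrade
}

_RELEASE_RE = re.compile(
    r"^(?:kernel-)?(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)\.(el\d+)"
)


def parse_release(release: str) -> Tuple[str, Tuple[int, ...]]:
    """Split an EL kernel release string into (distro tag, version tuple)."""
    m = _RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch, build, distro = m.groups()
    version = (int(major), int(minor), int(patch))
    version += tuple(int(part) for part in build.split("."))
    return distro, version


def meets_reported_minimum(release: str) -> Optional[bool]:
    """True/False for kernels in a stream discussed here, None otherwise."""
    distro, version = parse_release(release)
    minimum = REPORTED_GOOD.get(distro)
    if minimum is None or version[:4] != minimum[:4]:
        return None  # different stream: this thread says nothing about it
    return version >= minimum


for rel in (
    "4.18.0-372.9.1.el8.x86_64",
    "kernel-4.18.0-372.13.1.el8_6.x86_64",
    "kernel-5.14.0-70.13.1.el9_0.x86_64",
    "kernel-5.14.0-70.17.1.el9_0.x86_64",
):
    print(rel, meets_reported_minimum(rel))
```

On a node, the string returned by Python's `platform.release()` (the same value as `uname -r`) could be passed to `meets_reported_minimum`.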

gugulee commented 2 years ago

What is the root cause of this problem? Is there any related reference documentation? Thanks.

zhangzujian commented 2 years ago

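No, there is none. I compiled the openvswitch kernel code from a newer version and used it to replace the openvswitch module shipped with 4.18.0-372.9.1.el8.x86_64, but the problem persisted. I also found no clues in the kernel changelog: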

* Mon Jun 06 2022 Augusto Caringi <acaringi@redhat.com> [4.18.0-372.13.1.el8_6]
- openvswitch: always update flow key after nat (Aaron Conole) [2068476 2066885]
- KVM: PPC: Fix TCE handling for VFIO (Daniel Henrique Barboza) [2085572 2062687]
- rfkill: make new event layout opt-in (Jose Ignacio Tornos Martinez) [2087641 2023175]
- ASoC: Intel: soc-acpi: add entries in ADL match table (Jaroslav Kysela) [2090423 2052011]
- isert: support for unsolicited NOPIN with no response (Maurizio Lombardi) [2079433 2035915]
- iscsit: increment max_cmd_sn for isert on command release (Maurizio Lombardi) [2079433 2035915]
- net: tcp better handling of reordering then loss cases (Marcelo Ricardo Leitner) [2080972 2074566]
- tcp: tcp_mark_head_lost is only valid for sack-tcp (Marcelo Ricardo Leitner) [2080972 2074566]

* Wed Jun 01 2022 Augusto Caringi <acaringi@redhat.com> [4.18.0-372.12.1.el8_6]
- sctp: use the correct skb for security_sctp_assoc_request (Xin Long) [2070959]
- net/mlx5e: Fix wrong source vport matching on tunnel rule (Amir Tzin) [2088610]
- net/mlx5: DR, Fix missing flow_source when creating multi-destination FW table (Amir Tzin) [2088611]
- net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte (Amir Tzin) [2088611]
- net/mlx5: DR, Cache STE shadow memory (Amir Tzin) [2075553]
- net/mlx5: DR, Fix the threshold that defines when pool sync is initiated (Amir Tzin) [2075553]
- drm/i915/display: Remove check for low voltage sku for max dp source rate (Jocelyn Falempe) [2066644]
- net/mlx5: DR, Ignore modify TTL on RX if device doesn't support it (Amir Tzin) [2088638]
- net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion (Amir Tzin) [2081011]
- net/mlx5e: TC, Skip redundant ct clear actions (Amir Tzin) [2079918]
- net/mlx5e: TC, fix decap fallback to uplink when int port not supported (Amir Tzin) [2088639]
- CI: Use zstream builder image (Veronika Kabatova)
- ice: Allow to pass VLAN tagged packets to VF when port VLAN is configured (Petr Oros) [2081794]
- ice: clear stale Tx queue settings before configuring (Petr Oros) [2081794]
- ice: fix crash when writing timestamp on RX rings (Petr Oros) [2081794]
- ice: Fix race during aux device (un)plugging (Petr Oros) [2081794]
- ice: fix PTP stale Tx timestamps cleanup (Petr Oros) [2081794]
- ice: ice_sched: fix an incorrect NULL check on list iterator (Petr Oros) [2081794]
- ice: fix use-after-free when deinitializing mailbox snapshot (Petr Oros) [2081794]
- ice: wait 5 s for EMP reset after firmware flash (Petr Oros) [2081794]
- ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg() (Petr Oros) [2081794]
- ice: Fix incorrect locking in ice_vc_process_vf_msg() (Petr Oros) [2081794]
- ice: Fix memory leak in ice_get_orom_civd_data() (Petr Oros) [2081794]
- ice: fix crash in switchdev mode (Petr Oros) [2081794]
- Revert "iavf: Fix deadlock occurrence during resetting VF interface" (Petr Oros) [2081794]
- ice: arfs: fix use-after-free when freeing @rx_cpu_rmap (Petr Oros) [2081794]
- ice: clear cmd_type_offset_bsz for TX rings (Petr Oros) [2081794]
- ice: xsk: fix VSI state check in ice_xsk_wakeup() (Petr Oros) [2081794]
- ice: synchronize_rcu() when terminating rings (Petr Oros) [2081794]
- ice: Do not skip not enabled queues in ice_vc_dis_qs_msg (Petr Oros) [2081794]
- ice: Set txq_teid to ICE_INVAL_TEID on ring creation (Petr Oros) [2081794]
- ice: Fix broken IFF_ALLMULTI handling (Petr Oros) [2081794]
- ice: Fix MAC address setting (Petr Oros) [2081794]
- openvswitch: Fix setting ipv6 fields causing hw csum failure (Eelco Chaudron) [2086549]
- sched/cputime, proc/stat: Fix incorrect guest nice cpustat value (Waiman Long) [2084138]
- procfs: Use all-in-one vtime aware kcpustat accessor (Waiman Long) [2084138]
- procfs: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM (Waiman Long) [2084138]
- proc: read kernel cpu stat pointer once (Waiman Long) [2084138]
- proc: use "unsigned int" in /proc/stat hook (Waiman Long) [2084138]
- sched/cputime: Support other fields on kcpustat_field() (Waiman Long) [2084138]
- sched/cputime: Add vtime guest task state (Waiman Long) [2084138]
- sched/cputime: Add vtime idle task state (Waiman Long) [2084138]
- sched/cputime: Spare a seqcount lock/unlock cycle on context switch (Waiman Long) [2084138]
- sched/vtime: Move task_struct_rh->vtime_cpu back to vtime structure (Waiman Long) [2084138]
- net: openvswitch: fix leak of nested actions (Eelco Chaudron) [2086590]
- net/sched: fix initialization order when updating chain 0 head (Marcelo Ricardo Leitner) [2074221]
- PCI: hv: Propagate coherence from VMbus device to PCI device (Vitaly Kuznetsov) [2074829]
- Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device (Vitaly Kuznetsov) [2074829]

* Wed May 25 2022 Augusto Caringi <acaringi@redhat.com> [4.18.0-372.11.1.el8_6]
- Revert "xfs: actually bump warning counts when we send warnings" (Carlos Maiolino) [2071713]
- SUNRPC: use different lock keys for INET6 and LOCAL (Guillaume Nault) [2079856]
- Revert "netfilter: conntrack: tag conntracks picked up in local out hook" (Florian Westphal) [2065266]
- Revert "netfilter: nat: force port remap to prevent shadowing well-known ports" (Florian Westphal) [2065266]
- KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR (Laurent Vivier) [2079069]
- KVM: PPC: Book3S HV: Rename current DAWR macros and variables (Laurent Vivier) [2079069]
- esp: limit skb_page_frag_refill use to a single page (Sabrina Dubroca) [2062114] {CVE-2022-27666}
- esp: Fix possible buffer overflow in ESP transformation (Sabrina Dubroca) [2062114] {CVE-2022-27666}
- NFS: Don't loop forever in nfs_do_recoalesce() (Scott Mayhew) [2080998]

* Fri May 13 2022 Augusto Caringi <acaringi@redhat.com> [4.18.0-372.10.1.el8_6]
- Fonts: Replace discarded const qualifier (Nico Pache) [2064762]
- Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts (Nico Pache) [2064762]
- fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h (Nico Pache) [2064762]
- CI: Drop baseline runs (Veronika Kabatova)
- redhat: drop the -sha512 suffix from default rhpkg invocation (Jarod Wilson)
- redhat: switch release to zstream (Augusto Caringi)
- ceph: fix possible NULL pointer dereference for req->r_session (Xiubo Li) [2080071]