kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0
1.96k stars, 443 forks

[BUG] Installation fails in underlay mode #4485

Open kinbod opened 1 month ago

kinbod commented 1 month ago

Kube-OVN Version

v1.12.22

Kubernetes Version

v1.28.12

Operation-system/Kernel Version

Fedora Server 40, Linux kernel 5.8

Description

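I changed the install.sh parameters ENABLE_VLAN=true, POD_CIDR=underlaycidr, and POD_GATEWAY=underlaygateway (that is, the CIDR and gateway of the underlay network) and left all other parameters unchanged. Then I ran the installation.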

Steps To Reproduce

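1. Fedora Server 40, Linux kernel 5.8. The cluster has 6 nodes (three masters and three workers) and was built with sealos.
2. Changed the install.sh parameters ENABLE_VLAN=true, POD_CIDR=underlaycidr, and POD_GATEWAY=underlaygateway; all other parameters were left unchanged.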

Current Behavior

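1. After installation, it hangs here (screenshot: d1786a0912ef130709ef9f470bea139).
2. kube-ovn-cni fails to start properly (screenshot: 0377cee281b72f812fdbaad83f86551).
3. The CNI logs (screenshot: bb18c70da57921d7c1e0d3b6722dce5).
4. The default subnet (screenshot: 71ccc45720a40afa7ae9187931ad5aa).
5. The CNI pod cannot ping the gateway of the join subnet.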

Expected Behavior

1. The installation completes normally.

dosubot[bot] commented 1 month ago

Hey @kinbod, I'm here to help you with your Kube-OVN issue. Let's work together to solve this!

I found some similar closed issues that might help you resolve the problem:

  1. [BUG] kube-ovn cannot create pods properly in underlay mode: This issue was resolved by a pull request that modified the CNI server to not set sysctl variables [1].
  2. 1.11.3 underlay mode installation fails, pods cannot communicate properly: This issue was resolved by enabling promiscuous mode on the virtual platform's vswitch network [2].
  3. kube-ovn, controller and cni pods are in CrashLoopBackOff state: This issue was related to a CIDR conflict and was resolved by correcting the CIDR settings [3].

For your specific issue, here are some steps and checks you can perform:

  1. Ensure CHECK_GATEWAY is set to false: Modify the install.sh script to set CHECK_GATEWAY to false to disable the gateway check, which might be causing the issue:

    CHECK_GATEWAY=${CHECK_GATEWAY:-false}

    Re-run the installation after making this change [4].

  2. Verify POD_CIDR and POD_GATEWAY values: Ensure that the POD_CIDR and POD_GATEWAY parameters are correctly set in the install.sh script. They should be in the correct format:

    • POD_CIDR: "10.16.0.0/16" for IPv4 or "fd00:10:16::/112" for IPv6.
    • POD_GATEWAY: "10.16.0.1" for IPv4 or "fd00:10:16::1" for IPv6 [5][6][4].
  3. Check VLAN settings: Since you have set ENABLE_VLAN=true, ensure that the VLAN settings are correctly configured in the install.sh script and values.yaml file:

    if [ "$ENABLE_VLAN" = "true" ]; then
      NETWORK_TYPE="vlan"
      ENABLE_EIP_SNAT=${ENABLE_EIP_SNAT:-false}
      if [ "$VLAN_NIC" != "" ]; then
        VLAN_INTERFACE_NAME="$VLAN_NIC"
      fi
    fi

    Ensure that the VLAN_INTERFACE_NAME and other VLAN-specific settings are correctly set in the values.yaml file [4][6].

  4. Network readiness checks: The kube-ovn-cni component performs network readiness checks. Ensure that the ARP resolution or ping check for the gateway is successful. If the gateway is an IPv4 address and underlayGateway is true, the function attempts to resolve the MAC address of the gateway using ARP. For other types of gateways, it performs a ping check [7].
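As a small illustration of step 1: the `${VAR:-default}` form quoted there is standard shell parameter expansion, so a value already set by the caller wins and the default applies only when the variable is unset or empty. The helper name `effective_check_gateway` is made up for this sketch and is not part of install.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of install.sh): prints the value that a line
# such as CHECK_GATEWAY=${CHECK_GATEWAY:-false} would end up assigning.
effective_check_gateway() {
  # ":-" falls back to the default when the variable is unset OR empty.
  echo "${CHECK_GATEWAY:-false}"
}
```

If install.sh reads CHECK_GATEWAY with this pattern, as the quoted line suggests, then running `CHECK_GATEWAY=false bash install.sh` should have the same effect as editing the default in the file. That is an assumption worth checking against your copy of the script.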

By following these steps, you should be able to resolve the issue with the kube-ovn-cni component failing to start.
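For step 2, one quick sanity check is whether the gateway address actually falls inside the pod CIDR. The following is a minimal IPv4-only sketch; the function names `ip_to_int` and `gateway_in_cidr` are hypothetical helpers, not part of install.sh or Kube-OVN, and the sketch does not validate malformed input.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only (IPv4, no input validation).

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  # Split the address on dots into $1..$4 (unquoted on purpose).
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) when the gateway lies inside the CIDR.
gateway_in_cidr() {
  cidr=$1
  gateway=$2
  network=${cidr%/*}   # part before the slash
  prefix=${cidr#*/}    # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
  net_bits=$(( $(ip_to_int "$network") & $mask ))
  gw_bits=$(( $(ip_to_int "$gateway") & $mask ))
  [ "$net_bits" -eq "$gw_bits" ]
}
```

With the example values from step 2, `gateway_in_cidr 10.16.0.0/16 10.16.0.1` succeeds, while a gateway from another network such as 192.168.1.1 fails the check. In underlay mode the values passed in would be the real CIDR and gateway of the physical network.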

To continue talking to Dosu, mention @dosu.

bobz965 commented 1 month ago

I'm not very familiar with sealos. Were the ovn-default and join subnets created with the CIDRs shown in the screenshots from the very beginning?

kinbod commented 1 month ago

I'm not very familiar with sealos. Were the ovn-default and join subnets created with the CIDRs shown in the screenshots from the very beginning?

Yes.

kinbod commented 1 month ago

The installation still fails after trying these steps.

oilbeater commented 1 month ago

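The join subnet should be overlay. If it is unreachable, that may be an environment problem. Was kube-ovn installed in this environment before without being cleaned up?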

kinbod commented 1 month ago

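The join subnet should be overlay. If it is unreachable, that may be an environment problem. Was kube-ovn installed in this environment before without being cleaned up?

This is my own local environment. How should I clean it up? Delete the cni files under /etc/ and then reboot all the machines?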

bobz965 commented 1 month ago

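The join subnet should be overlay. If it is unreachable, that may be an environment problem. Was kube-ovn installed in this environment before without being cleaned up?

This is my own local environment. How should I clean it up? Delete the cni files under /etc/ and then reboot all the machines?

Please refer to: https://kubeovn.github.io/docs/v1.13.x/start/uninstall/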
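To check whether a node still carries leftovers from an earlier installation, something like the sketch below could be run on each node after the uninstall steps. The function name `list_kube_ovn_residue` is hypothetical, and the path list is an assumption based on the per-node cleanup section of the uninstall guide; verify it against the linked page for your version.

```shell
#!/bin/sh
# Hypothetical helper: print every known Kube-OVN/OVS residue path that still
# exists. The optional first argument is a root prefix (empty means "/"),
# which makes the function easy to try against a scratch directory.
# ASSUMPTION: the path list mirrors the uninstall guide; check the linked page.
list_kube_ovn_residue() {
  root=${1:-}
  for path in \
    /var/run/openvswitch \
    /var/run/ovn \
    /etc/origin/openvswitch \
    /etc/origin/ovn \
    /etc/cni/net.d/00-kube-ovn.conflist \
    /etc/cni/net.d/01-kube-ovn.conflist \
    /var/log/openvswitch \
    /var/log/ovn \
    /var/log/kube-ovn
  do
    if [ -e "$root$path" ]; then
      echo "$path"
    fi
  done
}
```

Calling `list_kube_ovn_residue` with no argument inspects the live filesystem and only reads it. Empty output means none of the listed paths remain; anything printed is a candidate for the manual cleanup and reboot described in the guide.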

kinbod commented 1 month ago

Yes, that is the script I ran when I removed it.


From: bobz965 @.> Sent: September 7, 2024, 13:09 To: kubeovn/kube-ovn @.> Cc: Bodkin Quan @.>; Mention @.> Subject: Re: [kubeovn/kube-ovn] [BUG] Installation fails in underlay mode (Issue #4485)
