kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] StatefulSet cannot keep a fixed IP #4479

Open wolgod opened 1 week ago

wolgod commented 1 week ago

Kube-OVN Version

v1.11.10

Kubernetes Version

v1.20.11

Operation-system/Kernel Version

"CentOS Linux 7 (Core)"

Description

A StatefulSet cannot keep a fixed IP.

Steps To Reproduce

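On two k8s nodes, disk usage of the containerd storage path exceeded 85%, which triggered pod eviction. The pod was evicted back and forth between the two machines several times before it was finally created successfully, and I then found that the StatefulSet pod's IP had changed. The log is as follows: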

I0905 07:09:12.056244 7 pod.go:1577] pod's ip is not in the range of subnet subnet-ark-apisix-0, delete pod

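Looking at line 1577 of the 1.11.10 code, the IP was not obtained, so the pod IP was considered to be outside the subnet range and the pod's ips resource was deleted: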

    if podSubnet != nil && !util.CIDRContainIP(podSubnet.Spec.CIDRBlock, pod.Annotations[util.IpAddressAnnotation]) {
        klog.Infof("pod's ip %s is not in the range of subnet %s, delete pod", pod.Annotations[util.IpAddressAnnotation], podSubnet.Name)
        return true, nil
    }
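
The reporter's hypothesis can be reproduced in isolation. The following is a minimal sketch, not Kube-OVN code: `cidrContainsIP` is a hypothetical stand-in for `util.CIDRContainIP`, `ipOutOfRange` is a hypothetical guard, and the annotation key is an assumption. It shows that an empty annotation value fails a plain CIDR containment test, and how a check could tell "no IP recorded yet" apart from "IP outside the subnet".

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// cidrContainsIP reports whether every comma-separated address in ipStr
// falls inside at least one comma-separated CIDR in cidrStr.
// Hypothetical stand-in for util.CIDRContainIP, not the Kube-OVN code.
func cidrContainsIP(cidrStr, ipStr string) bool {
	if ipStr == "" {
		return false // an empty value can never be "in range"
	}
	for _, ipPart := range strings.Split(ipStr, ",") {
		ip := net.ParseIP(strings.TrimSpace(ipPart))
		if ip == nil {
			return false
		}
		found := false
		for _, cidr := range strings.Split(cidrStr, ",") {
			_, ipNet, err := net.ParseCIDR(strings.TrimSpace(cidr))
			if err != nil {
				return false
			}
			if ipNet.Contains(ip) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// ipOutOfRange returns true only when an address is recorded under key
// and that address lies outside cidr. A missing or empty annotation is
// treated as "nothing to validate yet" (an assumed design choice).
func ipOutOfRange(cidr string, annotations map[string]string, key string) bool {
	ip, ok := annotations[key]
	if !ok || ip == "" {
		return false
	}
	return !cidrContainsIP(cidr, ip)
}

func main() {
	const key = "ovn.kubernetes.io/ip_address" // assumed annotation key
	cidr := "10.16.0.0/16"

	fmt.Println(cidrContainsIP(cidr, ""))                                       // false
	fmt.Println(ipOutOfRange(cidr, map[string]string{}, key))                   // false
	fmt.Println(ipOutOfRange(cidr, map[string]string{key: "192.168.1.5"}, key)) // true
}
```

Whether skipping validation for an empty annotation is the right behavior for the controller is a separate question; the sketch only separates the two cases that the quoted condition merges.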

Current Behavior

The StatefulSet's IP cannot be kept fixed.

Expected Behavior

The StatefulSet's IP stays fixed.

dosubot[bot] commented 1 week ago

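@wolgod Hello! I am a bot that can help you solve problems, answer questions, and become a contributor. While you wait for a human maintainer, I will do my best to help.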

I found some similar issues that might be relevant to your problem:

  1. StatefulSet pod IP changes after restart (closed): The suggested solution was to upgrade to a newer version, because the IPAM initialization logic had changed significantly.
  2. StatefulSet pod IP changes after restart (closed): This issue had no comments or solution.
  3. kube-ovn 1.11.9: IP addresses cannot be allocated, causing address conflicts (closed): This issue was addressed by adding error logs and changing the IPAM logic.

To ensure that a StatefulSet in Kubernetes maintains a fixed IP address when using Kube-OVN v1.11.10 on Kubernetes v1.20.11, you can use the util.IPPoolAnnotation to specify a pool of IP addresses. Here is an example of how to create a StatefulSet with a fixed IP address using Kube-OVN:

framework.ConformanceIt("should allocate static ip for statefulset with ippool", func() {
    ippoolSep := ";"
    if f.VersionPriorTo(1, 11) {
        if f.IsDual() {
            ginkgo.Skip("Support for dual stack ippool was introduced in v1.11")
        }
        ippoolSep = ","
    }

    for replicas := 1; replicas <= 3; replicas++ {
        stsName = "sts-" + framework.RandomSuffix()
        ippool := framework.RandomIPs(cidr, ippoolSep, replicas)
        labels := map[string]string{"app": stsName}

        ginkgo.By("Creating statefulset " + stsName + " with ippool " + ippool)
        sts := framework.MakeStatefulSet(stsName, stsName, int32(replicas), labels, framework.PauseImage)
        sts.Spec.Template.Annotations = map[string]string{util.IPPoolAnnotation: ippool}
        sts = stsClient.CreateSync(sts)

        ginkgo.By("Getting pods for statefulset " + stsName)
        pods := stsClient.GetPods(sts)
        framework.ExpectHaveLen(pods.Items, replicas)

        ips := make([]string, 0, replicas)
        for _, pod := range pods.Items {
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.AllocatedAnnotation, "true")
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.CidrAnnotation, subnet.Spec.CIDRBlock)
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.GatewayAnnotation, subnet.Spec.Gateway)
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.IPPoolAnnotation, ippool)
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.LogicalSwitchAnnotation, subnet.Name)
            framework.ExpectMAC(pod.Annotations[util.MacAddressAnnotation])
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.RoutedAnnotation, "true")
            framework.ExpectConsistOf(util.PodIPs(pod), strings.Split(pod.Annotations[util.IPAddressAnnotation], ","))
            ips = append(ips, pod.Annotations[util.IPAddressAnnotation])
        }
        framework.ExpectConsistOf(ips, strings.Split(ippool, ippoolSep))

        ginkgo.By("Deleting pods for statefulset " + stsName)
        for _, pod := range pods.Items {
            err := podClient.Delete(pod.Name)
            framework.ExpectNoError(err, "failed to delete pod "+pod.Name)
        }
        stsClient.WaitForRunningAndReady(sts)

        ginkgo.By("Getting pods for statefulset " + stsName)
        pods = stsClient.GetPods(sts)
        framework.ExpectHaveLen(pods.Items, replicas)

        for i, pod := range pods.Items {
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.AllocatedAnnotation, "true")
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.CidrAnnotation, subnet.Spec.CIDRBlock)
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.GatewayAnnotation, subnet.Spec.Gateway)
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.IPPoolAnnotation, ippool)
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.IPAddressAnnotation, ips[i])
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.LogicalSwitchAnnotation, subnet.Name)
            framework.ExpectMAC(pod.Annotations[util.MacAddressAnnotation])
            framework.ExpectHaveKeyWithValue(pod.Annotations, util.RoutedAnnotation, "true")
            framework.ExpectConsistOf(util.PodIPs(pod), strings.Split(pod.Annotations[util.IPAddressAnnotation], ","))
        }

        ginkgo.By("Deleting statefulset " + stsName)
        stsClient.DeleteSync(stsName)
    }
})

This code snippet demonstrates how to create a StatefulSet with a specified IP pool, ensuring that the StatefulSet maintains a fixed IP address. The key part is setting the util.IPPoolAnnotation in the StatefulSet's template annotations [1].
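
Outside the e2e framework, the same annotation can be built with plain Go. This is a minimal sketch under stated assumptions: the key `ovn.kubernetes.io/ip_pool` is assumed to be the value behind `util.IPPoolAnnotation`, `ipPoolAnnotations` is a hypothetical helper, and the addresses are made up. The separator follows the test above: `;` on v1.11 and later, `,` on earlier versions.

```go
package main

import (
	"fmt"
	"strings"
)

// ipPoolAnnotationKey is assumed to be the value behind util.IPPoolAnnotation.
const ipPoolAnnotationKey = "ovn.kubernetes.io/ip_pool"

// ipPoolAnnotations joins the addresses with sep and returns a map that
// can be assigned to a StatefulSet's pod template annotations.
// Hypothetical helper, not part of Kube-OVN.
func ipPoolAnnotations(ips []string, sep string) map[string]string {
	return map[string]string{ipPoolAnnotationKey: strings.Join(ips, sep)}
}

func main() {
	// Example addresses only; they must belong to the target subnet's CIDR.
	ann := ipPoolAnnotations([]string{"10.16.0.15", "10.16.0.16", "10.16.0.17"}, ";")
	fmt.Println(ann[ipPoolAnnotationKey])
}
```

The pool should contain at least as many addresses as the StatefulSet has replicas, as in the test, which draws exactly `replicas` addresses.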

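Additionally, the function validatePodIP checks the pod's allocated addresses. In the version shown here, for subnets without a VLAN in the default VPC, it compares the pod's IPv4 and IPv6 addresses with every node's internal IP and flags a colliding address for reallocation: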

func (c *Controller) validatePodIP(podName, subnetName, ipv4, ipv6 string) (bool, bool, error) {
    subnet, err := c.subnetsLister.Get(subnetName)
    if err != nil {
        klog.Errorf("failed to get subnet %s: %v", subnetName, err)
        return false, false, err
    }

    if subnet.Spec.Vlan == "" && subnet.Spec.Vpc == c.config.ClusterRouter {
        nodes, err := c.nodesLister.List(labels.Everything())
        if err != nil {
            klog.Errorf("failed to list nodes: %v", err)
            return false, false, err
        }

        for _, node := range nodes {
            nodeIPv4, nodeIPv6 := util.GetNodeInternalIP(*node)
            if ipv4 != "" && ipv4 == nodeIPv4 {
                klog.Errorf("IP address (%s) assigned to pod %s is the same with internal IP address of node %s, reallocating...", ipv4, podName, node.Name)
                return false, true, nil
            }
            if ipv6 != "" && ipv6 == nodeIPv6 {
                klog.Errorf("IP address (%s) assigned to pod %s is the same with internal IP address of node %s, reallocating...", ipv6, podName, node.Name)
                return true, false, nil
            }
        }
    }

    return true, true, nil
}

This function ensures that the pod's IP does not collide with the internal IP of any node [2][3].
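
The collision check can be tried without a cluster. The following is a minimal sketch: `conflictsWithNodeIP` is a hypothetical helper that takes node addresses as plain strings instead of reading them from a node lister. It compares parsed addresses with `net.IP.Equal` so that different textual forms of the same IPv6 address still match, which is a design choice of this sketch and not necessarily what the controller does.

```go
package main

import (
	"fmt"
	"net"
)

// conflictsWithNodeIP reports whether podIP is the same address as any
// entry in nodeIPs. Unparsable inputs are ignored.
// Hypothetical helper, not part of Kube-OVN.
func conflictsWithNodeIP(podIP string, nodeIPs []string) bool {
	ip := net.ParseIP(podIP)
	if ip == nil {
		return false
	}
	for _, n := range nodeIPs {
		if nodeIP := net.ParseIP(n); nodeIP != nil && ip.Equal(nodeIP) {
			return true
		}
	}
	return false
}

func main() {
	nodes := []string{"192.168.0.10", "fd00:0:0:0:0:0:0:1"}
	fmt.Println(conflictsWithNodeIP("192.168.0.10", nodes)) // true
	fmt.Println(conflictsWithNodeIP("fd00::1", nodes))      // true, same address in short form
	fmt.Println(conflictsWithNodeIP("10.16.0.5", nodes))    // false
}
```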


bobz965 commented 1 week ago

How about upgrading 1.11 to the latest version and trying again?

wolgod commented 1 week ago

How about upgrading 1.11 to the latest version and trying again?

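Do we have documentation on the upgrade steps? These are several hundred production machines, so an upgrade has to be done carefully, and I am also worried about introducing other, more serious bugs.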