kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] When the cluster has multiple subnets and IP pools, kube-ovn does not honor `attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69` when allocating an IP address for a KubeVirt VirtualMachine #4573

Open hexiaodai opened 2 hours ago

hexiaodai commented 2 hours ago

Kube-OVN Version

v1.12

Kubernetes Version

v1.25.3

Operation-system/Kernel Version

❯ awk -F '=' '/PRETTY_NAME/ { print $2 }' /etc/os-release
"Ubuntu 22.04.2 LTS"
❯ uname -r
6.8.0-40-generic

Description

When the cluster contains multiple subnets and IP pools, kube-ovn does not honor the annotation attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 when allocating an IP address for a KubeVirt VirtualMachine.

Steps To Reproduce

  1. Create the subnets subnet-10-66 and subnet-10-69:
NAME           PROVIDER                 VPC           PROTOCOL   CIDR            PRIVATE   NAT     DEFAULT   GATEWAYTYPE   V4USED   V4AVAILABLE   V6USED   V6AVAILABLE   EXCLUDEIPS                                             U2OINTERCONNECTIONIP
subnet-10-69   attachnet.default.ovn    ovn-cluster   IPv4       10.69.0.0/16    false     true    false     distributed   3        65470         0        0             ["10.69.0.1..10.69.0.10","10.69.0.101..10.69.0.151"]
subnet-10-70   attachnet2.default.ovn   ovn-cluster   IPv4       10.70.0.0/16    false     true    false     distributed   0        65473         0        0             ["10.70.0.1..10.70.0.10","10.70.0.101..10.70.0.151"]

# subnet-10-66
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: subnet-10-66
spec:
  cidrBlock: 10.66.0.0/16
  default: false
  enableLb: true
  excludeIps:
  - 10.66.0.1..10.66.0.10
  - 10.66.0.101..10.66.0.151
  gateway: 10.66.0.1
  gatewayNode: ""
  gatewayType: distributed
  namespaces:
  - default
  natOutgoing: true
  private: false
  protocol: IPv4
  provider: attachnet.default.ovn
  vpc: ovn-cluster

# subnet-10-69
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: subnet-10-69
spec:
  cidrBlock: 10.69.0.0/16
  default: false
  enableLb: true
  excludeIps:
  - 10.69.0.1..10.69.0.10
  - 10.69.0.101..10.69.0.151
  gateway: 10.69.0.1
  gatewayNode: ""
  gatewayType: distributed
  namespaces:
  - default
  natOutgoing: true
  private: false
  protocol: IPv4
  provider: attachnet.default.ovn
  vpc: ovn-cluster
  2. Create the IP pools subnet-10-66-6 and subnet-10-69-9, setting their subnet field to subnet-10-66 and subnet-10-69 respectively:
NAME             SUBNET         IPS                           V4USED   V4AVAILABLE   V6USED   V6AVAILABLE
subnet-10-66-6   subnet-10-66   ["10.66.6.10..10.66.6.240"]   8        223           0        0
subnet-10-69-9   subnet-10-69   ["10.69.9.10..10.69.9.240"]   2        229           0        0

# subnet-10-66-6
apiVersion: kubeovn.io/v1
kind: IPPool
metadata:
  name: subnet-10-66-6
spec:
  ips:
  - 10.66.6.10..10.66.6.240
  namespaces:
  - default
  subnet: subnet-10-66

# subnet-10-69-9
apiVersion: kubeovn.io/v1
kind: IPPool
metadata:
  name: subnet-10-69-9
spec:
  ips:
  - 10.69.9.10..10.69.9.240
  namespaces:
  - default
  subnet: subnet-10-69
  3. Create a KubeVirt VirtualMachine that specifies attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
...
  template:
    metadata:
      annotations:
        attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69
  4. Inspect the events of the virt-launcher-vm pod:
Events:
  Type     Reason                  Age                 From                 Message
  ----     ------                  ----                ----                 -------
  Normal   Scheduled               27s                 default-scheduler    Successfully assigned default/virt-launcher-vm-k8s-qz8ff to ubuntu
  Warning  AcquireAddressFailed    23s (x14 over 33s)  kube-ovn-controller  NoAvailableAddress
  Warning  FailedCreatePodSandBox  6s                  kubelet              Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c4f6501ee3eb4c28742213fab5e8f560ee39fe01d9796142142bc7d64de845cc": plugin type="multus" name="multus-cni-network" failed (add): [default/virt-launcher-vm-k8s-qz8ff/7acbcc49-8d37-40c1-a0ff-46e3efb81599:attachnet]: error adding container to network "attachnet": RPC failed; request ip return 500 no address allocated to pod default/virt-launcher-vm-k8s-qz8ff provider attachnet.default.ovn, please see kube-ovn-controller logs to find errors
  5. Check the kube-ovn-controller log. It tries to allocate an IP from the subnet-10-66-6 IP pool. The expected behavior is to allocate from the subnet-10-69-9 IP pool, because that pool is bound to the subnet-10-69 subnet:
I0929 18:05:47.763725       7 pod.go:347] enqueue update pod default/virt-launcher-vm-k8s-qz8ff
I0929 18:05:47.763828       7 pod.go:519] handle add/update pod default/virt-launcher-vm-k8s-qz8ff
I0929 18:05:47.793967       7 pod.go:576] sync pod default/virt-launcher-vm-k8s-qz8ff allocated
I0929 18:05:47.794121       7 ipam.go:62] allocate v4 , v6 , mac  for default/vm-k8s from ippool subnet-10-66-6 in subnet subnet-10-69
E0929 18:05:47.794255       7 pod.go:589] NoAvailableAddress
I0929 18:05:47.795021       7 event.go:377] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"virt-launcher-vm-k8s-qz8ff", UID:"7acbcc49-8d37-40c1-a0ff-46e3efb81599", APIVersion:"v1", ResourceVersion:"19804583", FieldPath:""}): type: 'Warning' reason: 'AcquireAddressFailed' NoAvailableAddress
E0929 18:05:47.795328       7 pod.go:406] error syncing 'default/virt-launcher-vm-k8s-qz8ff': NoAvailableAddress, requeuing
  6. Modify the KubeVirt VirtualMachine to specify both attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 and attachnet.default.ovn.kubernetes.io/ip_pool: subnet-10-69-9. IP allocation then succeeds and the VirtualMachine starts normally:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
...
  template:
    metadata:
      annotations:
        attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69
        attachnet.default.ovn.kubernetes.io/ip_pool: subnet-10-69-9
  7. Delete the IP pools subnet-10-66-6 and subnet-10-69-9, and specify only attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 on the KubeVirt VirtualMachine. IP allocation succeeds and the VirtualMachine starts normally.
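
As a sanity check on the IPPool table in the steps above, the sketch below counts the addresses in a `first..last` range of the kind used in `spec.ips`. The helper name `rangeSize` is hypothetical and is not part of kube-ovn; it only illustrates that each pool holds 231 addresses, which matches V4USED + V4AVAILABLE in the table (8 + 223 and 2 + 229), so neither pool is exhausted.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
	"strings"
)

// rangeSize is a hypothetical helper (not kube-ovn code) that returns the
// number of IPv4 addresses in an inclusive "first..last" range string.
func rangeSize(r string) (uint32, error) {
	first, last, ok := strings.Cut(r, "..")
	if !ok {
		return 0, fmt.Errorf("range %q is missing the \"..\" separator", r)
	}
	a, err := netip.ParseAddr(first)
	if err != nil {
		return 0, err
	}
	b, err := netip.ParseAddr(last)
	if err != nil {
		return 0, err
	}
	if !a.Is4() || !b.Is4() {
		return 0, fmt.Errorf("range %q is not IPv4", r)
	}
	a4, b4 := a.As4(), b.As4()
	lo := binary.BigEndian.Uint32(a4[:])
	hi := binary.BigEndian.Uint32(b4[:])
	if hi < lo {
		return 0, fmt.Errorf("range %q ends before it starts", r)
	}
	return hi - lo + 1, nil
}

func main() {
	for _, r := range []string{"10.66.6.10..10.66.6.240", "10.69.9.10..10.69.9.240"} {
		n, err := rangeSize(r)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s holds %d addresses\n", r, n)
	}
}
```

Because both pools still have free addresses, the NoAvailableAddress error in step 5 is consistent with the wrong pool being chosen rather than with a full pool.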

Current Behavior

When multiple subnets and IP pools exist in the cluster at the same time, kube-ovn cannot allocate an IP address for a KubeVirt VirtualMachine that specifies only attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69.

See the kube-ovn-controller log:

I0929 18:05:47.763725       7 pod.go:347] enqueue update pod default/virt-launcher-vm-k8s-qz8ff
I0929 18:05:47.763828       7 pod.go:519] handle add/update pod default/virt-launcher-vm-k8s-qz8ff
I0929 18:05:47.793967       7 pod.go:576] sync pod default/virt-launcher-vm-k8s-qz8ff allocated
I0929 18:05:47.794121       7 ipam.go:62] allocate v4 , v6 , mac  for default/vm-k8s from ippool subnet-10-66-6 in subnet subnet-10-69
E0929 18:05:47.794255       7 pod.go:589] NoAvailableAddress
I0929 18:05:47.795021       7 event.go:377] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"virt-launcher-vm-k8s-qz8ff", UID:"7acbcc49-8d37-40c1-a0ff-46e3efb81599", APIVersion:"v1", ResourceVersion:"19804583", FieldPath:""}): type: 'Warning' reason: 'AcquireAddressFailed' NoAvailableAddress
E0929 18:05:47.795328       7 pod.go:406] error syncing 'default/virt-launcher-vm-k8s-qz8ff': NoAvailableAddress, requeuing

Expected Behavior

When multiple subnets and IP pools exist in the cluster at the same time, kube-ovn should correctly allocate an IP address for the KubeVirt VirtualMachine based on attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69.
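
The expected behavior can be sketched as a selection rule: when no ip_pool annotation is given, only pools bound to the requested subnet are candidates. The code below is a minimal illustration under that assumption. The `IPPool` struct and the `selectPool` function are hypothetical simplifications and do not reproduce kube-ovn's actual types or controller logic.

```go
package main

import "fmt"

// IPPool is a hypothetical, simplified stand-in for the kubeovn.io/v1 IPPool
// resource; only the fields that appear in this issue are modeled.
type IPPool struct {
	Name       string
	Subnet     string
	Namespaces []string
}

// selectPool returns the name of the first pool that is bound to the
// requested subnet and lists the pod's namespace. An empty result means
// "no matching pool", in which case allocation would fall back to the
// subnet itself. This is an illustrative rule, not kube-ovn's code.
func selectPool(pools []IPPool, subnet, namespace string) string {
	for _, p := range pools {
		if p.Subnet != subnet {
			continue // a pool bound to another subnet is never a candidate
		}
		for _, ns := range p.Namespaces {
			if ns == namespace {
				return p.Name
			}
		}
	}
	return ""
}

func main() {
	// The two pools from the reproduction steps, both listing namespace "default".
	pools := []IPPool{
		{Name: "subnet-10-66-6", Subnet: "subnet-10-66", Namespaces: []string{"default"}},
		{Name: "subnet-10-69-9", Subnet: "subnet-10-69", Namespaces: []string{"default"}},
	}
	fmt.Println(selectPool(pools, "subnet-10-69", "default")) // prints subnet-10-69-9
}
```

Under this rule the pool subnet-10-66-6 is skipped for a request against subnet-10-69, which is the opposite of the pairing shown in the ipam.go log line.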

dosubot[bot] commented 2 hours ago

@hexiaodai Hello! I'm Dosu, a bot that can help you solve bugs, answer questions, and become a contributor. I'm here to help while you wait for a human maintainer.

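To allocate an IP address for a KubeVirt VirtualMachine through attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 when the cluster contains multiple subnets and IP pools, make sure the VirtualMachine is annotated correctly to specify the logical switch (subnet).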

The following example shows how to annotate the VirtualMachine:

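apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: my-vm
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/domain: my-vm
      # Kube-OVN reads annotations from the virt-launcher pod, so the annotation
      # goes on the template metadata rather than on the VirtualMachine itself.
      annotations:
        attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69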
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
      volumes:
      - name: containerdisk
        containerDisk:
          image: kubevirt/cirros-container-disk-demo

In this example, the annotation attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 specifies that the VirtualMachine should attach to the logical switch (subnet) named subnet-10-69.

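Also make sure that the subnet subnet-10-69 is configured correctly in Kube-OVN and that the IPAM (IP address management) system knows about it. The Kube-OVN controller parses these annotations and allocates an IP address from the specified subnet. The controller code handles parsing of the network annotations and ensures that the correct subnet is used for IP allocation: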

func (c *Controller) getPodAttachmentNet(pod *v1.Pod) ([]*kubeovnNet, error) {
    // ... existing code ...
    subnetName := pod.Annotations[fmt.Sprintf(util.LogicalSwitchAnnotationTemplate, providerName)]
    if subnetName == "" {
        for _, subnet := range subnets {
            if subnet.Spec.Provider == providerName {
                subnetName = subnet.Name
                break
            }
        }
    }
    // ... existing code ...
}

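This configuration ensures that the VirtualMachine's IP address is allocated from the specified subnet even when multiple subnets and IP pools exist in the cluster [1][2].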
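
The lookup in the excerpt above can be exercised in isolation with the self-contained sketch below. Everything here is a simplified assumption rather than kube-ovn's code: the `Subnet` struct, the `resolveSubnet` function, and the template constant, whose value is inferred from the annotation key used in this issue (provider attachnet.default.ovn plus `.kubernetes.io/logical_switch`).

```go
package main

import "fmt"

// logicalSwitchTemplate is an assumed value, inferred from the annotation key
// attachnet.default.ovn.kubernetes.io/logical_switch used in this issue.
const logicalSwitchTemplate = "%s.kubernetes.io/logical_switch"

// Subnet is a hypothetical, simplified stand-in for the kubeovn.io/v1 Subnet.
type Subnet struct {
	Name     string
	Provider string
}

// resolveSubnet mirrors the shape of the excerpt: prefer the subnet named in
// the pod annotation, otherwise fall back to the first subnet whose provider
// matches. It returns "" when neither source yields a subnet.
func resolveSubnet(annotations map[string]string, subnets []Subnet, provider string) string {
	if name := annotations[fmt.Sprintf(logicalSwitchTemplate, provider)]; name != "" {
		return name
	}
	for _, s := range subnets {
		if s.Provider == provider {
			return s.Name
		}
	}
	return ""
}

func main() {
	// Both subnets from the reproduction steps share the same provider.
	subnets := []Subnet{
		{Name: "subnet-10-66", Provider: "attachnet.default.ovn"},
		{Name: "subnet-10-69", Provider: "attachnet.default.ovn"},
	}
	annotated := map[string]string{
		"attachnet.default.ovn.kubernetes.io/logical_switch": "subnet-10-69",
	}
	fmt.Println(resolveSubnet(annotated, subnets, "attachnet.default.ovn")) // prints subnet-10-69
	fmt.Println(resolveSubnet(nil, subnets, "attachnet.default.ovn"))       // prints subnet-10-66
}
```

With the annotation present, the subnet resolves to subnet-10-69, which agrees with the "in subnet subnet-10-69" part of the reported log line. The sketch says nothing about how the IP pool is chosen afterwards, which is the part the issue reports as wrong.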
