k8snetworkplumbingwg / multus-cni

A CNI meta-plugin for multi-homed pods in Kubernetes

Race condition on node startup causing Pods to get stuck in ContainerCreating #1312

Closed: seastco closed this issue 3 months ago

seastco commented 4 months ago

What happened: A Pod is stuck in ContainerCreating. Events show a "failed to setup network for sandbox" loop.

  Warning  FailedCreatePodSandBox  2m2s (x16 over 3m)  kubelet      (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a6092ae23ccfc6d2eadf18642437053ede456a3a45d1f1a748b9fcd827a92c85": plugin type="multus" name="multus-cni-network" failed (add): [abc-09f7b-jp4rb/mango-01/f84c6bb9-d487-4c6f-a22b-fae50463e461:multus-cni-network]: error adding container to network "multus-cni-network": [abc-09f7b-jp4rb/mango-01/f84c6bb9-d487-4c6f-a22b-fae50463e461:abc-09f7b-jp4rb-main-mesh]: error adding container to network "abc-09f7b-jp4rb-main-mesh": DelegateAdd: cannot set "ovn-k8s-cni-overlay" interface name to "pod21e2223678f": validateIfName: interface name pod21e2223678f already exists
  Normal   AddedInterface          2m                  multus       Add eth0 [100.64.9.221/32] from aws-cni
  Normal   AddedInterface          2m                  multus       Add pod21e2223678f [] from abc-09f7b-jp4rb/abc-09f7b-jp4rb-main-mesh
  Normal   AddedInterface          2m                  multus       Add pod1f9bb198e14 [] from abc-09f7b-jp4rb/abc-09f7b-jp4rb-lemon-mesh
  Normal   AddedInterface          119s                multus       Add pod42c79b6cd76 [] from abc-09f7b-jp4rb/abc-09f7b-jp4rb-mango-mesh-01
  Normal   AddedInterface          119s                multus       Add eth0 [100.64.9.221/32] from multus-cni-network
  Normal   AddedInterface          116s                multus       Add eth0 [100.64.0.250/32] from aws-cni
  Normal   AddedInterface          116s                multus       Add pod21e2223678f [] from abc-09f7b-jp4rb/abc-09f7b-jp4rb-main-mesh
  Normal   AddedInterface          116s                multus       Add pod1f9bb198e14 [] from abc-09f7b-jp4rb/abc-09f7b-jp4rb-lemon-mesh
  Normal   AddedInterface          115s                multus       Add pod42c79b6cd76 [] from abc-09f7b-jp4rb/abc-09f7b-jp4rb-mango-mesh-01
  Normal   AddedInterface          115s                multus       Add eth0 [100.64.0.250/32] from multus-cni-network
  Normal   AddedInterface          112s                multus       Add eth0 [100.64.46.81/32] from aws-cni
  Normal   AddedInterface          112s                multus       Add pod21e2223678f [] from abc-09f7b-jp4

Manual resolution: Deleting the /etc/cni/net.d/00-multus.conf file and the /etc/cni/net.d/multus.d directory and then restarting the Multus daemonset pod resolves the issue for subsequently scheduled pods. Existing pods finish creating, but their networking is left in a bad state (e.g., separate from the example above, after resolving I've seen "interface pod6c270ef2f25 not found: route ip+net: no such network interface").
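
For reference, here's a rough sketch of that manual cleanup as shell commands. It assumes the default kube-multus-ds DaemonSet in kube-system with an app=multus pod label; adjust names and paths for your deployment:

  # On the affected node: remove the stale Multus config and its config directory
  sudo rm -f /etc/cni/net.d/00-multus.conf
  sudo rm -rf /etc/cni/net.d/multus.d

  # Then restart the Multus DaemonSet pod on that node so it regenerates the config
  kubectl -n kube-system delete pod -l app=multus \
    --field-selector spec.nodeName=<node-name>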

How to reproduce it (as minimally and precisely as possible): Unfortunately I'm not sure how to reproduce it. I believe it's a race condition that happens less than 1% of the time. My EKS cluster has instances constantly scaling up and down throughout the day, but I've only seen this two or three times in the past few months.

Possibly related? https://github.com/k8snetworkplumbingwg/multus-cni/issues/1221, though this isn't happening after a node reboot, and I'm not using the thick plugin.

Anything else we need to know?: Below is the /etc/cni/net.d/00-multus.conf from a bad node, which I suspect is wrong: delegates is nested within delegates, and all of the top-level fields are repeated, as if a bad merge happened. This is not the same as what's on a working node. (Sorry, I trimmed out some of the config when sharing with my team, and the node no longer exists, so this is all I have):

{
  "cniVersion": "0.4.0",
  "name": "multus-cni-network",
  "type": "multus",
  "capabilities": {
    "portMappings": true
  },
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
  "delegates": [
    {
      "cniVersion": "0.4.0",
      "name": "multus-cni-network",
      "type": "multus",
      "capabilities": {
        "portMappings": true
      },
      "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
      "delegates": [
        {
          "cniVersion": "0.4.0",
          "disableCheck": true,
          "name": "aws-cni",
          "plugins": [
            {
              ...
            },
            {
              ...
            },
            {
              "capabilities": {
                "portMappings": true
              },
              "snat": true,
              "type": "portmap"
            }
          ]
        }
      ],
    }
  ]
}

/etc/cni/net.d/00-multus.conf on a WORKING node:

{
  "cniVersion": "0.4.0",
  "name": "multus-cni-network",
  "type": "multus",
  "capabilities": {
    "portMappings": true
  },
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
  "delegates": [
    {
      "cniVersion": "0.4.0",
      "disableCheck": true,
      "name": "aws-cni",
      "plugins": [
        {
          "mtu": "9001",
          "name": "aws-cni",
          "pluginLogFile": "/var/log/aws-routed-eni/plugin.log",
          "pluginLogLevel": "DEBUG",
          "podSGEnforcingMode": "strict",
          "type": "aws-cni",
          "vethPrefix": "eni"
        }
      ]
    },
    {
      "enabled": "false",
      "ipam": {
        "dataDir": "/run/cni/v4pd/egress-v6-ipam",
        "ranges": [
          [
            {
              "subnet": "fd00::ac:00/118"
            }
          ]
        ],
        "routes": [
          {
            "dst": "::/0"
          }
        ],
        "type": "host-local"
      },
      "mtu": "9001",
      "name": "egress-cni",
      "nodeIP": "",
      "pluginLogFile": "/var/log/aws-routed-eni/egress-v6-plugin.log",
      "pluginLogLevel": "DEBUG",
      "randomizeSNAT": "prng",
      "type": "egress-cni"
    },
    {
      "capabilities": {
        "portMappings": true
      },
      "snat": true,
      "type": "portmap"
    }
  ]
}

Potentially another clue: it's the norm for Multus daemonset pods to fail twice on startup with:

Defaulted container "kube-multus" out of: kube-multus, install-multus-binary (init)
kubeconfig is created in /host/etc/cni/net.d/multus.d/multus.kubeconfig
kubeconfig file is created.
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
main.(*Options).createMultusConfig(0xc000236200)
    /usr/src/multus-cni/cmd/thin_entrypoint/main.go:297 +0x1f45
main.main()
    /usr/src/multus-cni/cmd/thin_entrypoint/main.go:539 +0x445

This happens to almost every new Multus pod:

kube-multus-ds-248xv         1/1     Running   2 (4d1h ago)    4d1h
kube-multus-ds-24sqt         1/1     Running   2 (2d2h ago)    2d2h
kube-multus-ds-5kgtn         1/1     Running   2 (25h ago)     25h
kube-multus-ds-5mk26         1/1     Running   2 (47h ago)     47h
kube-multus-ds-5vk77         1/1     Running   2 (6h57m ago)   6h57m
kube-multus-ds-672tc         1/1     Running   2 (7d6h ago)    7d6h
kube-multus-ds-68fvr         1/1     Running   2 (52m ago)     52m
kube-multus-ds-6jn94         1/1     Running   2 (8d ago)      8d
kube-multus-ds-78qts         1/1     Running   2 (4d3h ago)    4d3h
kube-multus-ds-7n7sj         1/1     Running   2 (4d3h ago)    4d3h
kube-multus-ds-bcrmp         1/1     Running   0               77d
kube-multus-ds-f84xn         1/1     Running   2 (3d ago)      3d
kube-multus-ds-flsf6         1/1     Running   2 (50m ago)     50m
kube-multus-ds-fqj2j         1/1     Running   2 (24h ago)     24h
kube-multus-ds-hnj84         1/1     Running   2 (2d1h ago)    2d1h
kube-multus-ds-hss8c         1/1     Running   0               77d
kube-multus-ds-hvlr8         1/1     Running   2 (26h ago)     26h
kube-multus-ds-kpsqt         1/1     Running   2 (6h57m ago)   6h57m
kube-multus-ds-l26qr         1/1     Running   2 (30h ago)     30h
kube-multus-ds-lqtmp         1/1     Running   2 (30h ago)     30h
kube-multus-ds-mg2gz         1/1     Running   2 (30h ago)     30h
kube-multus-ds-n486d         1/1     Running   2 (20h ago)     20h
kube-multus-ds-nsk4q         1/1     Running   0               77d
kube-multus-ds-ntf2r         1/1     Running   0               8d
kube-multus-ds-pw2q6         1/1     Running   2 (21h ago)     21h
kube-multus-ds-r82lj         1/1     Running   0               77d

I found a related issue: https://github.com/k8snetworkplumbingwg/multus-cni/issues/1092. I am not using OVN-Kind, but I am running ovn-kubernetes as a secondary CNI. I'm not sure why ovn-kubernetes would affect this.

Environment:

   kube-multus:
    Image:      artifactory.seastco.dev/public-images/k8snetworkplumbingwg/multus-cni:v4.0.2
    Port:       <none>
    Host Port:  <none>
    Command:
      /thin_entrypoint
    Args:
      --multus-conf-file=auto
      --multus-autoconfig-dir=/host/etc/cni/net.d
      --cni-conf-dir=/host/etc/cni/net.d
seastco commented 3 months ago

OK, I'm still not totally sure about the root cause of the mangled config, but I've learned more since raising this issue and can work around it.

  1. The Multus pod restart right on startup happens because there's no CNI config file yet. I'm running on EKS, where the AWS CNI daemonset pod creates 10-aws.conflist. Because that file takes a moment to be created, and these daemonset pods start up at the same time on a new node, Multus fails and restarts. OK, that's fine. Red herring.
  2. In my cluster, Multus pods were being evicted from nodes with high memory utilization. THIS restart is what caused the weird 00-multus.conf shown above. I set resource requests equal to limits on the init container to give the Multus pod Guaranteed QoS and stop it from being evicted.
  3. I upgraded to v4.1.0 and set --cleanup-config-on-exit=true. Now 00-multus.conf is removed on pod teardown.

So again, I'm not sure why the 00-multus.conf file isn't resilient to restarts, but if you're running into this issue, consider giving your Multus pods Guaranteed QoS and/or setting --cleanup-config-on-exit=true.
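
If you want to apply those two mitigations the same way, here's a rough kubectl sketch. The kube-multus-ds DaemonSet name, kube-system namespace, container index, and resource values are assumptions from my setup; adjust them for yours:

  # 1. Give the Multus container requests == limits so the pod can get Guaranteed QoS
  #    (every container, including the init container, needs requests == limits;
  #     set resources only touches regular containers, so patch or edit the init
  #     container separately; the values below are only illustrative)
  kubectl -n kube-system set resources daemonset/kube-multus-ds \
    --containers=kube-multus \
    --requests=cpu=100m,memory=128Mi --limits=cpu=100m,memory=128Mi

  # 2. On Multus v4.1.0+, add the flag so 00-multus.conf is removed on pod teardown
  #    (assumes kube-multus is the first container in the pod spec)
  kubectl -n kube-system patch daemonset kube-multus-ds --type=json \
    -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--cleanup-config-on-exit=true"}]'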

EDIT: I've dumped a lot of irrelevant info into this thread, so I'm going to close this issue and create a new one about 00-multus.conf not being resilient to restarts.