nokia / danm

TelCo grade network management in a Kubernetes cluster
BSD 3-Clause "New" or "Revised" License

cannot create pod due to `Error delegating ADD to CNI plugin:flannel because:OS exec call failed:missing network name` #250

Panlichen closed 3 years ago

Panlichen commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

bug

What happened:

I used the danm-installer job to deploy danm, and it succeeded, as shown:

$ kubectl get pod -n kube-system 
NAME                                      READY   STATUS      RESTARTS   AGE
coredns-6955765f44-ft7sz                  1/1     Running     0          171m
coredns-6955765f44-rgfxf                  1/1     Running     0          171m
danm-cni-p2dz9                            1/1     Running     0          114m
danm-cni-t5x44                            1/1     Running     0          114m
danm-installer-z45rj                      0/1     Completed   0          114m
danm-webhook-deployment-8dddc66b4-kj5g5   1/1     Running     0          114m
etcd-node-0                               1/1     Running     0          171m
kube-apiserver-node-0                     1/1     Running     0          171m
kube-controller-manager-node-0            1/1     Running     0          171m
kube-flannel-ds-pm8lj                     1/1     Running     0          163m
kube-flannel-ds-zmfpn                     1/1     Running     0          163m
kube-proxy-4vlpr                          1/1     Running     0          171m
kube-proxy-nwg8k                          1/1     Running     0          163m
kube-scheduler-node-0                     1/1     Running     0          171m
netwatcher-pj27d                          1/1     Running     0          114m
netwatcher-z8d8x                          1/1     Running     0          114m
svcwatcher-94nd9                          1/1     Running     0          114m

I then deployed a simple danm network:

apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: management
spec:
  #Change this to match the CNI config file name in your environment
  NetworkID: flannel
  NetworkType: flannel

and a simple pod:

apiVersion: v1
kind: Pod
metadata:
  name: simple-test-pod
  annotations:
    danm.k8s.io/interfaces: |
      [
        {"network":"management", "ip":"dynamic"}
      ]
spec:
  containers:
  - name: appcntr1 
    image: centos/tools 
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]

The network looks fine:

$ kk get dnet
NAME         AGE
management   103m

The contents of /etc/cni/net.d are:

$ ll /etc/cni/net.d
total 16K
-rw-r--r-- 1 root root  175 Apr  1 00:08 00-danm.conf
-rw-r--r-- 1 root root  292 Apr  1 00:08 10-flannel.conflist
-rw------- 1 root root 2.9K Apr  1 00:08 danm-kubeconfig
-rw-r--r-- 1 root root  123 Apr  1 00:08 flannel.conf
$ sudo cat /etc/cni/net.d/danm-kubeconfig
---
apiVersion: v1
kind: Config
current-context: default
clusters:
  - cluster:
      certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeE1EUXdNVEExTVRBMU1sb1hEVE14TURNek1EQTFNVEExTWxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTWszClp0UHRUbzhlUHFhNFJhdDBTbXpPYmx0dUhtdC9FcVlaYWQ1YlppODFQRFYyQnh2V1hJSi8zdXdKWE9USGRkaGoKMWJHQ1J6YmxwZjZ1K0tBSXZFdjl5c2Y4V2JTbGt3L3hHVGY2ckFPcUFkNnJ3V3g5a3NERTFyVE01L0hmWllHaQpDYjYyRU5tYzlJMkphWCtIYVpMaDQ1WDRiYUlxWTUrQmgvTHRkVTd2ekEvSXU5dHNmTWhxblU1V0ZsTC9JS1EwCm9JU2NLTitQWmROdXJyRWNnZTRubU5IbDQzNFQ3a1BiN0N0Wmdkb3Nna3B6TzdhdWRIenJDRHpPbnk0MnlIV2gKbHNmVHRQeFZ6ODZjTExvQWdKNTRrNVlYMFZaeUE4Q0lHdFJ3U0xRakFZQkpyZW13Tys5c3RybFN5VHR4ZEZrdwoyWDJ2cDlLWjMyU3U3STduRTZNQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFEY2Rmajc4b2l3clhPTUJkeTB6RWJpNGt0K0cKMldBdVlpQWNNUk9pQ1RlQi96TngvQkJubXljL01HQkxkYS9SUC9zYVdaT1lWK3B5MER1UHlORTdKWDFBV0J5dAp6aGdKc0Z6Tk1LZzRVeGZXRWpCOGN5NkVxbmhaOCtWL2ZieG5FWVdLZDNuWGFaNHh2aXY4KzlOZ2pmeUZPTU5ECm9qajluNTl2TEljZU80U0ZvbndqWEN5dU9tRWZOdXJrTHV4REp3VDdlWW5hdGluVFRYb1lMTG5ETmpiZ0pDdjMKY01VRHdxMnJObjN0ZnBBNVM2cDVmVUhMc0xIQTJJVElEd2dxdFB3VVRhUi9hWHFRNFlVNTRJRUErbU5PVHF5Kwp2dldaS3pYZHFFSjV2TlVIQWg3SkVlb29MZFhVWlo1UmRKbUxZQ0xWbnlGNWUwamplcU5ReDRPY1JOYz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
      server: https://192.168.1.1:6443
    name: kubernetes
contexts:
  - context:
      cluster: kubernetes
      user: danm
    name: default
users:
  - name: danm
    user:
      token: ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNkltSkliWGszV0dkNWFGY3diVEZtWlhsVVdFRkRlRFEyYTBkdGJWRlhRUzFMYUZsWWVYRmlUVmw2UkRnaWZRLmV5SnBjM01pT2lKcmRXSmxjbTVsZEdWekwzTmxjblpwWTJWaFkyTnZkVzUwSWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXVZVzFsYzNCaFkyVWlPaUpyZFdKbExYTjVjM1JsYlNJc0ltdDFZbVZ5Ym1WMFpYTXVhVzh2YzJWeWRtbGpaV0ZqWTI5MWJuUXZjMlZqY21WMExtNWhiV1VpT2lKa1lXNXRMWFJ2YTJWdUxXSnlZM0p0SWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXpaWEoyYVdObExXRmpZMjkxYm5RdWJtRnRaU0k2SW1SaGJtMGlMQ0pyZFdKbGNtNWxkR1Z6TG1sdkwzTmxjblpwWTJWaFkyTnZkVzUwTDNObGNuWnBZMlV0WVdOamIzVnVkQzUxYVdRaU9pSmlZamxsTmpCak5TMHdOMkkyTFRSak5USXRPR05sTUMxallXTTFNR1E0TkRrMVpqUWlMQ0p6ZFdJaU9pSnplWE4wWlcwNmMyVnlkbWxqWldGalkyOTFiblE2YTNWaVpTMXplWE4wWlcwNlpHRnViU0o5LkVmNFFhazRROENQd2RBcHV1MkdnOTdsWG91Y0dRd3ZsTFBrQU9zVVRuV2tyQnU5c3IzTHp6M0tJdzNub3RGMEtsbm4tMTdSamd5U0FQOG5lTUxlNTVaeWY2U295OGhJUjB6SEVaaG9YOTBZQjlZaXVKaWYybGdBZ1ZxWDNUTWJXQ0twanJKbDFBTm1vR1JPcGZoZDg0UG1ZY3c5NzFBU0RLQUplYTVKSEw4cGdpSS11NkxkUmdyTUtSaTFFaXNjaEY2YXQwOUdZM3REWk1EV3ZHLTF5MjlDQzZJS2kwNU5KbXBiSUVpU1dwWWpndHZ5NGVuVUlHLXN4Y0hkQlJnd29ORVBjaGZLMm5YQ3pWcUtwS3RrWkhCOUplMExjQ28tdm1uYjY5c28tUGhONU9XZ21IbzRYa1h0ZmlqellwN0k0SkdRNWx1X2d5S3czalNDbjY0SVA0QQ==
preferences: {}

$ sudo cat /etc/cni/net.d/00-danm.conf   
{
  "cniVersion": "0.3.1",
  "name": "danm_meta_cni",
  "type": "danm",
  "kubeconfig": "/etc/cni/net.d/danm-kubeconfig",
  "cniDir": "/etc/cni/net.d",
  "namingScheme": ""
}

$ sudo cat /etc/cni/net.d/flannel.conf 
{
  "cniVersion": "0.3.1",
  "type": "flannel",
  "delegate": {
    "hairpinMode": true,
    "isDefaultGateway": true
  }
}

but the pod gets stuck:

$ kk describe pod simple-test-pod
Name:         simple-test-pod
Namespace:    default
Priority:     0
Node:         node-1/192.168.1.2
Start Time:   Thu, 01 Apr 2021 06:32:58 -0600
Labels:       <none>
Annotations:  danm.k8s.io/interfaces:
                [
                  {"network":"management", "ip":"dynamic"}
                ]
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{"danm.k8s.io/interfaces":"[\n  {\"network\":\"management\", \"ip\":\"dynamic\"}...
Status:       Pending
IP:           
IPs:          <none>
Containers:
  appcntr1:
    Container ID:  
    Image:         centos/tools
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      --
    Args:
      while true; do sleep 300000; done;
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4sq6w (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-4sq6w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4sq6w
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                     From             Message
  ----     ------                  ----                    ----             -------
  Warning  FailedCreatePodSandBox  16m (x2909 over 121m)   kubelet, node-1  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e303a219623654d694ccd56abc24259f5c2227e107efe106f6330193ed2906b5" network for pod "simple-test-pod": networkPlugin cni failed to set up pod "simple-test-pod_default" network: CNI network could not be set up: CNI operation for network:management failed with:CNI delegation failed due to error:Error delegating ADD to CNI plugin:flannel because:OS exec call failed:missing network name
  Normal   SandboxChanged          103s (x3334 over 121m)  kubelet, node-1  Pod sandbox changed, it will be killed and re-created.

It seems that the bootstrap network is not properly configured, so the pod cannot run.

After creating the cluster with kubeadm init/join, I deployed the flannel CNI as the official documentation says:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

This creates 10-flannel.conflist in /etc/cni/net.d, which apparently cannot be used by danm directly.

So in danm-installer-config.yaml, I use

  default_cni_type: flannel

  default_cni_network_id: flannel

  default_cni_config_data: |
    {
      "cniVersion": "0.3.1",
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    }

to let danm create flannel.conf in /etc/cni/net.d.

Also, when I deploy a normal pod like

apiVersion: v1
kind: Pod
metadata:
  name: normal-simple-test-pod
spec:
  containers:
  - name: appcntr1 
    image: centos/tools 
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]

it also gets stuck, and the error message is clearer:

$ kk describe pod normal-simple-test-pod 
Name:         normal-simple-test-pod
Namespace:    default
Priority:     0
Node:         node-1/192.168.1.2
Start Time:   Thu, 01 Apr 2021 08:51:56 -0600
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"normal-simple-test-pod","namespace":"default"},"spec":{"containers":[...
Status:       Pending
IP:           
IPs:          <none>
Containers:
  appcntr1:
    Container ID:  
    Image:         centos/tools
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      --
    Args:
      while true; do sleep 300000; done;
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4sq6w (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-4sq6w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4sq6w
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age              From               Message
  ----     ------                  ----             ----               -------
  Normal   Scheduled               12s              default-scheduler  Successfully assigned default/normal-simple-test-pod to node-1
  Warning  FailedCreatePodSandBox  10s              kubelet, node-1    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "31afbfd72570d2f27480ac324c8761d5343813c42dfd0bb8bd3d0d548be33550" network for pod "normal-simple-test-pod": networkPlugin cni failed to set up pod "normal-simple-test-pod_default" network: there are no network connections defined, and there is no suitable default network configured in the cluster
  Warning  FailedCreatePodSandBox  8s               kubelet, node-1    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "49a48d8d6c0e60e508b9e0aaed084f5ebdd5dcb556ad9722bc50fed660f05bab" network for pod "normal-simple-test-pod": networkPlugin cni failed to set up pod "normal-simple-test-pod_default" network: there are no network connections defined, and there is no suitable default network configured in the cluster
  Warning  FailedCreatePodSandBox  7s               kubelet, node-1    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "50d8b22a39b191f494f532b92cca1457a9bae485e86f0fbfd268e682cd729f33" network for pod "normal-simple-test-pod": networkPlugin cni failed to set up pod "normal-simple-test-pod_default" network: there are no network connections defined, and there is no suitable default network configured in the cluster
  Warning  FailedCreatePodSandBox  5s               kubelet, node-1    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9841c8d6512e146935e8efd89d958097be068b599e8d16f97150fb0bdc89efd7" network for pod "normal-simple-test-pod": networkPlugin cni failed to set up pod "normal-simple-test-pod_default" network: there are no network connections defined, and there is no suitable default network configured in the cluster
  Warning  FailedCreatePodSandBox  2s               kubelet, node-1    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c23002976acd628b49a4e4b9ca76e2835e7166c221a3d830791655735ecc68ec" network for pod "normal-simple-test-pod": networkPlugin cni failed to set up pod "normal-simple-test-pod_default" network: there are no network connections defined, and there is no suitable default network configured in the cluster
  Normal   SandboxChanged          1s (x6 over 9s)  kubelet, node-1    Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  1s               kubelet, node-1    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "55599c05b825d404d72c58ea0bb759385ac211830409dd51c528adb68ad507c1" network for pod "normal-simple-test-pod": networkPlugin cni failed to set up pod "normal-simple-test-pod_default" network: there are no network connections defined, and there is no suitable default network configured in the cluster

Could you please help me to figure out what the error is actually about and how I can fix it?

What you expected to happen:

Run smoothly.

How to reproduce it:

Anything else we need to know?:

Environment:

Levovar commented 3 years ago

For the first error: the CNI config is invalid according to the latest CNI specification. As the error message explains, a CNI config file must contain a "name" attribute. You can set it to literally any value, because nobody reads it or uses it for anything (yet it is mandatory).
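
A minimal sketch of that check, assuming the flannel.conf contents quoted earlier in this thread. The helper names `has_network_name` and `with_network_name` are hypothetical (they are not part of DANM or CNI), and the value "flannel" is an arbitrary choice for the name:

```python
import json

# flannel.conf as posted above: there is no top-level "name" key.
FLANNEL_CONF = """
{
  "cniVersion": "0.3.1",
  "type": "flannel",
  "delegate": {
    "hairpinMode": true,
    "isDefaultGateway": true
  }
}
"""


def has_network_name(conf_text: str) -> bool:
    """Return True if the config has a non-empty top-level "name"."""
    conf = json.loads(conf_text)
    return bool(conf.get("name"))


def with_network_name(conf_text: str, name: str) -> str:
    """Return the config as JSON text, adding "name" only if it is missing or empty."""
    conf = json.loads(conf_text)
    if not conf.get("name"):
        conf["name"] = name  # the value is arbitrary; only its presence matters here
    return json.dumps(conf, indent=2)
```

Calling `with_network_name(FLANNEL_CONF, "flannel")` returns the same config with a `"name": "flannel"` entry added. The equivalent change would also be needed in `default_cni_config_data` in the installer config, since that is where flannel.conf was generated from in this setup.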

For the second error: you don't have a default network configured in your cluster, and you haven't specified any network connections for your Pod, so DANM has no idea what you want to do with this guy :) Please read https://github.com/nokia/danm/blob/master/user-guide.md#defining-default-networks

Panlichen commented 3 years ago

Thanks for your reply.

After a lot of work, I am now able to use SRIOV in Kubernetes with sriov-operator, and I have to push my project forward, so I may not get back to danm in the near future. However, the philosophy behind danm is quite interesting, and I am sure I'll be back.