nokia / danm

TelCo grade network management in a Kubernetes cluster
BSD 3-Clause "New" or "Revised" License

failed to get Pod info from K8s API server due to:Unauthorized #249

Closed: Panlichen closed this issue 3 years ago

Panlichen commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

bug

What happened:

After installing DANM successfully, as shown:

$ kubectl get pod -n kube-system 
NAME                                      READY   STATUS      RESTARTS   AGE
coredns-6955765f44-ft7sz                  1/1     Running     0          171m
coredns-6955765f44-rgfxf                  1/1     Running     0          171m
danm-cni-p2dz9                            1/1     Running     0          114m
danm-cni-t5x44                            1/1     Running     0          114m
danm-installer-z45rj                      0/1     Completed   0          114m
danm-webhook-deployment-8dddc66b4-kj5g5   1/1     Running     0          114m
etcd-node-0                               1/1     Running     0          171m
kube-apiserver-node-0                     1/1     Running     0          171m
kube-controller-manager-node-0            1/1     Running     0          171m
kube-flannel-ds-pm8lj                     1/1     Running     0          163m
kube-flannel-ds-zmfpn                     1/1     Running     0          163m
kube-proxy-4vlpr                          1/1     Running     0          171m
kube-proxy-nwg8k                          1/1     Running     0          163m
kube-scheduler-node-0                     1/1     Running     0          171m
netwatcher-pj27d                          1/1     Running     0          114m
netwatcher-z8d8x                          1/1     Running     0          114m
svcwatcher-94nd9                          1/1     Running     0          114m

I deployed a simple DANM network:

apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: management
spec:
  #Change this to match the CNI config file name in your environment
  NetworkID: flannel
  NetworkType: flannel

and a simple pod:

apiVersion: v1
kind: Pod
metadata:
  name: simple-test-pod
  annotations:
    danm.k8s.io/interfaces: |
      [
        {"network":"management", "ip":"dynamic"}
      ]
spec:
  containers:
  - name: appcntr1 
    image: centos/tools 
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]

The network seems fine:

$ kk get dnet
NAME         AGE
management   103m

but the pod gets stuck in ContainerCreating:

$ kubectl describe pod simple-test-pod
Name:         simple-test-pod
Namespace:    default
Priority:     0
Node:         node-1/192.168.1.2
Start Time:   Thu, 01 Apr 2021 00:20:30 -0600
Labels:       <none>
Annotations:  danm.k8s.io/interfaces:
                [
                  {"network":"management", "ip":"dynamic"}
                ]
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{"danm.k8s.io/interfaces":"[\n  {\"network\":\"management\", \"ip\":\"dynamic\"}...
Status:       Pending
IP:           
IPs:          <none>
Containers:
  appcntr1:
    Container ID:  
    Image:         centos/tools
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      --
    Args:
      while true; do sleep 300000; done;
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lkb9x (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-lkb9x:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lkb9x
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                      From             Message
  ----     ------                  ----                     ----             -------
  Normal   SandboxChanged          14m (x1284 over 104m)    kubelet, node-1  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  4m42s (x1422 over 104m)  kubelet, node-1  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "18458bd5b06295b31239629c4bf5d71156731a8a4f9d5e58ca835f0c46259f12" network for pod "simple-test-pod": networkPlugin cni failed to set up pod "simple-test-pod_default" network: Pod manifest could not be parsed with error: failed to get Pod info from K8s API server due to:Unauthorized

The contents of /etc/cni/net.d are:

$ ll /etc/cni/net.d
total 16K
-rw-r--r-- 1 root root  175 Apr  1 00:08 00-danm.conf
-rw-r--r-- 1 root root  292 Apr  1 00:08 10-flannel.conflist
-rw------- 1 root root 2.9K Apr  1 00:08 danm-kubeconfig
-rw-r--r-- 1 root root  123 Apr  1 00:08 flannel.conf
$ sudo cat /etc/cni/net.d/danm-kubeconfig
---
apiVersion: v1
kind: Config
current-context: default
clusters:
  - cluster:
      certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeE1EUXdNVEExTVRBMU1sb1hEVE14TURNek1EQTFNVEExTWxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTWszClp0UHRUbzhlUHFhNFJhdDBTbXpPYmx0dUhtdC9FcVlaYWQ1YlppODFQRFYyQnh2V1hJSi8zdXdKWE9USGRkaGoKMWJHQ1J6YmxwZjZ1K0tBSXZFdjl5c2Y4V2JTbGt3L3hHVGY2ckFPcUFkNnJ3V3g5a3NERTFyVE01L0hmWllHaQpDYjYyRU5tYzlJMkphWCtIYVpMaDQ1WDRiYUlxWTUrQmgvTHRkVTd2ekEvSXU5dHNmTWhxblU1V0ZsTC9JS1EwCm9JU2NLTitQWmROdXJyRWNnZTRubU5IbDQzNFQ3a1BiN0N0Wmdkb3Nna3B6TzdhdWRIenJDRHpPbnk0MnlIV2gKbHNmVHRQeFZ6ODZjTExvQWdKNTRrNVlYMFZaeUE4Q0lHdFJ3U0xRakFZQkpyZW13Tys5c3RybFN5VHR4ZEZrdwoyWDJ2cDlLWjMyU3U3STduRTZNQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFEY2Rmajc4b2l3clhPTUJkeTB6RWJpNGt0K0cKMldBdVlpQWNNUk9pQ1RlQi96TngvQkJubXljL01HQkxkYS9SUC9zYVdaT1lWK3B5MER1UHlORTdKWDFBV0J5dAp6aGdKc0Z6Tk1LZzRVeGZXRWpCOGN5NkVxbmhaOCtWL2ZieG5FWVdLZDNuWGFaNHh2aXY4KzlOZ2pmeUZPTU5ECm9qajluNTl2TEljZU80U0ZvbndqWEN5dU9tRWZOdXJrTHV4REp3VDdlWW5hdGluVFRYb1lMTG5ETmpiZ0pDdjMKY01VRHdxMnJObjN0ZnBBNVM2cDVmVUhMc0xIQTJJVElEd2dxdFB3VVRhUi9hWHFRNFlVNTRJRUErbU5PVHF5Kwp2dldaS3pYZHFFSjV2TlVIQWg3SkVlb29MZFhVWlo1UmRKbUxZQ0xWbnlGNWUwamplcU5ReDRPY1JOYz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
      server: https://192.168.1.1:6443
    name: kubernetes
contexts:
  - context:
      cluster: kubernetes
      user: danm
    name: default
users:
  - name: danm
    user:
      token: ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNkltSkliWGszV0dkNWFGY3diVEZtWlhsVVdFRkRlRFEyYTBkdGJWRlhRUzFMYUZsWWVYRmlUVmw2UkRnaWZRLmV5SnBjM01pT2lKcmRXSmxjbTVsZEdWekwzTmxjblpwWTJWaFkyTnZkVzUwSWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXVZVzFsYzNCaFkyVWlPaUpyZFdKbExYTjVjM1JsYlNJc0ltdDFZbVZ5Ym1WMFpYTXVhVzh2YzJWeWRtbGpaV0ZqWTI5MWJuUXZjMlZqY21WMExtNWhiV1VpT2lKa1lXNXRMWFJ2YTJWdUxXSnlZM0p0SWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXpaWEoyYVdObExXRmpZMjkxYm5RdWJtRnRaU0k2SW1SaGJtMGlMQ0pyZFdKbGNtNWxkR1Z6TG1sdkwzTmxjblpwWTJWaFkyTnZkVzUwTDNObGNuWnBZMlV0WVdOamIzVnVkQzUxYVdRaU9pSmlZamxsTmpCak5TMHdOMkkyTFRSak5USXRPR05sTUMxallXTTFNR1E0TkRrMVpqUWlMQ0p6ZFdJaU9pSnplWE4wWlcwNmMyVnlkbWxqWldGalkyOTFiblE2YTNWaVpTMXplWE4wWlcwNlpHRnViU0o5LkVmNFFhazRROENQd2RBcHV1MkdnOTdsWG91Y0dRd3ZsTFBrQU9zVVRuV2tyQnU5c3IzTHp6M0tJdzNub3RGMEtsbm4tMTdSamd5U0FQOG5lTUxlNTVaeWY2U295OGhJUjB6SEVaaG9YOTBZQjlZaXVKaWYybGdBZ1ZxWDNUTWJXQ0twanJKbDFBTm1vR1JPcGZoZDg0UG1ZY3c5NzFBU0RLQUplYTVKSEw4cGdpSS11NkxkUmdyTUtSaTFFaXNjaEY2YXQwOUdZM3REWk1EV3ZHLTF5MjlDQzZJS2kwNU5KbXBiSUVpU1dwWWpndHZ5NGVuVUlHLXN4Y0hkQlJnd29ORVBjaGZLMm5YQ3pWcUtwS3RrWkhCOUplMExjQ28tdm1uYjY5c28tUGhONU9XZ21IbzRYa1h0ZmlqellwN0k0SkdRNWx1X2d5S3czalNDbjY0SVA0QQ==
preferences: {}
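One thing worth checking locally: a kubeconfig bearer token must be the raw JWT (which starts with eyJ), but the token above starts with ZXlK, which is base64 of eyJ. In other words, the value looks base64-encoded one time too many (as it would be if copied straight from a Secret's .data field without decoding), and the API server rejects such a token with 401 Unauthorized. A small sketch to inspect the claims (my own helper, not DANM code):

```python
import base64
import json

def decode_sa_token(token: str) -> dict:
    """Return the JWT claims of a Kubernetes service-account token.

    A valid kubeconfig token is a raw JWT (it starts with "eyJ").
    If the value starts with "ZXlK" -- base64 of "eyJ" -- it was
    base64-encoded one extra time; strip that layer first so the
    claims can still be inspected."""
    if token.startswith("ZXlK"):            # doubly-encoded token
        token = base64.b64decode(token).decode()
    payload = token.split(".")[1]           # JWT = header.payload.signature
    payload += "=" * (-len(payload) % 4)    # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Decoding the token above should show claims like kubernetes.io/serviceaccount/service-account.name: danm; but as long as the kubeconfig carries the doubly-encoded form, the API server will answer Unauthorized.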

$ sudo cat /etc/cni/net.d/00-danm.conf   
{
  "cniVersion": "0.3.1",
  "name": "danm_meta_cni",
  "type": "danm",
  "kubeconfig": "/etc/cni/net.d/danm-kubeconfig",
  "cniDir": "/etc/cni/net.d",
  "namingScheme": ""
}

$ sudo cat /etc/cni/net.d/flannel.conf 
{
  "cniVersion": "0.3.1",
  "type": "flannel",
  "delegate": {
    "hairpinMode": true,
    "isDefaultGateway": true
  }
}

I used the danm-installer. In danm-installer-config.yaml, I set

  default_cni_type: flannel

  default_cni_network_id: flannel

I also changed a few things in install.sh and other files to fix some errors I hit during the installation; I pushed those changes to my forked repo.

What you expected to happen:

The pod should be created and run smoothly.

How to reproduce it:

Anything else we need to know?:

Environment:

Panlichen commented 3 years ago

I used kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml to install flannel after kubeadm init and kubeadm join, so there is a 10-flannel.conflist in /etc/cni/net.d. I do not know much about CNI; could this be related to the problem?