nokia / danm

TelCo grade network management in a Kubernetes cluster
BSD 3-Clause "New" or "Revised" License

DANM support in starlingx #236

Closed ankush06 closed 3 years ago

ankush06 commented 4 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

feature

What happened: Installed DANM on a StarlingX setup, but after the installation some pods on controller-0 are not initializing. StarlingX uses the Calico CNI. Is there a known DANM issue, or are there any pointers for using DANM on StarlingX?

kubectl get pods -n kube-system

    calico-kube-controllers-5cd4695574-mq6s7   1/1   Running             0    4d12h   172.16.166.153   controller-1
    calico-node-4vgkm                          1/1   Running             11   10d     192.168.22.102   controller-0
    calico-node-chrxh                          1/1   Running             25   10d     192.168.22.103   controller-1
    coredns-78d9fd7cb9-9k646                   0/1   ContainerCreating   0    3d19h                    controller-0
    rbd-provisioner-77bfb6dbb-zhstx            0/1   ContainerCreating   0    5m42s                    controller-0
    svcwatcher-852tw                           0/1   ContainerCreating   0    3d22h                    controller-0
    svcwatcher-flllq                           1/1   Running

kubectl describe pod coredns-78d9fd7cb9-9k646 -n kube-system

    FailedCreatePodSandBox 4m27s (x7520 over 3d19h) kubelet, controller-0 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8d10a7d8d9c45441655d96c4f5c55cc0fe36512304fbb5e0b691a64e01f10013": CNI network could not be set up: CNI operation timed-out after 3000 seconds

danm.log:

    2020/09/03 17:53:11.814963 ERROR: ADD: CNI network could not be set up with error:CNI operation for network:default failed with:CNI delegation failed due to error:Error delegating ADD to CNI plugin:calico because:OS exec call failed:incompatible CNI versions; config is "0.1.0", plugin supports ["0.3.1"]

What you expected to happen: Pods should be deployed on both nodes.

How to reproduce it: Install DANM in starlingx (Calico-CNI)

Environment:

    ### K8s CRD DanmNet API schema description ###
    apiVersion: danm.k8s.io/v1
    kind: ClusterNetwork
    metadata:
      name: datanwipv6
      namespace: default
    spec:
      NetworkID: datanwipv6
      NetworkType: ipvlan
      Options:
        host_device: enp59s0f0
        container_prefix: datapathv6
        cidr: 10.211.12.0/24
        net6: 2001:db8:aaaa:bbbb::/64
        allocation_pool_v6:
          start: 2001:db8:aaaa:bbbb:0:0:0:0001
          end: 2001:db8:aaaa:bbbb:0:0:0:000a
        rt_tables: 214
        vlan: 902

Levovar commented 4 years ago

We use Calico with DANM in production, so it is probably just an improper configuration. Please share your Calico config. Do you have the CNI version defined? It is a mandatory field according to the spec.

ankush06 commented 4 years ago

Thanks for replying. Yes, it looks like a config issue, but I am not able to work out what has gone wrong. I can see the applications running on the other node (controller-1), but they fail on controller-0. Below is the Calico config on both nodes.

Controller-0

    {
      "name": "k8s-pod-network",
      "cniVersion": null,
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "controller-0",
      "mtu": 1440,
      "ipam": {
        "type": "calico-ipam",
        "assign_ipv4": "true",
        "assign_ipv6": "false"
      },
      "policy": { "type": "k8s" },
      "kubernetes": { "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" }
    }

Controller-1

    {
      "name": "k8s-pod-network",
      "cniVersion": null,
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "controller-1",
      "mtu": 1440,
      "ipam": {
        "type": "calico-ipam",
        "assign_ipv4": "true",
        "assign_ipv6": "false"
      },
      "policy": { "type": "k8s" },
      "kubernetes": { "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" }
    }
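Both configs carry a null `cniVersion`, which would be consistent with the `config is "0.1.0"` part of the danm.log error if the CNI runtime falls back to "0.1.0" when the field is empty. The sketch below only illustrates that mismatch; the fallback value and the helper names `effective_cni_version` and `is_compatible` are assumptions made for this example, not DANM or Calico code.

```python
import json

# Versions the Calico plugin reported as supported in the danm.log line.
PLUGIN_SUPPORTED = ["0.3.1"]

# Assumption: the version a CNI runtime assumes when cniVersion is missing or null.
LEGACY_DEFAULT = "0.1.0"


def effective_cni_version(conf: dict) -> str:
    """Return the version the runtime would see for this network config."""
    # A null or absent cniVersion is treated as the legacy default.
    return conf.get("cniVersion") or LEGACY_DEFAULT


def is_compatible(conf: dict, supported=PLUGIN_SUPPORTED) -> bool:
    """True when the config's effective version is one the plugin accepts."""
    return effective_cni_version(conf) in supported


# Trimmed copy of the config posted above.
conf = json.loads('{"name": "k8s-pod-network", "cniVersion": null, "type": "calico"}')
print(effective_cni_version(conf), is_compatible(conf))  # prints: 0.1.0 False
```

With `"cniVersion": "0.3.1"` in the config, the same check passes.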

Levovar commented 4 years ago

It is interesting that they are running on controller-1, but the CNI version should definitely be filled in :)

Try setting "cniVersion" to "0.3.1" in the config on both nodes, and see what happens.
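A minimal sketch of that change, assuming the Calico config is plain JSON as posted in this thread. The helper name `set_cni_version` is hypothetical, and the thread does not give the config file's path, so the example works on the text of the config rather than on a file.

```python
import json


def set_cni_version(conf_text: str, version: str = "0.3.1") -> str:
    """Fill in cniVersion when it is missing or null; keep any existing value."""
    conf = json.loads(conf_text)
    if not conf.get("cniVersion"):
        conf["cniVersion"] = version
    return json.dumps(conf, indent=2)


# Trimmed copy of the controller-0 config from this thread.
original = '{"name": "k8s-pod-network", "cniVersion": null, "type": "calico"}'
print(set_cni_version(original))
```

The result would then need to be written back to the Calico config on each node by whatever mechanism manages that file in the deployment.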

Levovar commented 4 years ago

@ankush06 any updates?

ankush06 commented 3 years ago

Thanks @Levovar. It is working now. Sorry for the late response.