k8snetworkplumbingwg / multus-cni

A CNI meta-plugin for multi-homed pods in Kubernetes
Apache License 2.0
2.35k stars, 585 forks

Support cniConf option in multus entrypoint #1118

Closed cyclinder closed 5 months ago

cyclinder commented 1 year ago

What happened:

What you expected to happen:

Pods with default annotations are successfully created

How to reproduce it (as minimally and precisely as possible):

  Warning  FailedCreatePodSandBox  5s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "813579cf74e86fe3c9d4841119655e0a1466de20a84a37115ebadb4dec2a20cd": plugin type="multus" name="multus-cni-network" failed (add): Multus: [kube-system/spiderdoctor-agent-mqlrp/671e0744-a17e-4068-ad33-f83d9a857ae5]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in loading K8s cluster default network from pod annotation: tryLoadK8sPodDefaultNetwork: failed getting the delegate: GetCNIConfig: err in GetCNIConfigFromFile: No networks found in /etc/cni/multus/net.d

Anything else we need to know?:

I read the source code and found that when Multus automatically generates the CNI configuration file, the confDir field of the generated 00-multus.conf is empty.

When a pod is created with the default-network annotation and confDir is empty, Multus sets confDir to /etc/cni/multus/net.d (see https://github.com/k8snetworkplumbingwg/multus-cni/blob/80c0f6f0c4ed85ab5887e81cb5ee3294995ac93c/pkg/types/conf.go#L352), but that path does not exist on the host, so no networks are found.
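To make the defaulting concrete, here is a minimal Python sketch of the behavior described above. It is a model, not the Multus source (which is Go); the function name effective_conf_dir and the sample dictionaries are hypothetical, and only the default path comes from this report.

```python
# Sketch only: models the defaulting described in this issue, not the actual Multus code.
DEFAULT_MULTUS_CONF_DIR = "/etc/cni/multus/net.d"  # default value reported in this issue


def effective_conf_dir(multus_conf: dict) -> str:
    """Return the directory that would be searched for delegate CNI configs."""
    # A missing or empty confDir falls back to the default directory.
    return multus_conf.get("confDir") or DEFAULT_MULTUS_CONF_DIR


# Hypothetical stand-ins for the auto-generated config and a patched one.
generated = {"name": "multus-cni-network", "type": "multus"}  # no confDir
patched = dict(generated, confDir="/etc/cni/net.d")

print(effective_conf_dir(generated))
print(effective_conf_dir(patched))
```

With the generated config, the lookup points at a directory that does not exist on the host, which matches the "No networks found" error in the event above.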

I think there are two ways to solve it here:

I prefer option 1. What do you think?

Environment:

s1061123 commented 1 year ago

Hi, could you please share the following items for troubleshooting?

cyclinder commented 1 year ago

Yes, thanks for looking at this.

files in /etc/cni/net.d

root@control-plane:~# ls /etc/cni/net.d/
00-multus.conf  10-calico.conflist  multus.d

files in /etc/cni/multus/net.d

root@control-plane:~# ls /etc/cni/multus/net.d
ls: cannot access '/etc/cni/multus/net.d': No such file or directory

Pod manifest

root@control-plane:~# kubectl get po -n kube-system  spiderdoctor-agent-jqpc8 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    v1.multus-cni.io/default-network: kube-system/k8s-pod-network

kubectl -A get net-attach-def output

root@control-plane:~# kubectl  get net-attach-def -A
NAMESPACE     NAME                       AGE
kube-system   k8s-pod-network            20h

cat 00-multus.conf

root@control-plane:~# cat /etc/cni/net.d/00-multus.conf | jq
{
  "cniVersion": "0.3.1",
  "name": "multus-cni-network",
  "type": "multus",
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
  "delegates": [
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "log_file_path": "/var/log/calico/cni/cni.log",
          "datastore_type": "kubernetes",
          "nodename": "10-20-1-230",
          "mtu": 0,
          "ipam": {
            "type": "calico-ipam"
          },
          "policy": {
            "type": "k8s"
          },
          "kubernetes": {
            "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        },
        {
          "type": "bandwidth",
          "capabilities": {"bandwidth": true}
        }
      ]
    }
  ]
}

There are two workarounds:

Option 1: add confDir to 00-multus.conf

{
    "cniVersion": "0.3.1",
    "name": "multus-cni-network",
    "type": "multus",
    "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
    "confDir": "/etc/cni/net.d/",
    ....
}

Option 2: copy the CNI configuration files into /etc/cni/multus/net.d

# every node
mkdir -p /etc/cni/multus/net.d
cp /etc/cni/net.d/*.conf* /etc/cni/multus/net.d

I think option 1 is better, but if Multus restarts, this change is lost. So I opened https://github.com/k8snetworkplumbingwg/multus-cni/pull/1119 for it.
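For illustration, the confDir workaround could be scripted roughly as follows. This is a hedged sketch: the helper name with_conf_dir is hypothetical, the sample input is a trimmed stand-in for the real 00-multus.conf, and, as noted above, a manual edit like this is lost when Multus regenerates the file.

```python
import json


def with_conf_dir(conf_text: str, conf_dir: str = "/etc/cni/net.d/") -> str:
    """Return the Multus config JSON with confDir set, leaving other keys untouched."""
    conf = json.loads(conf_text)
    conf["confDir"] = conf_dir
    return json.dumps(conf, indent=2)


# Trimmed, hypothetical stand-in for /etc/cni/net.d/00-multus.conf.
original = '{"cniVersion": "0.3.1", "name": "multus-cni-network", "type": "multus"}'
print(with_conf_dir(original))
```

The function works on text rather than on the file itself, so it can be tried without touching a node.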

cyclinder commented 1 year ago

Note that this issue was found in Multus v3.9*; I'm not sure whether 4.0 still has it. So PR 1119, which I opened, only targets v3.9.

s1061123 commented 1 year ago

Thank you for the info. In addition,

cyclinder commented 1 year ago
  • could you please share kubectl get pod -A -o yaml?
root@control-plane:~# kubectl get po -n kube-system  spiderdoctor-agent-jqpc8 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    v1.multus-cni.io/default-network: kube-system/k8s-pod-network
spec:
....
  • please let me know why you want to use v1.multus-cni.io/default-network?

In our cluster, we achieve the coexistence of multiple CNIs via v1.multus-cni.io/default-network.

s1061123 commented 1 year ago

please let me know why you want to use v1.multus-cni.io/default-network?

In our cluster, we achieve the coexistence of multiple CNIs via v1.multus-cni.io/default-network.

I'm not clear on what you mean. Multiple CNIs can be achieved with the k8s.v1.cni.cncf.io/networks annotation as usual. Please explain why you need v1.multus-cni.io/default-network instead of k8s.v1.cni.cncf.io/networks.

s1061123 commented 1 year ago

At least, as far as I have tested with Multus v3.9.3, the following YAML works fine. Note: I also tested with v1.multus-cni.io/default-network: macvlan-conf-1 and it also works.

# this is just a test to launch one pod, so static IPAM is used.
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf-1
  namespace: kube-system
spec:
  config: '{
            "cniVersion": "0.3.1",
            "type": "macvlan",
            "capabilities": { "ips": true },
            "master": "eth1",
            "mode": "bridge",
            "ipam": {
                "type": "static",
                "addresses": [ {
                    "address": "10.1.1.101/24"
                } ]
            }
        }'
---
apiVersion: v1
kind: Pod
metadata:
  name: fedora
  annotations:
    v1.multus-cni.io/default-network: kube-system/macvlan-conf-1
spec:
  containers:
  - name: fedora
    image: quay.io/s1061123/fedora-tools:38
    command:
    - /sbin/init

cyclinder commented 1 year ago

There are several different types of CNI in our cluster, such as calico, macvlan, ipvlan, sriov, etc., and we use the annotation v1.multus-cni.io/default-network to choose different default networks for pods.

Calico is the default network for the cluster, and the calico configuration file exists in /etc/cni/net.d. The issue arises when you create a NAD for it but leave the spec nil, because the default value of confDir is /etc/cni/multus/net.d.

Function call stack:

https://github.com/k8snetworkplumbingwg/multus-cni/blob/80c0f6f0c4ed85ab5887e81cb5ee3294995ac93c/pkg/k8sclient/k8sclient.go#L288

===>

https://github.com/k8snetworkplumbingwg/multus-cni/blob/80c0f6f0c4ed85ab5887e81cb5ee3294995ac93c/vendor/github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/utils/cniconfig.go#L42

===>

https://github.com/k8snetworkplumbingwg/multus-cni/blob/80c0f6f0c4ed85ab5887e81cb5ee3294995ac93c/vendor/github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/utils/cniconfig.go#L71
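The lookup at the end of this call stack can be approximated with the Python sketch below. It is a simplified model under stated assumptions, not the vendored Go code: the name find_cni_config, the suffix set, and the single JSON "name" match are illustrative, and only the "No networks found in ..." message is taken from the error reported in this issue.

```python
import json
import tempfile
from pathlib import Path

# Assumed set of file extensions treated as CNI configs in this sketch.
CONF_SUFFIXES = {".conf", ".conflist", ".json"}


def find_cni_config(name: str, conf_dir: str) -> dict:
    """Find a CNI config whose "name" matches, searching only conf_dir."""
    directory = Path(conf_dir)
    files = []
    if directory.is_dir():
        files = sorted(
            p for p in directory.iterdir() if p.is_file() and p.suffix in CONF_SUFFIXES
        )
    if not files:
        # A missing directory and an empty directory look the same to the caller.
        raise FileNotFoundError(f"No networks found in {conf_dir}")
    for path in files:
        conf = json.loads(path.read_text())
        if conf.get("name") == name:
            return conf
    raise LookupError(f"no config named {name!r} in {conf_dir}")


with tempfile.TemporaryDirectory() as root:
    # Stand-in for /etc/cni/net.d, which holds the calico config.
    runtime_dir = Path(root, "net.d")
    runtime_dir.mkdir()
    (runtime_dir / "10-calico.conflist").write_text(
        json.dumps({"name": "k8s-pod-network", "plugins": [{"type": "calico"}]})
    )
    # Stand-in for /etc/cni/multus/net.d, which was never created.
    missing_dir = Path(root, "multus", "net.d")

    found = find_cni_config("k8s-pod-network", str(runtime_dir))
    try:
        find_cni_config("k8s-pod-network", str(missing_dir))
        error = None
    except FileNotFoundError as exc:
        error = str(exc)

print(found["name"])
print(error)
```

The same network name resolves in one directory and fails in the other, which is why a NAD with an empty spec only works when confDir points at the directory that actually contains the file.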

s1061123 commented 1 year ago

When a net-attach-def references a config file, we use a different directory, /etc/cni/multus/net.d, instead of /etc/cni/net.d. This is by design, to override the Kubernetes CNI config (imagine calico or another plugin added 00-calico.yaml in /etc/cni/net.d). So Multus expects users to use a separate directory as the Multus CNI directory, instead of the container runtime's CNI directory.

But as I mentioned, we are missing an option in entrypoint.sh/multus-daemon to change that. I agree.

cyclinder commented 1 year ago

This is by design, to override the Kubernetes CNI config (imagine calico or another plugin added 00-calico.yaml in /etc/cni/net.d). So Multus expects users to use a separate directory as the Multus CNI directory, instead of the container runtime's CNI directory.

Is /etc/cni/multus/net.d only the Multus CNI directory (for 00-multus.conf)? If so, why do we look for calico or some other default network CNI file in this directory when we use the annotation v1.multus-cni.io/default-network?

s1061123 commented 1 year ago

No. I mean that /etc/cni/net.d is the container runtime's CNI directory, where 00-multus.conf is. /etc/cni/multus/net.d is the Multus CNI directory, which holds the configs of other CNIs that Multus uses through a net-attach-def without a config.

BTW, to clarify: this issue does not seem to be that pods with the annotation v1.multus-cni.io/default-network fail to be created; it seems to be about 'add cniConf as multus entrypoint's parameter'. So let me change the issue title.

cyclinder commented 1 year ago

got it, thanks for the details.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

cyclinder commented 11 months ago

Hi @s1061123 Can you please take a look?

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

cyclinder commented 8 months ago

/remove stale

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.