projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Containerd error: failed to destroy network for sandbox "***": plugin type="calico" failed (delete): netplugin failed with no error message: signal: killed #7647

Closed: cui3093 closed this issue 5 months ago

cui3093 commented 1 year ago

When deploying coreDNS, the pod hangs in ContainerCreating forever.

Expected Behavior

The coreDNS pod should be in the Running state.

Current Behavior

Possible Solution

Steps to Reproduce (for bugs)

1.
2.
3.
4.

Context

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: calico-kube-controllers
  namespace: kube-system
  labels:
    k8s-app: calico-kube-controllers
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: calico-kube-controllers

Source: calico/templates/calico-kube-controllers.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-kube-controllers
  namespace: kube-system

Source: calico/templates/calico-node.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-node
  namespace: kube-system

Source: calico/templates/calico-etcd-secrets.yaml

The following contains k8s Secrets for use with a TLS-enabled etcd cluster. For information on populating Secrets, see http://kubernetes.io/docs/user-guide/secrets/

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: calico-etcd-secrets
  namespace: kube-system
data:
  # Populate the following with the etcd TLS configuration if desired, but leave
  # blank if not using TLS for etcd. The keys below should be uncommented and the
  # values populated with the base64-encoded contents of each file associated
  # with the TLS data. Example command for encoding a file's contents:
  #   cat <file> | base64 -w 0
etcd-cert: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNSVENDQWVxZ0F3SUJBZ0lVSGZySmZEM3NNUk9PVGhuajNIaU52WHpkYkJvd0NnWUlLb1pJemowRUF3SXcKRHpFTk1Bc0dBMVVFQXhNRVpYUmpaREFlRncweU16QTFNRFV3TlRRMU1EQmFGdzB6TXpBMU1ESXdOVFExTURCYQpNR1V4Q3pBSkJnTlZCQVlUQWtOT01SQXdEZ1lEVlFRSUV3ZENaV2xLYVc1bk1SQXdEZ1lEVlFRSEV3ZENaV2xLCmFXNW5NUTB3Q3dZRFZRUUtFd1JGZEdOa01RMHdDd1lEVlFRTEV3UkZkR05rTVJRd0VnWURWUVFERXd0bGRHTmsKTFhObGNuWmxjakJaTUJNR0J5cUdTTTQ5QWdFR0NDcUdTTTQ5QXdFSEEwSUFCRlcxRzkyMGxGMmErR0hOTk9uZApMdWdCZVdiM01OemV5UDhoaDl1ZnZBSHZwRXpUUHp0SWcrVHdtejVkWEJnSWpOOE93djJNWjdTdVVxNEoxSFZUClF5Q2pnYzB3Z2Nvd0RnWURWUjBQQVFIL0JBUURBZ1dnTUIwR0ExVWRKUVFXTUJRR0NDc0dBUVVGQndNQkJnZ3IKQmdFRkJRY0RBakFNQmdOVkhSTUJBZjhFQWpBQU1CMEdBMVVkRGdRV0JCUnVrQ1oxNG80cXY4UjlpRmFyTHkzQwovMVFUM1RBZkJnTlZIU01FR0RBV2dCVFNxVkFVWFRSMmllVElEMkN2WVBhcnlhQVYzakJMQmdOVkhSRUVSREJDCmdneHJPSE10YldGemRHVnlNREdDREdzNGN5MXRZWE4wWlhJd01vSU1hemh6TFcxaGMzUmxjakF6aHdSL0FBQUIKaHdUQXFEZzlod1RBcURnK2h3VEFxRGcvTUFvR0NDcUdTTTQ5QkFNQ0Ewa0FNRVlDSVFEQk5ueFJkbmp1T0ZCeApCdHRhdEp4Y3ZNcnA1NWxDUVBVdWdZdnRXYmJMSWdJaEFOU29ud1RFYkNTbVN1WjA5U2pUb09GWjZVSmxUbEJLCnN5b2lJdFR4VXJ4ZQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg== etcd-key: LS0tLS1CRUdJTiBFQyBQUklWQVRFIEtFWS0tLS0tCk1IY0NBUUVFSUIzd2xhQUVvVkJlbnl5R1Q5aTZTQzRDdkVET2cwb3lkb2Z4dlZIckZ3TkZvQW9HQ0NxR1NNNDkKQXdFSG9VUURRZ0FFVmJVYjNiU1VYWnI0WWMwMDZkMHU2QUY1WnZjdzNON0kveUdIMjUrOEFlK2tUTk0vTzBpRAo1UENiUGwxY0dBaU0zdzdDL1l4bnRLNVNyZ25VZFZORElBPT0KLS0tLS1FTkQgRUMgUFJJVkFURSBLRVktLS0tLQo= etcd-ca: 
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJZVENDQVFpZ0F3SUJBZ0lVZHhKVSs2MVV4Z3dEUjFmQ0xSUytnQk93Sndzd0NnWUlLb1pJemowRUF3SXcKRHpFTk1Bc0dBMVVFQXhNRVpYUmpaREFlRncweU16QTFNRFV3TlRRMU1EQmFGdzB6TXpBMU1ESXdOVFExTURCYQpNQTh4RFRBTEJnTlZCQU1UQkdWMFkyUXdXVEFUQmdjcWhrak9QUUlCQmdncWhrak9QUU1CQndOQ0FBUWwwdlNGCjEvM1k3UllQSFN6bkhOYlcyNzVPdDNpOXg1R0hocVNPekd5bnBNYUw0akwzMDdObCtqcjM1anZQMStOaG1SUVUKbXFlLzk5bXZBRDB3clAyT28wSXdRREFPQmdOVkhROEJBZjhFQkFNQ0FRWXdEd1lEVlIwVEFRSC9CQVV3QXdFQgovekFkQmdOVkhRNEVGZ1FVMHFsUUZGMDBkb25reUE5Z3IyRDJxOG1nRmQ0d0NnWUlLb1pJemowRUF3SURSd0F3ClJBSWdhYmhGV0QvSnE5QXprbUloTklYK2RoVTNBWXdObXpIMnJ2VVNpeDMwZkc4Q0lEMG5KbmhqVCswbk5oanUKM0w0V0NiQUsrY0I3dzZydUN0NEZoaTg0blQvdAotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
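The comment in the manifest above sketches how those base64 values are produced; concretely, something like the following (the file names are placeholders, and `-w 0` is the GNU coreutils flag that disables line wrapping so each value stays on one line):

```shell
# Encode each TLS file as a single-line base64 string for the Secret's data keys.
# (etcd-ca.crt, etcd-server.crt, etcd-server.key are hypothetical file names.)
for f in etcd-ca.crt etcd-server.crt etcd-server.key; do
  if [ -f "$f" ]; then
    printf '%s: %s\n' "$f" "$(base64 -w 0 < "$f")"
  fi
done
```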

Source: calico/templates/calico-config.yaml

This ConfigMap is used to configure a self-hosted Calico installation.

kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Configure this with the location of your etcd cluster.
  etcd_endpoints: "https://192.168.56.61:2379"
  # If you're using TLS-enabled etcd, uncomment the following.
  # You must also populate the Secret below with these files.
  etcd_ca: "/calico-secrets/etcd-ca"     # "/calico-secrets/etcd-ca"
  etcd_cert: "/calico-secrets/etcd-cert" # "/calico-secrets/etcd-cert"
  etcd_key: "/calico-secrets/etcd-key"   # "/calico-secrets/etcd-key"
  # Typha is disabled.
  typha_service_name: "none"
  # Configure the backend to use.
  calico_backend: "bird"
  # calico_backend: "vxlan"
  # Configure the MTU to use for workload interfaces and tunnels.
  # By default, MTU is auto-detected, and explicitly setting this field should not be required.
  # You can override auto-detection by providing a non-zero value.
  veth_mtu: "0"
  # The CNI network configuration to install on each node. The special
  # values in this config will be automatically populated.
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "log_file_path": "/var/log/calico/cni/cni.log",
          "etcd_endpoints": "__ETCD_ENDPOINTS__",
          "etcd_key_file": "__ETCD_KEY_FILE__",
          "etcd_cert_file": "__ETCD_CERT_FILE__",
          "etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
          "mtu": __CNI_MTU__,
          "ipam": {"type": "calico-ipam"},
          "policy": {"type": "k8s"},
          "kubernetes": {"kubeconfig": "__KUBECONFIG_FILEPATH__"}
        },
        {"type": "portmap", "snat": true, "capabilities": {"portMappings": true}},
        {"type": "bandwidth", "capabilities": {"bandwidth": true}}
      ]
    }

Source: calico/templates/calico-kube-controllers-rbac.yaml

Include a ClusterRole for the kube-controllers component, and bind it to the calico-kube-controllers ServiceAccount.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-kube-controllers
rules:
  # Pods are monitored for changing labels.
  # The node controller monitors Kubernetes nodes.
  # Namespace and serviceaccount labels are used for policy.

Your Environment

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-9"
ALMALINUX_MANTISBT_PROJECT_VERSION="9.1"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.1"


* Link to your project (optional):
cui3093 commented 1 year ago

The full contents exceeded the 65535-byte limit, so I added the rest separately:

cui3093 commented 1 year ago

Maybe it's not a bug, but I don't know how to trace the root cause.

cui3093 commented 1 year ago

Can anyone take a look at this issue?

caseydavenport commented 1 year ago

The logs we need for this will be from the CNI plugin, not from calico-node or kube-controllers. They are generally found in /var/log/calico/cni on the host machine.

cui3093 commented 1 year ago

The directory /var/log/calico/cni is empty.

caseydavenport commented 1 year ago

Does your CNI configuration at /etc/cni/net.d/10-calico.conflist include a log directory setting?
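One quick way to answer that question from the node, without extra tooling, is to pull the logging keys straight out of the installed conflist (the path is the one mentioned above; the grep pattern is just an illustrative way to extract the two fields):

```shell
conf=/etc/cni/net.d/10-calico.conflist
if [ -f "$conf" ]; then
  # Show the calico plugin's logging settings from the installed CNI config
  grep -oE '"log_(level|file_path)"[[:space:]]*:[[:space:]]*"[^"]*"' "$conf"
else
  echo "no CNI config found at $conf"
fi
```

If `log_file_path` is absent, the plugin has nowhere configured to write its log file, which would explain an empty /var/log/calico/cni.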

igor-loncarevic commented 1 year ago

@caseydavenport

The /var/log/calico/cni directory is empty because of the calico-node pod specification. This can be observed in the manifest for all recent versions of Calico.

- mountPath: /var/log/calico/cni
  name: cni-log-dir
  readOnly: true

The directory is mounted read-only in the pod:

[calico-node-pod /]# grep /var/log/calico /proc/mounts 
/dev/mapper/centos-root /var/log/calico/cni xfs ro,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0

Below are the logs related to accessing the read-only (ro) directory "/var/log/calico/cni":

touch: cannot touch '/var/log/calico/cni/config': Read-only file system
./run: line 5: /var/log/calico/cni/config: Read-only file system
./run: line 6: /var/log/calico/cni/config: Read-only file system
svlogd: warning: unable to lock directory: /var/log/calico/cni: read-only file system
svlogd: fatal: no functional log directories.
touch: cannot touch '/var/log/calico/cni/config': Read-only file system
./run: line 5: /var/log/calico/cni/config: Read-only file system
./run: line 6: /var/log/calico/cni/config: Read-only file system
svlogd: warning: unable to lock directory: /var/log/calico/cni: read-only file system
svlogd: fatal: no functional log directories.
touch: cannot touch '/var/log/calico/cni/config': Read-only file system
./run: line 5: /var/log/calico/cni/config: Read-only file system
./run: line 6: /var/log/calico/cni/config: Read-only file system
svlogd: warning: unable to lock directory: /var/log/calico/cni: read-only file system
svlogd: fatal: no functional log directories.
caseydavenport commented 1 year ago

> directory is empty because of the calico-node pod specification. This can be observed in the manifest for all recent versions of Calico.

I'm not sure why that directory is mounted into calico/node at all to be honest.

The logs that I am referring to aren't written by calico/node, they are written by the Calico CNI plugin which is executed on the host and doesn't use a volume mount to access that directory.

igor-loncarevic commented 1 year ago

In this scenario, there are two potential issues to address concerning logging:

  1. There is a lack of log entries generated by the Calico CNI plugin running on the host.
  2. The purpose of the calico-node mount of /var/log/calico/cni is unclear.

While here, after removing the readOnly: true parameter from the manifest/pod specification, the following changes are observed on the host side:

host # ll /var/log/calico/cni/
total 28
-rw-r--r-- 1 root root 691 Jun  1 16:20 @400000006478c8672dd47d94.u
-rw-r--r-- 1 root root 693 Jun  1 16:33 @400000006478cac206ba4c34.u
-rw-r--r-- 1 root root 691 Jun  1 16:43 @400000006478cd143104f534.u
-rw-r--r-- 1 root root 693 Jun  1 16:53 @400000006478cf2a1745b9bc.u
-rw-r--r-- 1 root root 693 Jun  1 17:02 @400000006478d22434f47afc.u
-rw-r--r-- 1 root root 117 Jun  1 17:15 config
-rw-r--r-- 1 root root 693 Jun  1 17:15 current
-rw------- 1 root root   0 Jun  1 12:27 lock
jiaxzeng commented 10 months ago

I also encountered the same problem. Is there any solution?

caseydavenport commented 9 months ago

We'll need logs from the CNI plugin (if it's emitting them) to find out why this is occurring

> netplugin failed with no error message: signal: killed

This suggests that something is killing the CNI plugin before it can return a response - looking into kernel logs or any processes running on the cluster that might be interfering with the execution of a privileged binary (e.g., seccomp) would be another avenue to explore.
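Following that suggestion, the node's kernel log is a reasonable first stop, since both OOM kills and some security-policy denials leave traces there; a sketch (falls back to `journalctl -k` where plain `dmesg` is restricted):

```shell
# Search kernel messages for OOM kills or anything naming the calico plugin.
{ dmesg -T 2>/dev/null || journalctl -k --no-pager 2>/dev/null; } \
  | grep -iE 'killed process|out of memory|calico' | tail -n 20
```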

j13tw commented 9 months ago

Did you set up a proxy for containerd (or forget to set `no_proxy` on the containerd service)? I hit this issue when my company needed an `http_proxy` to reach the Internet. containerd was creating new pods through `https_proxy` without bypassing the service-network-cidr / pod-network-cidr, so it went through the proxy even to reach the kube-apiserver. That seemed ridiculous to me; anyway, I wasted two days resolving it.
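For reference, a containerd systemd drop-in along these lines keeps cluster traffic off the proxy; the proxy URL and CIDRs below are examples and must match your own service/pod CIDRs:

```ini
# /etc/systemd/system/containerd.service.d/http-proxy.conf  (illustrative)
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
# Exclude localhost, the service/pod CIDRs, and cluster-internal names
Environment="NO_PROXY=localhost,127.0.0.1,10.96.0.0/12,192.168.0.0/16,.svc,.cluster.local"
```

followed by `systemctl daemon-reload && systemctl restart containerd`.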

tbernacchi commented 3 months ago

@j13tw what did you do to fix? Could you share with us?