Closed jjhursey closed 5 years ago
Hi @jjhursey,
Singularity-CRI uses the CNI configs located in /etc/cni/net.d by default (this can be changed in the config file). Make sure your default CNI config (the first one in the config dir) lists the loopback plugin first, e.g.
{
  "cniVersion": "0.4.0",
  "name": "bridge",
  "plugins": [
    {
      "type": "loopback"
    },
    {
      "type": "bridge",
      "bridge": "sbr0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ]
      }
    },
    {
      "type": "firewall"
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true},
      "snat": true
    }
  ]
}
LMK if that works for you.
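A minimal sketch of the check described above, assuming that "first in the config dir" means the lexicographically first file name. The helper names `default_cni_config` and `loopback_is_first` and the set of accepted file extensions are illustrative assumptions, not Singularity-CRI's actual code.

```python
import json
from pathlib import Path

# Assumption: these are the extensions treated as CNI configs.
CONF_EXTENSIONS = {".conf", ".conflist", ".json"}


def default_cni_config(conf_dir):
    """Return the path of the first CNI config in conf_dir, or None.

    Assumption: "first" means first in lexicographic file-name order.
    """
    candidates = sorted(
        p for p in Path(conf_dir).iterdir()
        if p.is_file() and p.suffix in CONF_EXTENSIONS
    )
    return candidates[0] if candidates else None


def loopback_is_first(conflist):
    """Return True if the first entry of "plugins" has type "loopback"."""
    plugins = conflist.get("plugins", [])
    return bool(plugins) and plugins[0].get("type") == "loopback"
```

On a worker node this could be used as `loopback_is_first(json.loads(default_cni_config("/etc/cni/net.d").read_text()))`, after checking that a config file was found at all.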
The worker nodes have the configuration file pushed to them from the ICP/k8s management node when the kubelet starts up. The configuration being pushed is below:
shell$ cat /etc/cni/net.d/10-calico.conflist
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "calico",
      "etcd_endpoints": "https://9.1.2.3:4001",
      "etcd_key_file": "/etc/cni/net.d/calico-tls/etcd-key",
      "etcd_cert_file": "/etc/cni/net.d/calico-tls/etcd-cert",
      "etcd_ca_cert_file": "/etc/cni/net.d/calico-tls/etcd-ca",
      "mtu": 1430,
      "log_level": "info",
      "ipam": {
        "type": "calico-ipam"
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    }
  ]
}
Adding the loopback plugin to the top of the plugins list in 10-calico.conflist works for the next restart of sycri/kubelet, allowing lo to be mounted in new pods on that host:
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "loopback"
    },
    {
      "type": "calico",
      ...
But that file is then overwritten when the kubelet starts up. I assume that since sycri has already read the original version then that's why it is working correctly.
So that does work, see below.
shell$ kubectl exec test-sycri-v66gl ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1430
        inet 10.9.106.240 netmask 255.255.255.255 broadcast 0.0.0.0
        inet6 fe80::1055:e5ff:fe48:ed5a prefixlen 64 scopeid 0x20<link>
        ether 12:55:e5:48:ed:5a txqueuelen 0 (Ethernet)
        RX packets 8 bytes 656 (656.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 8 bytes 656 (656.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
However, if I need to restart sycri/kubelet then I need to make sure to edit that file again. The workaround is to create a copy of 10-calico.conflist in /etc/cni/net.d, called 00-my-calico.conflist, that adds the loopback plugin. That seems to make the fix stick; however, any updates pushed from the ICP/k8s master would be lost.
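The copy could be generated rather than edited by hand. The following is a rough sketch under stated assumptions: the function names `with_loopback_first` and `write_override` are hypothetical, and it assumes that the 00- prefix is what makes the copy the first config in the directory.

```python
import copy
import json
from pathlib import Path


def with_loopback_first(conflist):
    """Return a copy of conflist whose first plugin is loopback.

    An existing loopback entry is moved to the front; otherwise a bare
    {"type": "loopback"} entry is added. The input is not modified.
    """
    patched = copy.deepcopy(conflist)
    plugins = patched.get("plugins", [])
    loopbacks = [p for p in plugins if p.get("type") == "loopback"]
    others = [p for p in plugins if p.get("type") != "loopback"]
    patched["plugins"] = (loopbacks[:1] or [{"type": "loopback"}]) + others
    return patched


def write_override(src, dst):
    """Write a loopback-first copy of src to dst.

    Returns True if dst was created or changed, False if it was
    already up to date. src is left untouched.
    """
    patched = with_loopback_first(json.loads(Path(src).read_text()))
    text = json.dumps(patched, indent=2) + "\n"
    dst = Path(dst)
    if dst.exists() and dst.read_text() == text:
        return False
    dst.write_text(text)
    return True
```

Calling `write_override("/etc/cni/net.d/10-calico.conflist", "/etc/cni/net.d/00-my-calico.conflist")` after each pushed update would regenerate the copy from the current pushed file, so later changes from the master are not silently dropped. Presumably sycri would still need a restart to pick up the new copy.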
The Docker k8s worker has the same original 10-calico.conflist file. I'm wondering why I need to explicitly specify the loopback plugin when using sycri and not with Docker. I don't know enough about the CNI protocol to say for sure if this is a sycri or ICP issue. I wonder if it is an implicit requirement (maybe just for Calico) that the lo device be mounted into every pod. The Calico conf examples I'm turning up (looking here) also don't have the loopback device listed.
@jjhursey Good question about making loopback the default plugin and always running it. I will discuss it with @cclerget when he is back, but it looks like we don't want to be too smart here and would prefer to follow the CNI configuration as it is. If I understand correctly, you can include loopback in the Calico conflist (see the installation guide). I'm not sure where the dynamic conflist update comes from. Are you using Calico controllers?
The calico-kube-controllers deployment is running, and the loopback plugin is installed in /opt/cni/bin/. I can search around to see if there is a way to update the master copy of the file that is being pushed out.
It is this sentence in the Calico config documentation that makes me think that it should be a default, but it is unclear:
In addition to the CNI plugin specified by the CNI config file, Kubernetes requires the standard CNI loopback plugin.
Let me know what you find out.
I haven't tried any other CRI mechanisms to see how they react to this config file (e.g., CRI-O, containerd), just the docker(shim) CRI. Having the additional loopback plugin listed does not seem to have a negative impact on the Docker worker node.
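To compare pods on the two worker nodes without reading the output by eye, the interface names can be pulled out of `ifconfig` output such as the `kubectl exec test-sycri-v66gl ifconfig` listing earlier in this thread. This is only a sketch: it assumes the net-tools output format shown there, where each interface stanza starts with `<name>: flags=`, and the function names are made up for illustration.

```python
import re

# Assumption: each interface stanza begins a line with "<name>: flags=",
# as in the ifconfig output quoted in this thread.
_IFACE_RE = re.compile(r"^([^\s:]+): flags=", re.MULTILINE)


def interface_names(ifconfig_output):
    """Return the interface names found in ifconfig output, in order."""
    return _IFACE_RE.findall(ifconfig_output)


def has_loopback(ifconfig_output):
    """Return True if an interface named "lo" is listed."""
    return "lo" in interface_names(ifconfig_output)
```

Feeding it the text returned by `kubectl exec PODNAME ifconfig` for a pod on each worker would show whether `lo` is present on both.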
What are the steps to reproduce this issue?
1. ifconfig installed.
2. kubectl create -f ./tiny.yaml (see tiny.yaml)
3. kubectl exec PODNAME ifconfig to view the interfaces inside the container.
What happens?
I have two worker nodes: 1 running Docker and 1 running Singularity. I have noticed that the Singularity worker starts containers without the lo loopback network device.
With Docker I will see two interfaces:
With the Singularity CRI I only see eth0:
What were you expecting to happen?
I expect to see both the eth0 and lo network devices inside the running container.
Any logs, error output, comments, etc?
I was monitoring the systemd journal (journalctl -f) while sycri was creating the image and did not notice any additional error messages of note.
This is an IBM Cloud Private 3.1.2 environment running Kubernetes v1.12.4 with Calico.
Environment?
OS distribution and version: RHEL 7.6 (ppc64le)
go version: 1.11.5
go env:
Singularity-CRI version: v1.0.0-beta.5
Singularity version: 3.2.1-1.el7
Kubernetes version: v1.12.4+icp-ee