Closed — Smana closed this issue 8 years ago
Hi @Smana - I believe you're hitting this issue: https://github.com/projectcalico/calico-kubernetes/issues/79
This log tipped me off:
Nov 12 12:01:58 kbnode1 kubelet[17740]: E1112 12:01:58.715512 17740 manager.go:377] NetworkPlugin calico failed on the status hook for pod 'kubedns-louck' - invalid character '-' after top-level value
This should be fixed as of calico-kubernetes version v0.4.0 - you can upgrade the plugin with this command:
calicoctl node --ip=<IP> --kubernetes --kube-plugin-version=v0.5.0
Hi Casey and thank you for your help.
I've tested with the --kube-plugin-version option, and it actually fixed the IP assignment issue, but now networking doesn't work on any of my pods.
Let's take the pod named "busybox-2":
| kbnode1 | docker | c7bdae2f537ad10260f47fda623f42900172d98dfec941536a2e5ad55734795d | 465ccb5e898b11e586a65254008dc07e | 10.233.64.4/32 | 9a:67:e9:c6:76:b5 | default_busybox-2_c7bdae2f537a | active |
kubectl exec busybox-2 -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
107: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 9a:67:e9:c6:76:b5 brd ff:ff:ff:ff:ff:ff
inet 10.233.64.4/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::9867:e9ff:fec6:76b5/64 scope link
valid_lft forever preferred_lft forever
kubectl describe po busybox-2 | grep ^IP
IP: 10.233.64.4
As you can see, the IP address is properly defined and known by Kubernetes. But when I check the routing table on the given node, I can't see its route:
default via 10.115.255.253 dev eth0
10.0.0.0/8 via 10.115.255.254 dev eth0
10.115.0.0/16 dev eth0 proto kernel scope link src 10.115.99.11
blackhole 10.233.64.0/26 proto bird
10.233.64.64/26 via 10.115.99.12 dev eth0 proto bird
172.16.42.0/24 dev cbr0 proto kernel scope link src 172.16.42.2
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.42.1
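That missing route would normally be a per-pod /32 pointing at the pod's cali* veth. As a minimal sketch (hypothetical helper names; it just mirrors the /32 scoped-link routes Calico installs per workload), checking an `ip route` dump for it could look like:

```python
def has_pod_route(route_dump, pod_ip):
    """Return True if an `ip route` dump contains a host route for pod_ip
    pointing at a Calico veth (a hypothetical check, mirroring the per-pod
    /32 scoped-link routes Calico installs on the host)."""
    for line in route_dump.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in (pod_ip, pod_ip + "/32") and "cali" in line:
            return True
    return False

node_routes = """\
default via 10.115.255.253 dev eth0
10.115.0.0/16 dev eth0 proto kernel scope link src 10.115.99.11
blackhole 10.233.64.0/26 proto bird
10.233.64.64/26 via 10.115.99.12 dev eth0 proto bird
"""
print(has_pod_route(node_routes, "10.233.64.4"))  # False: the route is missing
```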
I think it is related to the bridge configuration, but I haven't figured it out:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 10000
link/ether 52:54:00:8d:c0:7e brd ff:ff:ff:ff:ff:ff
inet 10.115.99.11/16 brd 10.115.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe8d:c07e/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:9f:21:69:9d brd ff:ff:ff:ff:ff:ff
inet 172.17.42.1/16 scope global docker0
valid_lft forever preferred_lft forever
4: cbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
inet 172.16.42.2/24 scope global cbr0
valid_lft forever preferred_lft forever
inet6 fe80::8464:dbff:fec6:ac32/64 scope link
valid_lft forever preferred_lft forever
96: cali23fa3ace898: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
...
The docker daemon is started with the cbr0 bridge:
ps aux | grep cbr0
root 19302 4.0 1.5 1464500 31416 ? Ssl 22:17 0:33 /usr/bin/docker daemon -H fd:// --bridge=cbr0 --iptables=false --ip-masq=false
I don't know if it is relevant, but I have the following logs:
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,001 31421 [edf054db8115] INFO Plugin Args: ['/usr/libexec/kubernetes/kubelet-plugins/net/exec/calico/calico', 'setup', 'default', 'busybox', 'edf054db8115610147bbbf7ca7e2d6c21ffe1fb8c8f047332491f3c623403a2a']
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,002 31421 [edf054db8115] INFO Using LOG_LEVEL=INFO
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,002 31421 [edf054db8115] INFO Using ETCD_AUTHORITY=10.115.99.10:4001
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,002 31421 [edf054db8115] INFO Using KUBE_API_ROOT=http://10.115.99.10:8080/api/v1/
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,002 31421 [edf054db8115] INFO Using CALICO_IPAM=true
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,003 31421 [edf054db8115] INFO Using DEFAULT_POLICY=allow
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,003 31421 [edf054db8115] INFO Executing Calico pod-creation hook
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,009 31421 [edf054db8115] INFO Configuring docker container edf054db8115610147bbbf7ca7e2d6c21ffe1fb8c8f047332491f3c623403a2a
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,024 31421 [edf054db8115] INFO Deleting docker interface eth0
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,028 31421 [edf054db8115] INFO Container edf054db8115610147bbbf7ca7e2d6c21ffe1fb8c8f047332491f3c623403a2a running with PID 31413
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,028 31421 [edf054db8115] INFO Linking to container's netns
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,128 31421 [edf054db8115] INFO Configuring Calico network interface
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,135 31421 [edf054db8115] INFO Using Calico IPAM
Nov 12 22:40:35 kbnode1 kubelet[18146]: Traceback (most recent call last):
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 850, in emit
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 723, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 467, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: KeyError: 'identity'
Nov 12 22:40:35 kbnode1 kubelet[18146]: Logged from file pycalico.ipam, line 403
Nov 12 22:40:35 kbnode1 kubelet[18146]: Traceback (most recent call last):
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 850, in emit
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 723, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 467, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: KeyError: 'identity'
Nov 12 22:40:35 kbnode1 kubelet[18146]: Logged from file pycalico.ipam, line 403
Nov 12 22:40:35 kbnode1 kubelet[18146]: Traceback (most recent call last):
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 850, in emit
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 723, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 467, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: KeyError: 'identity'
Nov 12 22:40:35 kbnode1 kubelet[18146]: Logged from file pycalico.ipam, line 407
Nov 12 22:40:35 kbnode1 kubelet[18146]: Traceback (most recent call last):
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 850, in emit
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 723, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 467, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: KeyError: 'identity'
Nov 12 22:40:35 kbnode1 kubelet[18146]: Logged from file pycalico.ipam, line 407
Nov 12 22:40:35 kbnode1 kubelet[18146]: Traceback (most recent call last):
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 850, in emit
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 723, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 467, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: KeyError: 'identity'
Nov 12 22:40:35 kbnode1 kubelet[18146]: Logged from file pycalico.ipam, line 411
Nov 12 22:40:35 kbnode1 kubelet[18146]: Traceback (most recent call last):
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 850, in emit
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 723, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: File "/code/build/calico_kubernetes/out00-PYZ.pyz/logging", line 467, in format
Nov 12 22:40:35 kbnode1 kubelet[18146]: KeyError: 'identity'
Nov 12 22:40:35 kbnode1 kubelet[18146]: Logged from file pycalico.ipam, line 411
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,207 31421 [edf054db8115] INFO Creating endpoint with IPs [IPAddress('10.233.64.4')]
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,211 31421 [edf054db8115] INFO Creating the veth with namespace pid 31413 on interface name eth0
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,630 31421 [edf054db8115] INFO Setting mac address 46:de:cd:6a:51:8d to endpoint cali637bbce2898
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,642 31421 [edf054db8115] INFO Using IP Address 10.115.99.11
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,642 31421 [edf054db8115] INFO Adding IP 10.115.99.11 to interface cali637bbce2898
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,647 31421 [edf054db8115] INFO Finished configuring network interface
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,648 31421 [edf054db8115] INFO Getting API Resource: pods from KUBE_API_ROOT: http://10.115.99.10:8080/api/v1/
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,676 31421 [edf054db8115] INFO Configuring Pod Profile: default_busybox_edf054db8115
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,682 31421 [edf054db8115] WARNING No annotations found in pod {u'status': {u'phase': u'Pending', u'startTime': u'2015-11-12T22:40:33Z'}, u'spec': {u'dnsPolicy': u'ClusterFirst', u'serviceAccountName': u'default', u'serviceAccount': u'default', u'terminationGracePeriodSeconds': 30, u'restartPolicy': u'Always', u'volumes': [{u'secret': {u'secretName': u'default-token-sei98'}, u'name': u'default-token-sei98'}], u'containers': [{u'terminationMessagePath': u'/dev/termination-log', u'name': u'busybox', u'image': u'busybox', u'volumeMounts': [{u'readOnly': True, u'mountPath': u'/var/run/secrets/kubernetes.io/serviceaccount', u'name': u'default-token-sei98'}], u'command': [u'sleep', u'3600'], u'imagePullPolicy': u'IfNotPresent', u'resources': {}}], u'nodeName': u'10.115.99.11'}, u'metadata': {u'name': u'busybox', u'namespace': u'default', u'resourceVersion': u'51798', u'creationTimestamp': u'2015-11-12T22:40:33Z', u'selfLink': u'/api/v1/namespaces/default/pods/busybox', u'uid': u'629f036e-898e-11e5-bd72-525400a0dafe'}}
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,730 31421 [edf054db8115] INFO Applying tags
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,737 31421 [edf054db8115] INFO Adding tag namespace_default
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,737 31421 [edf054db8115] WARNING No labels found in pod {u'status': {u'phase': u'Pending', u'startTime': u'2015-11-12T22:40:33Z'}, u'spec': {u'dnsPolicy': u'ClusterFirst', u'serviceAccountName': u'default', u'serviceAccount': u'default', u'terminationGracePeriodSeconds': 30, u'restartPolicy': u'Always', u'volumes': [{u'secret': {u'secretName': u'default-token-sei98'}, u'name': u'default-token-sei98'}], u'containers': [{u'terminationMessagePath': u'/dev/termination-log', u'name': u'busybox', u'image': u'busybox', u'volumeMounts': [{u'readOnly': True, u'mountPath': u'/var/run/secrets/kubernetes.io/serviceaccount', u'name': u'default-token-sei98'}], u'command': [u'sleep', u'3600'], u'imagePullPolicy': u'IfNotPresent', u'resources': {}}], u'nodeName': u'10.115.99.11'}, u'metadata': {u'name': u'busybox', u'namespace': u'default', u'resourceVersion': u'51798', u'creationTimestamp': u'2015-11-12T22:40:33Z', u'selfLink': u'/api/v1/namespaces/default/pods/busybox', u'uid': u'629f036e-898e-11e5-bd72-525400a0dafe'}}
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,738 31421 [edf054db8115] WARNING No labels found in pod {u'status': {u'phase': u'Pending', u'startTime': u'2015-11-12T22:40:33Z'}, u'spec': {u'dnsPolicy': u'ClusterFirst', u'serviceAccountName': u'default', u'serviceAccount': u'default', u'terminationGracePeriodSeconds': 30, u'restartPolicy': u'Always', u'volumes': [{u'secret': {u'secretName': u'default-token-sei98'}, u'name': u'default-token-sei98'}], u'containers': [{u'terminationMessagePath': u'/dev/termination-log', u'name': u'busybox', u'image': u'busybox', u'volumeMounts': [{u'readOnly': True, u'mountPath': u'/var/run/secrets/kubernetes.io/serviceaccount', u'name': u'default-token-sei98'}], u'command': [u'sleep', u'3600'], u'imagePullPolicy': u'IfNotPresent', u'resources': {}}], u'nodeName': u'10.115.99.11'}, u'metadata': {u'name': u'busybox', u'namespace': u'default', u'resourceVersion': u'51798', u'creationTimestamp': u'2015-11-12T22:40:33Z', u'selfLink': u'/api/v1/namespaces/default/pods/busybox', u'uid': u'629f036e-898e-11e5-bd72-525400a0dafe'}}
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,757 31421 [edf054db8115] INFO Finished applying tags.
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,758 31421 [edf054db8115] INFO Setting profile default_busybox_edf054db8115 on endpoint 637bbce2898e11e58f005254008dc07e
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,782 31421 [edf054db8115] INFO Finished configuring profile.
Nov 12 22:40:35 kbnode1 kubelet[18146]: 2015-11-12 22:40:35,783 31421 [edf054db8115] INFO Calico network plugin execution complete
Nov 12 22:40:35 kbnode1 kubelet[18146]: , <nil>
Nov 12 22:40:35 kbnode1 kubelet[18146]: I1112 22:40:35.844708 18146 hairpin.go:97] Enabling hairpin on interface cali637bbce2898
Nov 12 22:40:35 kbnode1 kubelet[18146]: W1112 22:40:35.844871 18146 manager.go:1875] Hairpin setup failed for pod "busybox_default": open /sys/devices/virtual/net/cali637bbce2898/brport/hairpin_mode: no such file or directory
Nov 12 22:40:36 kbnode1 kubelet[18146]: I1112 22:40:36.213766 18146 exec.go:149] Status 'exec' network plugin output: {"ip": "10.233.64.4", "kind": "PodNetworkStatus", "apiVersion": "v1beta1"}
Nov 12 22:40:36 kbnode1 kubelet[18146]: , <nil>
Your help is welcome again :) Thanks
@Smana - Interesting.
I wouldn't expect the bridge config to come into play here, since Calico isn't actually using the cbr0 or docker0 bridges, and the IP subnets assigned to them don't overlap with the Calico subnet.
It does look like you are missing the local route for that container - I'd expect a route that looks something like this -
10.233.64.4/32 dev cali465ccb5e
That route should be installed by the Calico agent, felix - could you check the felix logs found in /var/log/calico/felix/ and paste the output here? I'm interested to see if there are any errors in that log.
It might also be useful to see the Kubernetes plugin logs - these can be found in the /var/log/calico/kubernetes/ directory.
I have the following error in /var/log/calico/felix:
2015-11-12 22:17:03,776 [ERROR][5037/19] calico.felix.actor 320: _finish_msg_batch failed.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/calico/felix/actor.py", line 307, in _step
self._finish_msg_batch(batch, results)
File "/usr/lib/python2.7/site-packages/calico/felix/endpoint.py", line 329, in _finish_msg_batch
self._configure_interface()
File "/usr/lib/python2.7/site-packages/calico/felix/endpoint.py", line 526, in _configure_interface
if not devices.interface_exists(self._iface_name):
File "/usr/lib/python2.7/site-packages/calico/felix/devices.py", line 91, in interface_exists
futils.check_call(["ip", "link", "list", interface])
File "/usr/lib/python2.7/site-packages/calico/felix/futils.py", line 340, in check_call
args, retcode, stdout, stderr, input=input_str)
FailedSystemCall: Failed system call (retcode : 1, args : ('ip', 'link', 'list', u'calieab3d8f4893'))
stdout :
stderr : ip: can't find device 'calieab3d8f4893'
input : None
TrackedAsyncResult ('<disabled>', '<disabled>', 'LocalEndpoint(eab3d8f4893711e5b49e5254008dc07e(IPv4))', 'on_endpoint_update') was leaked with exception FailedSystemCall('ip', 'link', 'list', u'calieab3d8f4893'). Dying.
There are a huge number of errors of this kind in the logs.
Nothing relevant in /var/log/calico/kubernetes:
2015-11-13 08:35:17,026 13294 [68fb31373f44] INFO Begin Calico network plugin execution
2015-11-13 08:35:17,026 13294 [68fb31373f44] INFO Plugin Args: ['/usr/libexec/kubernetes/kubelet-plugins/net/exec/calico/calico', 'setup', 'default', 'busybox-2', '68fb31373f445d894dc0dc05097950403660421b5622781b729a9714fb246b75']
2015-11-13 08:35:17,026 13294 [68fb31373f44] INFO Using LOG_LEVEL=INFO
2015-11-13 08:35:17,027 13294 [68fb31373f44] INFO Using ETCD_AUTHORITY=10.115.99.10:4001
2015-11-13 08:35:17,027 13294 [68fb31373f44] INFO Using KUBE_API_ROOT=http://10.115.99.10:8080/api/v1/
2015-11-13 08:35:17,027 13294 [68fb31373f44] INFO Using CALICO_IPAM=true
2015-11-13 08:35:17,027 13294 [68fb31373f44] INFO Using DEFAULT_POLICY=allow
2015-11-13 08:35:17,028 13294 [68fb31373f44] INFO Executing Calico pod-creation hook
2015-11-13 08:35:17,032 13294 [68fb31373f44] INFO Configuring docker container 68fb31373f445d894dc0dc05097950403660421b5622781b729a9714fb246b75
2015-11-13 08:35:17,048 13294 [68fb31373f44] INFO Deleting docker interface eth0
2015-11-13 08:35:17,053 13294 [68fb31373f44] INFO Container 68fb31373f445d894dc0dc05097950403660421b5622781b729a9714fb246b75 running with PID 13232
2015-11-13 08:35:17,054 13294 [68fb31373f44] INFO Linking to container's netns
2015-11-13 08:35:17,177 13294 [68fb31373f44] INFO Configuring Calico network interface
2015-11-13 08:35:17,185 13294 [68fb31373f44] INFO Using Calico IPAM
2015-11-13 08:35:17,367 13294 [68fb31373f44] INFO Creating endpoint with IPs [IPAddress('10.233.64.6')]
2015-11-13 08:35:17,372 13294 [68fb31373f44] INFO Creating the veth with namespace pid 13232 on interface name eth0
2015-11-13 08:35:17,773 13294 [68fb31373f44] INFO Setting mac address 76:01:32:81:ee:7f to endpoint cali77b557ce89e
2015-11-13 08:35:17,783 13294 [68fb31373f44] INFO Using IP Address 10.115.99.11
2015-11-13 08:35:17,783 13294 [68fb31373f44] INFO Adding IP 10.115.99.11 to interface cali77b557ce89e
2015-11-13 08:35:17,787 13294 [68fb31373f44] INFO Finished configuring network interface
2015-11-13 08:35:17,788 13294 [68fb31373f44] INFO Getting API Resource: pods from KUBE_API_ROOT: http://10.115.99.10:8080/api/v1/
2015-11-13 08:35:17,818 13294 [68fb31373f44] INFO Configuring Pod Profile: default_busybox-2_68fb31373f44
2015-11-13 08:35:17,825 13294 [68fb31373f44] WARNING No annotations found in pod {u'status': {u'phase': u'Pending', u'startTime': u'2015-11-13T08:35:15Z'}, u'spec': {u'dnsPolicy': u'ClusterFirst', u'serviceAccountName': u'default', u'serviceAccount': u'default', u'terminationGracePeriodSeconds': 30, u'restartPolicy': u'Always', u'volumes': [{u'secret': {u'secretName': u'default-token-sei98'}, u'name': u'default-token-sei98'}], u'containers': [{u'terminationMessagePath': u'/dev/termination-log', u'name': u'busybox', u'image': u'busybox', u'volumeMounts': [{u'readOnly': True, u'mountPath': u'/var/run/secrets/kubernetes.io/serviceaccount', u'name': u'default-token-sei98'}], u'command': [u'sleep', u'3600'], u'imagePullPolicy': u'IfNotPresent', u'resources': {}}], u'nodeName': u'10.115.99.11'}, u'metadata': {u'name': u'busybox-2', u'namespace': u'default', u'resourceVersion': u'63476', u'creationTimestamp': u'2015-11-13T08:35:15Z', u'selfLink': u'/api/v1/namespaces/default/pods/busybox-2', u'uid': u'769ebc93-89e1-11e5-bd72-525400a0dafe'}}
2015-11-13 08:35:17,913 13294 [68fb31373f44] INFO Applying tags
2015-11-13 08:35:17,920 13294 [68fb31373f44] INFO Adding tag namespace_default
2015-11-13 08:35:17,920 13294 [68fb31373f44] WARNING No labels found in pod {u'status': {u'phase': u'Pending', u'startTime': u'2015-11-13T08:35:15Z'}, u'spec': {u'dnsPolicy': u'ClusterFirst', u'serviceAccountName': u'default', u'serviceAccount': u'default', u'terminationGracePeriodSeconds': 30, u'restartPolicy': u'Always', u'volumes': [{u'secret': {u'secretName': u'default-token-sei98'}, u'name': u'default-token-sei98'}], u'containers': [{u'terminationMessagePath': u'/dev/termination-log', u'name': u'busybox', u'image': u'busybox', u'volumeMounts': [{u'readOnly': True, u'mountPath': u'/var/run/secrets/kubernetes.io/serviceaccount', u'name': u'default-token-sei98'}], u'command': [u'sleep', u'3600'], u'imagePullPolicy': u'IfNotPresent', u'resources': {}}], u'nodeName': u'10.115.99.11'}, u'metadata': {u'name': u'busybox-2', u'namespace': u'default', u'resourceVersion': u'63476', u'creationTimestamp': u'2015-11-13T08:35:15Z', u'selfLink': u'/api/v1/namespaces/default/pods/busybox-2', u'uid': u'769ebc93-89e1-11e5-bd72-525400a0dafe'}}
2015-11-13 08:35:17,921 13294 [68fb31373f44] WARNING No labels found in pod {u'status': {u'phase': u'Pending', u'startTime': u'2015-11-13T08:35:15Z'}, u'spec': {u'dnsPolicy': u'ClusterFirst', u'serviceAccountName': u'default', u'serviceAccount': u'default', u'terminationGracePeriodSeconds': 30, u'restartPolicy': u'Always', u'volumes': [{u'secret': {u'secretName': u'default-token-sei98'}, u'name': u'default-token-sei98'}], u'containers': [{u'terminationMessagePath': u'/dev/termination-log', u'name': u'busybox', u'image': u'busybox', u'volumeMounts': [{u'readOnly': True, u'mountPath': u'/var/run/secrets/kubernetes.io/serviceaccount', u'name': u'default-token-sei98'}], u'command': [u'sleep', u'3600'], u'imagePullPolicy': u'IfNotPresent', u'resources': {}}], u'nodeName': u'10.115.99.11'}, u'metadata': {u'name': u'busybox-2', u'namespace': u'default', u'resourceVersion': u'63476', u'creationTimestamp': u'2015-11-13T08:35:15Z', u'selfLink': u'/api/v1/namespaces/default/pods/busybox-2', u'uid': u'769ebc93-89e1-11e5-bd72-525400a0dafe'}}
2015-11-13 08:35:17,955 13294 [68fb31373f44] INFO Finished applying tags.
2015-11-13 08:35:17,956 13294 [68fb31373f44] INFO Setting profile default_busybox-2_68fb31373f44 on endpoint 77b557ce89e111e5a8e15254008dc07e
2015-11-13 08:35:17,981 13294 [68fb31373f44] INFO Finished configuring profile.
2015-11-13 08:35:17,982 13294 [68fb31373f44] INFO Calico network plugin execution complete
In the meantime, I've installed a new cluster from scratch and it works better. My pods can ping each other, but I can't reach the nodes' subnet.
For instance, my DNS container needs to reach the master IP in order to forward DNS requests to my master:
kubectl logs kubedns-tfcje -c skydns | tail -n 2
2015/11/13 10:52:57 skydns: failure to forward request "read udp 10.115.99.1:53: i/o timeout"
kubectl exec busybox -- ping -c 1 10.115.99.1
PING 10.115.99.1 (10.115.99.1): 56 data bytes
--- 10.115.99.1 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
Is there something missing which would allow the pods to reach the nodes' IPs, please?
Anyway, it would be interesting to solve the issue regarding the above felix errors.
@Smana - I think the skydns log is benign (unless you have a DNS service running at 10.115.99.1).
Let's see if we can figure out why the ping isn't working. Are you running on bare-metal? Or are you in a cloud deployment (GCE, AWS, etc.)?
If you're in a cloud deployment, make sure that you've enabled ipip as discussed here: https://github.com/projectcalico/calico-docker/blob/master/docs/FAQ.md#can-i-run-calico-in-a-public-cloud-environment
You'll likely want to perform this step as well: https://github.com/projectcalico/calico-docker/blob/master/docs/kubernetes/AWSIntegration.md#node-connectivity-workaround
If you're running on bare-metal, or you've performed the above steps and connectivity is still not working, it would be helpful to see the full IP routing table on both your minion and your master. Could you paste that here?
Actually the DNS service is provided by a dnsmasq daemon located on the master server. My cluster is running on bare-metal servers, and here are the complete route tables.
Master
default via 10.115.255.253 dev eth0
10.0.0.0/8 via 10.115.255.254 dev eth0
10.115.0.0/16 dev eth0 proto kernel scope link src 10.115.99.1
172.16.42.0/24 dev cbr0 proto kernel scope link src 172.16.42.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.42.1
192.168.0.0/17 via 10.115.255.254 dev eth0
Node
default via 10.115.255.253 dev eth0
10.0.0.0/8 via 10.115.255.254 dev eth0
10.115.0.0/16 dev eth0 proto kernel scope link src 10.115.99.2
10.233.64.0 dev caliad614bcc89e scope link
blackhole 10.233.64.0/26 proto bird
10.233.64.1 dev cali57af05cc89e scope link
10.233.64.64/26 via 10.115.99.3 dev eth0 proto bird
172.16.42.0/24 dev cbr0 proto kernel scope link src 172.16.42.2
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.42.1
192.168.0.0/17 via 10.115.255.254 dev eth0
Pod
default via 10.115.99.2 dev eth0
10.115.99.2 dev eth0
As I said, pods can ping each other:
kbmaster ~ 0 # kubectl exec busybox -- ping -c 1 10.233.64.64
PING 10.233.64.64 (10.233.64.64): 56 data bytes
64 bytes from 10.233.64.64: seq=0 ttl=62 time=0.689 ms
--- 10.233.64.64 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.689/0.689/0.689 ms
I really don't understand why I can't reach the 10.115.0.0/16 network from a pod. Thank you again for your help.
It looks to me like the master doesn't have any routes to the pod IP pool (10.233.0.0/16), so your traffic may be making it to the master, but cannot route back (unless the gateway 10.115.255.253 is pod-aware).
Usually I'd expect these routes to be learned over BGP - how have you configured your BGP topology? Are you running the calico/node container on your master?
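A quick way to see the asymmetry is to check each host's `ip route` dump for any route into the pod pool. A small sketch (hypothetical helper, assuming the pool is 10.233.0.0/16 as above; blackhole routes are skipped since they drop traffic rather than forward it):

```python
import ipaddress

def has_pod_pool_route(route_dump, pool="10.233.0.0/16"):
    """Return True if any route destination in an `ip route` dump falls
    inside the pod IP pool - i.e. the host knows how to reach pods."""
    pool_net = ipaddress.ip_network(pool)
    for line in route_dump.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("default", "blackhole"):
            continue
        dest = fields[0]
        try:
            net = ipaddress.ip_network(dest if "/" in dest else dest + "/32")
        except ValueError:
            continue
        if net.subnet_of(pool_net):
            return True
    return False

master_routes = """\
default via 10.115.255.253 dev eth0
10.0.0.0/8 via 10.115.255.254 dev eth0
10.115.0.0/16 dev eth0 proto kernel scope link src 10.115.99.1
172.16.42.0/24 dev cbr0 proto kernel scope link src 172.16.42.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.42.1
192.168.0.0/17 via 10.115.255.254 dev eth0
"""
print(has_pod_pool_route(master_routes))  # False: no route back to the pods
```

The node's table, by contrast, has `10.233.64.64/26 via 10.115.99.3 ... proto bird`, so the same check returns True there.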
My mistake, that was obvious. I removed the calico node on the master, therefore there's no route back. Thank you Casey, and sorry :p
Well, now everything works as expected when I deploy a cluster from scratch. Unfortunately, I'm still facing my reboot issue. When I restart the server, the pods are started but networking is not working properly.
After the reboot the pods have an IP address:
docker exec 3b2f767566a5 ip a sh dev eth0
19: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether aa:1f:8b:bd:98:39 brd ff:ff:ff:ff:ff:ff
inet 10.233.64.72/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a81f:8bff:febd:9839/64 scope link
valid_lft forever preferred_lft forever
docker exec 3b2f767566a5 ip r
default via 10.115.99.11 dev eth0
10.115.99.11 dev eth0
But on the host, the route to the veth interface is not created. Below is the route table of node 1:
default via 10.115.255.253 dev eth0
10.0.0.0/8 via 10.115.255.254 dev eth0
10.115.0.0/16 dev eth0 proto kernel scope link src 10.115.99.11
10.233.64.0/26 via 10.115.99.12 dev eth0 proto bird
blackhole 10.233.64.64/26 proto bird
172.16.42.0/24 dev cbr0 proto kernel scope link src 172.16.42.2
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.42.1
192.168.0.0/17 via 10.115.255.254 dev eth0
The felix logs show the following:
...
TrackedAsyncResult ('<disabled>', '<disabled>', 'LocalEndpoint(f29baba48d0c11e592645254006f763d(IPv6))', 'on_endpoint_update') was leaked with exception FailedSystemCall('ip', 'link', 'list', u'calif29baba48d0'). Dying.
2015-11-17 09:44:30,462 [WARNING][1801/20] calico.felix.devices 279: Cannot check flags for interface calif29baba48d0 (/sys/class/net/calif29baba48d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,463 [WARNING][1801/21] calico.felix.devices 279: Cannot check flags for interface calif2b400468d0 (/sys/class/net/calif2b400468d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,464 [WARNING][1801/22] calico.felix.devices 279: Cannot check flags for interface cali0f641e548d0 (/sys/class/net/cali0f641e548d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,465 [WARNING][1801/23] calico.felix.devices 279: Cannot check flags for interface cali93cfb4a88d0 (/sys/class/net/cali93cfb4a88d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,466 [WARNING][1801/24] calico.felix.devices 279: Cannot check flags for interface cali6e7479fc8d0 (/sys/class/net/cali6e7479fc8d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,467 [WARNING][1801/25] calico.felix.devices 279: Cannot check flags for interface cali6e63feb08d0 (/sys/class/net/cali6e63feb08d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,468 [WARNING][1801/26] calico.felix.devices 279: Cannot check flags for interface cali0f46d6f08d0 (/sys/class/net/cali0f46d6f08d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,473 [WARNING][1801/30] calico.felix.devices 279: Cannot check flags for interface calif29baba48d0 (/sys/class/net/calif29baba48d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,474 [WARNING][1801/31] calico.felix.devices 279: Cannot check flags for interface calif2b400468d0 (/sys/class/net/calif2b400468d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,475 [WARNING][1801/32] calico.felix.devices 279: Cannot check flags for interface cali0f641e548d0 (/sys/class/net/cali0f641e548d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,476 [WARNING][1801/33] calico.felix.devices 279: Cannot check flags for interface cali93cfb4a88d0 (/sys/class/net/cali93cfb4a88d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,477 [WARNING][1801/34] calico.felix.devices 279: Cannot check flags for interface cali6e7479fc8d0 (/sys/class/net/cali6e7479fc8d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,478 [WARNING][1801/35] calico.felix.devices 279: Cannot check flags for interface cali6e63feb08d0 (/sys/class/net/cali6e63feb08d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,479 [WARNING][1801/36] calico.felix.devices 279: Cannot check flags for interface cali0f46d6f08d0 (/sys/class/net/cali0f46d6f08d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:30,581 [ERROR][1801/30] calico.felix.actor 320: _finish_msg_batch failed.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/calico/felix/actor.py", line 307, in _step
self._finish_msg_batch(batch, results)
File "/usr/lib/python2.7/site-packages/calico/felix/endpoint.py", line 329, in _finish_msg_batch
self._configure_interface()
File "/usr/lib/python2.7/site-packages/calico/felix/endpoint.py", line 526, in _configure_interface
if not devices.interface_exists(self._iface_name):
File "/usr/lib/python2.7/site-packages/calico/felix/devices.py", line 91, in interface_exists
futils.check_call(["ip", "link", "list", interface])
File "/usr/lib/python2.7/site-packages/calico/felix/futils.py", line 340, in check_call
args, retcode, stdout, stderr, input=input_str)
FailedSystemCall: Failed system call (retcode : 1, args : ('ip', 'link', 'list', u'calif29baba48d0'))
stdout :
stderr : ip: can't find device 'calif29baba48d0'
input : None
TrackedAsyncResult ('<disabled>', '<disabled>', 'LocalEndpoint(f29baba48d0c11e592645254006f763d(IPv6))', 'on_endpoint_update') was leaked with exception FailedSystemCall('ip', 'link', 'list', u'calif29baba48d0'). Dying.
2015-11-17 09:44:32,129 [WARNING][1846/20] calico.felix.devices 279: Cannot check flags for interface calif29baba48d0 (/sys/class/net/calif29baba48d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,130 [WARNING][1846/21] calico.felix.devices 279: Cannot check flags for interface calif2b400468d0 (/sys/class/net/calif2b400468d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,132 [WARNING][1846/22] calico.felix.devices 279: Cannot check flags for interface cali0f641e548d0 (/sys/class/net/cali0f641e548d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,134 [WARNING][1846/23] calico.felix.devices 279: Cannot check flags for interface cali93cfb4a88d0 (/sys/class/net/cali93cfb4a88d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,135 [WARNING][1846/24] calico.felix.devices 279: Cannot check flags for interface cali6e7479fc8d0 (/sys/class/net/cali6e7479fc8d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,137 [WARNING][1846/25] calico.felix.devices 279: Cannot check flags for interface cali6e63feb08d0 (/sys/class/net/cali6e63feb08d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,138 [WARNING][1846/26] calico.felix.devices 279: Cannot check flags for interface cali0f46d6f08d0 (/sys/class/net/cali0f46d6f08d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,143 [WARNING][1846/30] calico.felix.devices 279: Cannot check flags for interface calif29baba48d0 (/sys/class/net/calif29baba48d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,148 [WARNING][1846/31] calico.felix.devices 279: Cannot check flags for interface calif2b400468d0 (/sys/class/net/calif2b400468d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,149 [WARNING][1846/32] calico.felix.devices 279: Cannot check flags for interface cali0f641e548d0 (/sys/class/net/cali0f641e548d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,151 [WARNING][1846/33] calico.felix.devices 279: Cannot check flags for interface cali93cfb4a88d0 (/sys/class/net/cali93cfb4a88d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,152 [WARNING][1846/34] calico.felix.devices 279: Cannot check flags for interface cali6e7479fc8d0 (/sys/class/net/cali6e7479fc8d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,154 [WARNING][1846/35] calico.felix.devices 279: Cannot check flags for interface cali6e63feb08d0 (/sys/class/net/cali6e63feb08d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,155 [WARNING][1846/36] calico.felix.devices 279: Cannot check flags for interface cali0f46d6f08d0 (/sys/class/net/cali0f46d6f08d0/flags) - assume down: IOError(2, 'No such file or directory').
2015-11-17 09:44:32,318 [ERROR][1846/20] calico.felix.actor 320: _finish_msg_batch failed.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/calico/felix/actor.py", line 307, in _step
self._finish_msg_batch(batch, results)
File "/usr/lib/python2.7/site-packages/calico/felix/endpoint.py", line 329, in _finish_msg_batch
self._configure_interface()
File "/usr/lib/python2.7/site-packages/calico/felix/endpoint.py", line 526, in _configure_interface
if not devices.interface_exists(self._iface_name):
File "/usr/lib/python2.7/site-packages/calico/felix/devices.py", line 91, in interface_exists
futils.check_call(["ip", "link", "list", interface])
File "/usr/lib/python2.7/site-packages/calico/felix/futils.py", line 340, in check_call
args, retcode, stdout, stderr, input=input_str)
FailedSystemCall: Failed system call (retcode : 1, args : ('ip', 'link', 'list', u'calif29baba48d0'))
stdout :
stderr : ip: can't find device 'calif29baba48d0'
input : None
TrackedAsyncResult ('<disabled>', '<disabled>', 'LocalEndpoint(f29baba48d0c11e592645254006f763d(IPv4))', 'on_endpoint_update') was leaked with exception FailedSystemCall('ip', 'link', 'list', u'calif29baba48d0'). Dying.
...
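For reference, the failing call in the traceback above can be reproduced in isolation: `ip link list <iface>` exits non-zero when the device is gone, which is what surfaces as Felix's `FailedSystemCall`. A minimal sketch (the helper name is mine, not Felix's):

```python
import subprocess

def link_exists(iface):
    """Mimic the check Felix performs: run `ip link list <iface>`.

    A non-zero exit (device missing) is exactly what raised the
    FailedSystemCall in the traceback above.
    """
    try:
        res = subprocess.run(
            ["ip", "link", "list", iface],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # `ip` (iproute2) not installed on this host
        return False
    return res.returncode == 0
```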
And then I can't create pods on this node anymore, because the cali interface is not created even for newly created pods :(
I've generated a diag file if needed : https://transfer.sh/UvGBx/diags-171115-095757.tar.gz
@Smana - Thanks for the diags, I'm going to dig into this today.
I made sure to test this case when v1.1 was released and saw it work, so this might be a regression, or it might just be a subtle difference in the way the clusters are configured.
Hopefully we can get this sorted out once and for all :)
From a log read, it appears that we're at least hitting this issue (though it may not be the cause of the reboot problem): https://github.com/projectcalico/libcalico/issues/47
An upgrade to the latest libcalico should do it. I've done that in #101 so it will go into the next release.
@Smana - Alright, here is what I think.
We're hitting this exception in Felix:
FailedSystemCall: Failed system call (retcode : 1, args : ('ip', 'link', 'list', u'calif29baba48d0'))
stdout :
stderr : ip: can't find device 'calif29baba48d0'
This is expected, since we currently leak endpoints on reboot due to this issue: https://github.com/kubernetes/kubernetes/issues/14940
I'd expect Felix to be resilient to a missing interface, but instead it looks like it is crashing and restarting, and thus preventing the other (valid) endpoints from being programmed. It looks like Felix tries to handle this, but doesn't - I've raised this issue against the Felix codebase: https://github.com/projectcalico/calico/issues/899.
I think the reason this worked when I tried it is that I was only trying on a small number of pods, which meant that it was a lot more likely Felix would process all the good endpoints first (rather than the stale ones). Once I tried with a larger number of pods, it stopped working.
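One defensive pattern here (an illustrative sketch only, not the actual fix tracked in the Felix issue above; the function names are placeholders) is to probe sysfs for the interface and treat a missing device as a skippable stale endpoint rather than a fatal error:

```python
import os

SYSFS_NET = "/sys/class/net"

def interface_exists(iface):
    # A missing veth just means the endpoint is stale (e.g. leaked
    # across a reboot), so report False instead of shelling out to
    # `ip link list` and raising on a non-zero exit.
    return os.path.exists(os.path.join(SYSFS_NET, iface))

def configure_interface(iface):
    if not interface_exists(iface):
        # Skip the stale endpoint; the rest of the batch can still be
        # programmed instead of the whole actor dying.
        return False
    # ... real per-interface configuration would go here ...
    return True
```

With this shape, one leaked endpoint no longer prevents the valid endpoints in the same batch from being programmed.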
Thanks for raising this and being so responsive. We'll try to fix this up for the next release.
I've upgraded to calico-docker version 0.11.0, released yesterday, which includes libcalico v0.5.0. But I still have the same issue (no IP routes after reboot).
In the Felix logs, I just see the following traceback:
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/site-packages/calico/etcddriver/__main__.py", line 31, in <module>
from calico.etcddriver import driver
File "/usr/lib/python2.7/site-packages/calico/etcddriver/driver.py", line 50, in <module>
from ijson.backends import yajl2 as ijson
File "/usr/lib/python2.7/site-packages/ijson/backends/yajl2.py", line 14, in <module>
yajl = backends.find_yajl(2)
File "/usr/lib/python2.7/site-packages/ijson/backends/__init__.py", line 19, in find_yajl
raise YAJLImportError('YAJL shared object not found.')
ijson.backends.YAJLImportError: YAJL shared object not found.
I hope that helps.
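The `YAJLImportError` above comes from the etcd driver importing ijson's `yajl2` backend unconditionally, so the process dies if the yajl shared library isn't present in the image. A generic fallback pattern (illustrative only, not calico's actual fix) is to try a list of backends and take the first one that imports:

```python
import importlib

def first_importable(candidates):
    """Return the first module in `candidates` that imports cleanly."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %s could be imported" % (candidates,))

# ijson ships several backends; preferring yajl2 but degrading to the
# slower pure-Python backend avoids a hard dependency on libyajl, e.g.:
# backend = first_importable(
#     ["ijson.backends.yajl2", "ijson.backends.python"])
```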
@Smana - Just to keep you up to date, I think we have a fix for the felix issue, but it needs to propagate into a calico-docker release before you can try it out. I will keep you posted.
We're also working on fixing the upstream Kubernetes issue that ultimately causes this situation.
@caseydavenport - Thank you for the update, I'll look forward to the latest calico-docker release.
Hi guys! Do you have an idea of when it will be released, please?
@Smana - Sorry for the delay, I've been out for a week and just got back.
Here is the PR that fixes this issue - it hasn't made its way into a Felix release yet, hopefully that will be soon. https://github.com/projectcalico/calico/pull/902
Hey @Smana - Not sure if you saw the calico-docker v0.13.0 release go by.
This includes the Felix changes needed to get node reboots working for you. I tried it out this morning and all seemed to work. Let me know when you get a chance to test it!
https://github.com/projectcalico/calico-docker/releases/tag/v0.13.0
Hello @caseydavenport, thanks for the update. I did some tests this morning and yes, it fixed the issue :)
Just one question: when the node is restarted, a given pod doesn't keep its IP (another one is assigned). Maybe that's normal?
@Smana - Yes, that is normal. Upon reboot Kubernetes is actually creating a whole new set of pods, so new IP addresses are assigned to them.
Thanks for your help, I've closed the issue. The changes are applied to my playbooks: https://github.com/ansibl8s/setup-kubernetes See you soon.
Hello,
Following issue #34, I'm trying to upgrade my cluster. As suggested by Casey, I've configured my node as follows:
Is there a configuration parameter I've missed?
kubelet is running with the following command:
Thank you