kube-dns-279829092-snx1m 0/3 rpc error: code = 2 desc = failed to start container "d36f9c532ae6344a10edc9432ef501467f7579eccf696c404b522ec64215dd14": Error response from daemon: {"message":"cannot join network of a non running container: 5339c3b368457231e740e865c94f327c8c7f14d0cbdd87279c8a3ed51633921b"} 0 1m
kube-dns seems to not work every time...
I'm gonna reflash the SD cards, try again and see if 0.7.0 flannel makes any improvements
So copying the admin.conf was missing; fixed in the above commit too.
Using 0.7.0 of flannel doesn't make any difference, at least. But it's generally good not to just use arbitrary commits from master.
Doesn't seem to matter if you do the kubectl create for flannel or the kubeadm join next.
But it seems things still actually error out :(
root@rpi-node01:/home/pirate# kubectl get po --all-namespaces
NAMESPACE     NAME                                 READY   STATUS             RESTARTS   AGE
kube-system   etcd-rpi-node01                      1/1     Running            0          9m
kube-system   kube-apiserver-rpi-node01            1/1     Running            0          9m
kube-system   kube-controller-manager-rpi-node01   1/1     Running            0          9m
kube-system   kube-dns-279829092-z8566             0/3     rpc error: code = 2 desc = failed to start container "2cf37daa7f2101381ba374d5dd4eb82fece4065909c10c76ed51e1593b8cc118": Error response from daemon: {"message":"cannot join network of a non running container: 305874017ebcb6209a4b381bba937756f415236c0e5dee60172a56f812e52e46"}   6   17m
kube-system   kube-flannel-ds-3l49f                1/2     CrashLoopBackOff   7          14m
kube-system   kube-flannel-ds-grql3                1/2     CrashLoopBackOff   7          14m
kube-system   kube-flannel-ds-qb6h3                1/2     CrashLoopBackOff   7          14m
kube-system   kube-flannel-ds-qhns1                1/2     CrashLoopBackOff   7          14m
kube-system   kube-proxy-9sscg                     1/1     Running            0          14m
kube-system   kube-proxy-f7fks                     1/1     Running            0          14m
kube-system   kube-proxy-g04f5                     1/1     Running            0          14m
kube-system   kube-proxy-gllzw                     1/1     Running            0          17m
kube-system   kube-proxy-mnpdp                     1/1     Running            0          14m
kube-system   kube-scheduler-rpi-node01            1/1     Running            0          9m
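(To dig into those CrashLoopBackOffs, the pod logs and events are the first stop; a sketch using the pod names above, with the container name assumed from the flannel DaemonSet:)
kubectl -n kube-system logs kube-flannel-ds-3l49f -c kube-flannel
kubectl -n kube-system describe pod kube-dns-279829092-z8566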
Plus node01 isn't ready...
root@rpi-node01:/home/pirate# kubectl get nodes
NAME         STATUS     AGE   VERSION
rpi-node01   NotReady   50m   v1.6.1
rpi-node02   Ready      46m   v1.6.1
rpi-node03   Ready      46m   v1.6.1
rpi-node04   Ready      46m   v1.6.1
rpi-node05   Ready      46m   v1.6.1
@reedy Thank you for your investigation; I appreciate that you filed this issue and your PRs! Unfortunately, the two PRs don't seem to fix the issue.
Please note that I won't be able to look further into this problem or test it in the near future. Sorry that I can't provide support from my side! Though I am happy to include your PRs that improve the situation.
While writing the blog posts, I encountered race conditions myself. It's a shame they are still not fixed. Maybe @luxas can help out with this issue.
@MathiasRenner this isn't related to that (the race conditions were fixed in flannel v0.7.0)
@reedy You should follow the new commands for kubeadm v1.6:
sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf
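(A quick check that the kubeconfig is picked up — just the standard calls used elsewhere in this thread:)
kubectl cluster-info
kubectl get nodes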
Also, flannel needs RBAC rules; please apply them as well. When we wrote the article, we wrote it for kubeadm v1.5, and kubeadm v1.6 changed a few things.
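(A sketch of that apply step, using the manifest URL that comes up later in this thread:)
curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml | kubectl create -f -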
Maybe we should point that out retroactively, @MathiasRenner.
@luxas sure, on the new v1.6 commands: hence #53, which highlights them for other people so they know they need to run them. I noticed this guide didn't include them, even though the 'installer' was printing them out to be run.
Which has been merged now. Thanks for both merges @MathiasRenner
@luxas Where are these RBAC rules? Do you mean https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel-rbac.yml? Is this as well as, or instead of, kube-flannel.yml? I note the -rbac version is newer than v0.7.0, so it should be in a newer tagged release whenever that actually happens.
I did find https://github.com/kubernetes/kubeadm/issues/143, which suggests other projects are fixing this up where necessary...
Thanks!
Ran into the same problem.
Master
System Info:
Machine ID: 9989a26f06984d6dbadc01770f018e3b
System UUID: 9989a26f06984d6dbadc01770f018e3b
Boot ID: 987ae214-1656-4330-9c76-8ca5adef76cb
Kernel Version: 4.4.50-hypriotos-v7+
OS Image: Raspbian GNU/Linux 8 (jessie)
Operating System: linux
Architecture: arm
Container Runtime Version: docker://Unknown
Kubelet Version: v1.6.4
Kube-Proxy Version: v1.6.4
PodCIDR: 10.244.0.0/24
Node
System Info:
Machine ID: 9989a26f06984d6dbadc01770f018e3b
System UUID: 9989a26f06984d6dbadc01770f018e3b
Boot ID: 744745ea-c896-4b42-94be-2994fbaa5410
Kernel Version: 4.4.50-hypriotos-v7+
OS Image: Raspbian GNU/Linux 8 (jessie)
Operating System: linux
Architecture: arm
Container Runtime Version: docker://Unknown
Kubelet Version: v1.6.4
Kube-Proxy Version: v1.6.4
PodCIDR: 10.244.1.0/24
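(For reference, these details come straight from node inspection — a sketch; trimming with grep is optional:)
kubectl describe nodes | grep -A 10 'System Info:'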
Flannel worked for me:
curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml | kubectl create -f -
curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -
The | sed "s/amd64/arm/g" is for ARM on Raspberry Pi.
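(After applying both, a quick way to confirm the flannel DaemonSet pods come up and the nodes go Ready:)
kubectl -n kube-system get pods -o wide
kubectl get nodes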
Alternatively, Weave:
kubectl apply -f https://git.io/weave-kube-1.6
RBAC being enabled by default was the change introduced with 1.6 that caused the problem.
So we just need to add the RBAC one first in the tutorial?
Yes, RBAC is enabled by default in version 1.6. http://blog.kubernetes.io/2017/04/rbac-support-in-kubernetes.html
So this has to be added:
curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml | kubectl create -f -
Providers of network plugins have been asked to bundle RBAC rules with their existing configuration (https://github.com/kubernetes/kubeadm/issues/143). However, flannel does not yet include RBAC, hence the additional RBAC configuration. They may provide a bundled version soon.
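(For the curious, that separate manifest boils down to a ClusterRole plus a binding for the flannel ServiceAccount; a rough sketch, not the exact upstream file:)
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/status"]
    verbs: ["patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
  - kind: ServiceAccount
    name: flannel
    namespace: kube-system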
I have tested it with the latest master, like above, and it worked fine. In contrast to the illustration in the blog post, the state of a node now only changes to Ready after the network plugin has been added successfully, which starts the pod on each node (https://github.com/kubernetes/kubernetes/issues/43815). The blog states version 0.7.0 for flannel; I did not explicitly test that version, but it should work.
If using Kubernetes 1.6+ (with RBAC), I found I also had to apply the Traefik RBAC manifest listed at https://github.com/containous/traefik/tree/master/examples/k8s/traefik-rbac.yaml (described at https://docs.traefik.io/user-guide/kubernetes/).
I also had to tweak that once more by adding one line to include permission to get secrets.
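(Roughly what that tweak looks like — a sketch of the ClusterRole from traefik-rbac.yaml, not the exact file; the secrets entry is the added line:)
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets   # the added permission
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch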
I then had to update https://raw.githubusercontent.com/hypriot/rpi-traefik/master/traefik-k8s-example.yaml with lines to create the ServiceAccount traefik-ingress-controller:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: traefik-ingress-controller
  namespace: kube-system
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: traefik-ingress-controller
  namespace: kube-system
  labels:
    k8s-app: traefik-ingress-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: traefik-ingress-controller
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress-controller
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 60
      hostNetwork: true
      nodeSelector:
        nginx-controller: "traefik"
      containers:
      - image: hypriot/rpi-traefik
        name: traefik-ingress-controller
        resources:
          limits:
            cpu: 200m
...etc
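(Then re-apply the updated manifest; the file name here is assumed:)
kubectl apply -f traefik-k8s-example.yaml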
So I'm going through https://blog.hypriot.com/post/setup-kubernetes-raspberry-pi-cluster/
After running
kubeadm init --pod-network-cidr 10.244.0.0/16
and then kubeadm join --token on the nodes... I go back to the master and run kubectl get nodes, and I get... Scrolling back up, I notice...
Did some steps get missed out of your tutorial? Or did they get out of order?
If I run the copies...
But all my nodes are NotReady now... Following the guide a bit further...
I guess doing some more commands forced it to do stuff?
Nodes stay ready... But the kube-flannel pods error all over the place and seemingly give up?
I also noticed
Using master is probably a bit temperamental...
I suppose, with https://blog.hypriot.com/post/setup-kubernetes-raspberry-pi-cluster/ dated Jan 11 and https://github.com/coreos/flannel/releases/tag/v0.7.0 dated Jan 10, I guess you were using that, rather than the various commits that are on master now (https://github.com/coreos/flannel/compare/v0.7.0...master).
So, I think there's a versioning issue with flannel... I'll do a PR to swap it from master, and see about where to put the kubectl get nodes copy fixes.
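(i.e. something like pinning the manifest to the tag instead of master; the exact path is assumed to exist in the v0.7.0 tree:)
curl -sSL https://rawgit.com/coreos/flannel/v0.7.0/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -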