"Setup Kubernetes on a Raspberry Pi Cluster easily the official way!" -- missing steps? #51

Closed: reedy closed this issue 7 years ago

reedy commented 7 years ago

So I'm going through https://blog.hypriot.com/post/setup-kubernetes-raspberry-pi-cluster/

After running kubeadm init --pod-network-cidr 10.244.0.0/16 and then kubeadm join --token on the nodes, I go back to the master, run kubectl get nodes, and I get:

root@rpi-node01:/home/pirate# kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
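For what it's worth: with no kubeconfig configured, kubectl falls back to localhost:8080, and as far as I can tell kubeadm v1.6 starts the API server with the insecure local port disabled, hence the refusal. A quick sanity check that nothing is configured yet:

  kubectl config view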

Scrolling back up, I notice

To start using your cluster, you need to run (as a regular user):

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  http://kubernetes.io/docs/admin/addons/
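As an aside: export KUBECONFIG only lasts for the current shell session. Assuming kubectl's default config path, copying the file to $HOME/.kube/config instead should make it persist across logins:

  mkdir -p $HOME/.kube
  sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config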

Did some steps get missed from your tutorial? Or did they get out of order?

If I run the copies...

root@rpi-node01:/home/pirate#   sudo cp /etc/kubernetes/admin.conf $HOME/
root@rpi-node01:/home/pirate#   sudo chown $(id -u):$(id -g) $HOME/admin.conf
root@rpi-node01:/home/pirate#   export KUBECONFIG=$HOME/admin.conf
root@rpi-node01:/home/pirate# kubectl get nodes
NAME         STATUS     AGE       VERSION
rpi-node01   NotReady   19m       v1.6.1
rpi-node02   NotReady   18m       v1.6.1
rpi-node03   NotReady   18m       v1.6.1
rpi-node04   NotReady   18m       v1.6.1
rpi-node05   NotReady   18m       v1.6.1

But all my nodes are NotReady now... Following the guide a bit further:

root@rpi-node01:/home/pirate# curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -
serviceaccount "flannel" created
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created
root@rpi-node01:/home/pirate# kubectl get nodes
NAME         STATUS     AGE       VERSION
rpi-node01   NotReady   21m       v1.6.1
rpi-node02   NotReady   20m       v1.6.1
rpi-node03   NotReady   20m       v1.6.1
rpi-node04   NotReady   20m       v1.6.1
rpi-node05   NotReady   20m       v1.6.1
root@rpi-node01:/home/pirate# kubectl get po --all-namespaces
NAMESPACE     NAME                                 READY     STATUS              RESTARTS   AGE
kube-system   etcd-rpi-node01                      1/1       Running             0          20m
kube-system   kube-apiserver-rpi-node01            1/1       Running             0          20m
kube-system   kube-controller-manager-rpi-node01   1/1       Running             0          20m
kube-system   kube-dns-279829092-l9sn0             0/3       Pending             0          21m
kube-system   kube-flannel-ds-fl1r8                0/2       ContainerCreating   0          28s
kube-system   kube-flannel-ds-j2859                0/2       ContainerCreating   0          28s
kube-system   kube-flannel-ds-k05w2                0/2       ContainerCreating   0          28s
kube-system   kube-flannel-ds-mc7xx                0/2       ContainerCreating   0          28s
kube-system   kube-flannel-ds-rztch                0/2       ContainerCreating   0          28s
kube-system   kube-proxy-0qszs                     1/1       Running             0          20m
kube-system   kube-proxy-24tv2                     1/1       Running             0          20m
kube-system   kube-proxy-dmkqs                     1/1       Running             0          21m
kube-system   kube-proxy-nqj28                     1/1       Running             0          20m
kube-system   kube-proxy-wxcb0                     1/1       Running             0          20m
kube-system   kube-scheduler-rpi-node01            1/1       Running             0          21m
root@rpi-node01:/home/pirate# kubectl get po --all-namespaces
NAMESPACE     NAME                                 READY     STATUS              RESTARTS   AGE
kube-system   etcd-rpi-node01                      1/1       Running             0          20m
kube-system   kube-apiserver-rpi-node01            1/1       Running             0          20m
kube-system   kube-controller-manager-rpi-node01   1/1       Running             0          20m
kube-system   kube-dns-279829092-l9sn0             0/3       Pending             0          21m
kube-system   kube-flannel-ds-fl1r8                2/2       Running             1          43s
kube-system   kube-flannel-ds-j2859                0/2       ContainerCreating   0          43s
kube-system   kube-flannel-ds-k05w2                2/2       Running             0          43s
kube-system   kube-flannel-ds-mc7xx                0/2       ContainerCreating   0          43s
kube-system   kube-flannel-ds-rztch                1/2       CrashLoopBackOff    1          43s
kube-system   kube-proxy-0qszs                     1/1       Running             0          20m
kube-system   kube-proxy-24tv2                     1/1       Running             0          21m
kube-system   kube-proxy-dmkqs                     1/1       Running             0          21m
kube-system   kube-proxy-nqj28                     1/1       Running             0          20m
kube-system   kube-proxy-wxcb0                     1/1       Running             0          21m
kube-system   kube-scheduler-rpi-node01            1/1       Running             0          21m
root@rpi-node01:/home/pirate# kubectl get po --all-namespaces
NAMESPACE     NAME                                 READY     STATUS             RESTARTS   AGE
kube-system   etcd-rpi-node01                      1/1       Running            0          20m
kube-system   kube-apiserver-rpi-node01            1/1       Running            0          21m
kube-system   kube-controller-manager-rpi-node01   1/1       Running            0          20m
kube-system   kube-dns-279829092-l9sn0             0/3       Pending            0          21m
kube-system   kube-flannel-ds-fl1r8                1/2       CrashLoopBackOff   1          53s
kube-system   kube-flannel-ds-j2859                1/2       CrashLoopBackOff   1          53s
kube-system   kube-flannel-ds-k05w2                1/2       CrashLoopBackOff   1          53s
kube-system   kube-flannel-ds-mc7xx                1/2       CrashLoopBackOff   1          53s
kube-system   kube-flannel-ds-rztch                1/2       CrashLoopBackOff   1          53s
kube-system   kube-proxy-0qszs                     1/1       Running            0          20m
kube-system   kube-proxy-24tv2                     1/1       Running            0          21m
kube-system   kube-proxy-dmkqs                     1/1       Running            0          21m
kube-system   kube-proxy-nqj28                     1/1       Running            0          20m
kube-system   kube-proxy-wxcb0                     1/1       Running            0          21m
kube-system   kube-scheduler-rpi-node01            1/1       Running            0          21m
root@rpi-node01:/home/pirate# kubectl get nodes
NAME         STATUS    AGE       VERSION
rpi-node01   Ready     21m       v1.6.1
rpi-node02   Ready     21m       v1.6.1
rpi-node03   Ready     21m       v1.6.1
rpi-node04   Ready     21m       v1.6.1
rpi-node05   Ready     20m       v1.6.1
root@rpi-node01:/home/pirate# 

I guess running some more commands forced it to make progress?

The nodes stay Ready... But the kube-flannel pods error all over the place and seemingly give up:

root@rpi-node01:/home/pirate# kubectl get po --all-namespaces
NAMESPACE     NAME                                 READY     STATUS                                                                                                                                                                                                                                                                                RESTARTS   AGE
kube-system   etcd-rpi-node01                      1/1       Running                                                                                                                                                                                                                                                                               0          26m
kube-system   kube-apiserver-rpi-node01            1/1       Running                                                                                                                                                                                                                                                                               0          27m
kube-system   kube-controller-manager-rpi-node01   1/1       Running                                                                                                                                                                                                                                                                               0          26m
kube-system   kube-dns-279829092-l9sn0             0/3       rpc error: code = 2 desc = failed to start container "74f5e4fa84e81b2d58247fb84fae563e21b0979045fb1b6cd935ba8088088c04": Error response from daemon: {"message":"cannot join network of a non running container: 2fc0830fbbc0ba2a455cc7dd7d7d72374bcebfa726940919e8716eed2c13337a"}   0          27m
kube-system   kube-flannel-ds-fl1r8                1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      6          6m
kube-system   kube-flannel-ds-j2859                1/2       Error                                                                                                                                                                                                                                                                                 6          6m
kube-system   kube-flannel-ds-k05w2                1/2       Error                                                                                                                                                                                                                                                                                 6          6m
kube-system   kube-flannel-ds-mc7xx                1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      6          6m
kube-system   kube-flannel-ds-rztch                1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      6          6m
kube-system   kube-proxy-0qszs                     1/1       Running                                                                                                                                                                                                                                                                               0          26m
kube-system   kube-proxy-24tv2                     1/1       Running                                                                                                                                                                                                                                                                               0          27m
kube-system   kube-proxy-dmkqs                     1/1       Running                                                                                                                                                                                                                                                                               0          27m
kube-system   kube-proxy-nqj28                     1/1       Running                                                                                                                                                                                                                                                                               0          26m
kube-system   kube-proxy-wxcb0                     1/1       Running                                                                                                                                                                                                                                                                               0          27m
kube-system   kube-scheduler-rpi-node01            1/1       Running                                                                                                                                                                                                                                                                               0          27m
root@rpi-node01:/home/pirate# 
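To dig into why flannel keeps crashing, the container logs should help. Something like the following (pod name taken from the listing above; kube-flannel should be the container name in the DaemonSet, per the upstream manifest):

  kubectl -n kube-system logs kube-flannel-ds-fl1r8 -c kube-flannel
  kubectl -n kube-system describe pod kube-flannel-ds-fl1r8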

I also noticed

curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -

Using master is probably a bit temperamental... pinning to a tagged release seems safer:

curl -sSL https://rawgit.com/coreos/flannel/v0.7.0/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -

I suppose, since https://blog.hypriot.com/post/setup-kubernetes-raspberry-pi-cluster/ is dated Jan 11 and https://github.com/coreos/flannel/releases/tag/v0.7.0 was released Jan 10, you were probably using that tag rather than the various commits that are on master now (https://github.com/coreos/flannel/compare/v0.7.0...master).


So, I think there's a versioning issue with flannel... I'll do a PR to swap it off master, and see where to put the admin.conf copy fixes so that kubectl get nodes works.

reedy commented 7 years ago
kube-dns-279829092-snx1m             0/3       rpc error: code = 2 desc = failed to start container "d36f9c532ae6344a10edc9432ef501467f7579eccf696c404b522ec64215dd14": Error response from daemon: {"message":"cannot join network of a non running container: 5339c3b368457231e740e865c94f327c8c7f14d0cbdd87279c8a3ed51633921b"}   0          1m

kube-dns doesn't seem to work every time...
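Describing the pod should surface the underlying events; my understanding is that kube-dns can't come up until the pod network is actually functional, since it needs a pod IP from the CNI plugin:

  kubectl -n kube-system describe pod kube-dns-279829092-snx1m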

I'm gonna reflash the SD cards, try again, and see if flannel 0.7.0 makes any improvements.

reedy commented 7 years ago

So the admin.conf copying step was missing; fixed in the above commit too.

Using 0.7.0 of flannel doesn't make any difference, at least. But it's generally good not to just use arbitrary commits from master.

It doesn't seem to matter whether you run the kubectl create for flannel or the kubeadm join first.

reedy commented 7 years ago

But it seems things still actually error out :(

root@rpi-node01:/home/pirate# kubectl get po --all-namespaces
NAMESPACE     NAME                                 READY     STATUS                                                                                                                                                                                                                                                                                RESTARTS   AGE
kube-system   etcd-rpi-node01                      1/1       Running                                                                                                                                                                                                                                                                               0          9m
kube-system   kube-apiserver-rpi-node01            1/1       Running                                                                                                                                                                                                                                                                               0          9m
kube-system   kube-controller-manager-rpi-node01   1/1       Running                                                                                                                                                                                                                                                                               0          9m
kube-system   kube-dns-279829092-z8566             0/3       rpc error: code = 2 desc = failed to start container "2cf37daa7f2101381ba374d5dd4eb82fece4065909c10c76ed51e1593b8cc118": Error response from daemon: {"message":"cannot join network of a non running container: 305874017ebcb6209a4b381bba937756f415236c0e5dee60172a56f812e52e46"}   6          17m
kube-system   kube-flannel-ds-3l49f                1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      7          14m
kube-system   kube-flannel-ds-grql3                1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      7          14m
kube-system   kube-flannel-ds-qb6h3                1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      7          14m
kube-system   kube-flannel-ds-qhns1                1/2       CrashLoopBackOff                                                                                                                                                                                                                                                                      7          14m
kube-system   kube-proxy-9sscg                     1/1       Running                                                                                                                                                                                                                                                                               0          14m
kube-system   kube-proxy-f7fks                     1/1       Running                                                                                                                                                                                                                                                                               0          14m
kube-system   kube-proxy-g04f5                     1/1       Running                                                                                                                                                                                                                                                                               0          14m
kube-system   kube-proxy-gllzw                     1/1       Running                                                                                                                                                                                                                                                                               0          17m
kube-system   kube-proxy-mnpdp                     1/1       Running                                                                                                                                                                                                                                                                               0          14m
kube-system   kube-scheduler-rpi-node01            1/1       Running                                                                                                                                                                                                                                                                               0          9m
reedy commented 7 years ago

Plus node01 isn't ready...

root@rpi-node01:/home/pirate# kubectl get nodes
NAME         STATUS     AGE       VERSION
rpi-node01   NotReady   50m       v1.6.1
rpi-node02   Ready      46m       v1.6.1
rpi-node03   Ready      46m       v1.6.1
rpi-node04   Ready      46m       v1.6.1
rpi-node05   Ready      46m       v1.6.1
MathiasRenner commented 7 years ago

@reedy Thank you for your investigation; I appreciate that you filed this issue and your PRs! Unfortunately, the two PRs don't seem to fix the issue.

Please note that I won't be able to look further into this problem or test it in the near future. Sorry that I can't provide support from my side! I am happy, though, to include your PRs that improve the situation.

Also, while writing the blog posts I encountered race conditions myself. Sad that they are still not fixed. Maybe @luxas can help out with this issue.

luxas commented 7 years ago

@MathiasRenner this isn't related to that (the race conditions were fixed in flannel v0.7.0).

@reedy You should follow the new commands for kubeadm v1.6:

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf

Also, flannel needs RBAC rules; please apply them as well. When we wrote the article, we wrote it for kubeadm v1.5, and kubeadm v1.6 has some new requirements.

We should maybe point that out retroactively, @MathiasRenner.

reedy commented 7 years ago

@luxas sure, regarding the new v1.6 commands: hence #53, which highlights them for other people who will need to run them, because I noticed this guide didn't include them even though the 'installer' was printing them out to be run.

That has been merged now. Thanks for both merges, @MathiasRenner!

@luxas Where are these RBAC rules? Do you mean https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel-rbac.yml? Is that in addition to, or instead of, kube-flannel.yml? I note the -rbac version is newer than v0.7.0, so it should be in a newer tagged release whenever that actually happens.

I did find https://github.com/kubernetes/kubeadm/issues/143, which suggests other projects are fixing things up where necessary...

Thanks!

cgebe commented 7 years ago

Ran into the same problem.

Master

System Info:
 Machine ID:            9989a26f06984d6dbadc01770f018e3b
 System UUID:           9989a26f06984d6dbadc01770f018e3b
 Boot ID:           987ae214-1656-4330-9c76-8ca5adef76cb
 Kernel Version:        4.4.50-hypriotos-v7+
 OS Image:          Raspbian GNU/Linux 8 (jessie)
 Operating System:      linux
 Architecture:          arm
 Container Runtime Version: docker://Unknown
 Kubelet Version:       v1.6.4
 Kube-Proxy Version:        v1.6.4
PodCIDR:            10.244.0.0/24

Node

System Info:
 Machine ID:            9989a26f06984d6dbadc01770f018e3b
 System UUID:           9989a26f06984d6dbadc01770f018e3b
 Boot ID:           744745ea-c896-4b42-94be-2994fbaa5410
 Kernel Version:        4.4.50-hypriotos-v7+
 OS Image:          Raspbian GNU/Linux 8 (jessie)
 Operating System:      linux
 Architecture:          arm
 Container Runtime Version: docker://Unknown
 Kubelet Version:       v1.6.4
 Kube-Proxy Version:        v1.6.4
PodCIDR:            10.244.1.0/24

Flannel worked for me:

curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml |  kubectl create -f -
curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -

The | sed "s/amd64/arm/g" is needed for ARM on the Raspberry Pi.

Alternatively, Weave:

kubectl apply -f https://git.io/weave-kube-1.6

RBAC, enabled by default since 1.6, was the problem.

reedy commented 7 years ago

So we just need to add the RBAC one first to the tutorial?

cgebe commented 7 years ago

Yes, RBAC is enabled by default in version 1.6. http://blog.kubernetes.io/2017/04/rbac-support-in-kubernetes.html

So this has to be added:

curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml |  kubectl create -f -

Providers of network plugins are being asked to bundle RBAC rules with their existing configuration (https://github.com/kubernetes/kubeadm/issues/143). However, flannel does not include RBAC yet, hence the additional RBAC configuration. They may provide a bundled version soon.
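For reference, that manifest is essentially just a ClusterRole plus a ClusterRoleBinding for the flannel service account. Roughly (quoted from memory, so treat it as a sketch and prefer the upstream file):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources: ["nodes/status"]
  verbs: ["patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel        # the service account created by kube-flannel.yml
  namespace: kube-system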

I have tested it with the latest master, as above, and it worked fine. In contrast to the illustration in the blog post, the state of a node now only changes to Ready after the network plugin has been successfully added, which starts the pod on each node (https://github.com/kubernetes/kubernetes/issues/43815).
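To see that transition live, watching the node list while the flannel pods come up works, e.g.:

  kubectl get nodes -w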

The blog states version 0.7.0 for flannel. I did not explicitly test this version; nevertheless, it should work.

oliw commented 7 years ago

If using Kubernetes 1.6+ (with RBAC), I found I also had to apply the Traefik RBAC manifest listed at https://github.com/containous/traefik/tree/master/examples/k8s/traefik-rbac.yaml (described at https://docs.traefik.io/user-guide/kubernetes/).

I also had to tweak those once more by adding a rule to include permission to get secrets (see the sketch below).
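Roughly, that meant an extra rule in the Traefik ClusterRole along these lines (a sketch; the verbs and layout may differ slightly from the upstream file):

- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]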

I then had to update https://raw.githubusercontent.com/hypriot/rpi-traefik/master/traefik-k8s-example.yaml with lines to create the traefik-ingress-controller ServiceAccount:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: traefik-ingress-controller
  namespace: kube-system
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: traefik-ingress-controller
  namespace: kube-system
  labels:
    k8s-app: traefik-ingress-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: traefik-ingress-controller
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress-controller
      annotations:
        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]
    spec:
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 60
      hostNetwork: true
      nodeSelector:
        nginx-controller: "traefik"
      containers:
      - image: hypriot/rpi-traefik
        name: traefik-ingress-controller
        resources:
          limits:
            cpu: 200m
...etc