Closed by @mikedanese.
From @errordeveloper on October 26, 2016 17:28
cc @luxas
From @luxas on October 26, 2016 17:31
Did you use Hypriot v1.0.1 as stated in docs?
From @nlamirault on October 26, 2016 18:39
@luxas I've got the same issue with Hypriot 1.0.0. Where can I download the 1.0.1 image? I tried to download https://downloads.hypriot.com/hypriotos-rpi-v1.0.1.img.zip, but I got an error. Could you send a link to the documentation that explains how to use kubeadm on Hypriot? I could try a new installation.
From @derailed on October 26, 2016 18:49
Nicolas - try https://github.com/hypriot/image-builder-rpi/releases
From @derailed on October 26, 2016 19:25
Thanks for pointing this out @luxas! This was indeed pilot error. I did get further, so we can close this issue. Though the master is now coming up, setting up the pod network using flannel per the docs results in an error.
ARCH=arm curl -sSL https://raw.githubusercontent.com/luxas/flannel/update-daemonset/Documentation/kube-flannel.yml | sed "s/amd64/${ARCH}/g" | kubectl create -f -
Yields
Error from server: error when creating "flannel.yml": DaemonSet in version "v1beta1" cannot be handled as a DaemonSet: [pos 1115]: json: expect char '"' but got char 'n'
I think the nodeSelector in the flannel config is incorrect; the arch value ends up empty:
nodeSelector:
  beta.kubernetes.io/arch:
From @derailed on October 26, 2016 19:31
Probably a bug in the template; I think the nodeSelector should be:
nodeSelector:
  beta.kubernetes.io/arch: arm
From @luxas on October 26, 2016 19:34
Ok, seems like the ARCH=arm thing is working poorly then.
Try just running
curl -sSL https://raw.githubusercontent.com/luxas/flannel/update-daemonset/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -
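For what it's worth, here is a small sketch of why the `ARCH=arm` prefix form misbehaves (the sample string below is illustrative):

```shell
# A VAR=value prefix exports VAR only into the environment of that one
# command (curl, in the original one-liner). The shell expands ${ARCH} in
# the sed argument *before* running anything, and ARCH is unset there, so
# sed replaces "amd64" with an empty string and the YAML ends up with an
# empty arch value.
unset ARCH
line='nodeSelector: beta.kubernetes.io/arch: amd64'
echo "$line" | sed "s/amd64/${ARCH}/g"    # arch value comes out empty

# Exporting (or plainly assigning) the variable in the current shell first
# makes the substitution work as intended:
export ARCH=arm
echo "$line" | sed "s/amd64/${ARCH}/g"    # ...arch: arm
```

That empty value is also consistent with the parser error above: the DaemonSet JSON ends up with a bare value where a string was expected.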
From @derailed on October 26, 2016 21:26
Thank you all for the prompt support! Finally got this thing up and running. Watching this cluster live on rpi is a beautiful thing. Totally impressed by your work. kubeadm rocks. Tx!!
From @nlamirault on October 27, 2016 11:52
Using v1.4.4, the process runs to the end:
$ sudo kubeadm init --use-kubernetes-version v1.4.4 --api-advertise-addresses=192.168.1.23 --pod-network-cidr=10.244.0.0/16
<master/tokens> generated token: "482d60.042c2504ce81cd32"
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
<master/apiclient> all control plane components are healthy after 38.573899 seconds
<master/apiclient> waiting for at least one node to register and become ready
<master/apiclient> first node is ready after 1.535058 seconds
<master/discovery> created essential addon: kube-discovery, waiting for it to become ready
<master/discovery> kube-discovery is ready after 814.040634 seconds
<master/addons> created essential addon: kube-proxy
<master/addons> created essential addon: kube-dns
Kubernetes master initialised successfully!
You can now join any number of machines by running the following on each node:
kubeadm join --token 482d60.042c2504ce81cd32 192.168.1.23
But kubectl fails:
$ kubectl get pods --namespace=kube-system
client: etcd cluster is unavailable or misconfigured
The etcd container logs: https://gist.github.com/nlamirault/d84e77e02276d158493f15f249324ba5
From @nlamirault on October 27, 2016 12:14
I tried the unstable version of kubeadm with the reset command, then tried another init. The etcd container is up, but I've got these logs:
2016-10-27 12:14:14.104739 E | etcdhttp: got unexpected response error (etcdserver: request timed out)
2016-10-27 12:14:23.319372 E | etcdhttp: got unexpected response error (etcdserver: request timed out)
2016-10-27 12:14:34.064923 E | etcdhttp: got unexpected response error (etcdserver: request timed out)
From @derailed on October 27, 2016 15:50
Thanks Lucas for pointing this out. I totally missed it, my bad...
We can close this issue, as this was clearly pilot error. However, I am running into another problem setting up the pod network.
The flannel DaemonSet config does not seem to be valid. Guessing it's missing spec.selector?? But not sure, as the error given is not super useful.
So
ARCH=arm curl -sSL https://raw.githubusercontent.com/luxas/flannel/update-daemonset/Documentation/kube-flannel.yml | sed "s/amd64/${ARCH}/g" | kubectl create -f -
Yields
Error from server: error when creating "flannel.yml": DaemonSet in version "v1beta1" cannot be handled as a DaemonSet: [pos 1115]: json: expect char '"' but got char 'n'
From @nlamirault on October 31, 2016 17:3
OK. I recreated the cluster using a fresh installation of HypriotOS 1.1.0 and the unstable version of kubeadm.
$ sudo kubeadm init --use-kubernetes-version v1.4.4 --api-advertise-addresses=192.168.1.23 --pod-network-cidr=10.244.0.0/16
$ kubectl create -f https://raw.githubusercontent.com/kodbasen/weave-kube-arm/master/weave-daemonset.yaml
I've got errors like this:
etcd cluster is unavailable or misconfigured
Some information:
$ kubectl get deployments --all-namespaces
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kube-system kube-discovery 1 1 1 1 1h
kube-system kube-dns 1 1 1 0 1h
HypriotOS/armv7: root@jarvis in ~
$ kubectl get services --all-namespaces
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes 10.96.0.1 <none> 443/TCP 1h
kube-system kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 1h
HypriotOS/armv7: root@jarvis in ~
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2501624643-psrv9 1/1 Running 2 1h
kube-system etcd-jarvis 1/1 Running 273 1h
kube-system kube-apiserver-jarvis 1/1 Running 1 1h
kube-system kube-controller-manager-jarvis 1/1 Running 217 1h
kube-system kube-discovery-2202902116-zkjmy 1/1 Running 1 1h
kube-system kube-dns-2334855451-zlgnh 0/3 Completed 25 1h
kube-system kube-proxy-c8rax 1/1 Running 1 1h
kube-system kube-scheduler-jarvis 0/1 Error 225 1h
kube-system weave-net-60pr6 2/2 Running 9 1h
$ kubectl describe pod etcd-jarvis --namespace=kube-system
Error from server: client: etcd cluster is unavailable or misconfigured
$ kubectl describe pod etcd-jarvis --namespace=kube-system
Name: etcd-jarvis
Namespace: kube-system
Node: jarvis/192.168.1.23
Start Time: Mon, 31 Oct 2016 16:49:31 +0000
Labels: component=etcd
tier=control-plane
Status: Running
IP: 192.168.1.23
Controllers: <none>
Containers:
etcd:
Container ID: docker://c31f82fa8da305260c912fd38028eb7699e5fce6ac10137bbcf1fe14a4ae9a40
Image: gcr.io/google_containers/etcd-arm:2.2.5
Image ID: docker://sha256:23e15ba74b830d4e9c1f09ce899864a5dde6636df6058f72b31e9694b8c511a3
Port:
Command:
etcd
--listen-client-urls=http://127.0.0.1:2379
--advertise-client-urls=http://127.0.0.1:2379
--data-dir=/var/etcd/data
Requests:
cpu: 200m
State: Running
Started: Mon, 31 Oct 2016 17:05:15 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 31 Oct 2016 17:03:28 +0000
Finished: Mon, 31 Oct 2016 17:04:30 +0000
Ready: True
Restart Count: 273
Liveness: http-get http://127.0.0.1:2379/health delay=15s timeout=15s period=10s #success=1 #failure=8
Volume Mounts:
/etc/kubernetes/ from pki (ro)
/etc/ssl/certs from certs (rw)
/var/etcd from etcd (rw)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs
etcd:
Type: HostPath (bare host directory volume)
Path: /var/lib/etcd
pki:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes
QoS Class: Burstable
Tolerations: <none>
No events.
From @viroos on November 2, 2016 1:39
I have a similar issue.
After adding a new node (kubeadm join --token ...), etcd is killed: etcd-black-pearl 0/1 Terminating 0 3s
Then it's restarted, but everything goes crazy (all kube-system pods are being killed and recreated).
kubectl get nodes returns only the master.
I tried both weave and flannel with the same issue. I use Hypriot 1.1.0 (but I also tried 1.0.1 and had the same or a similar issue). The last try was on Kubernetes 1.4.5, but I had the same problem on 1.4.4 and 1.4.3.
In my case kubectl works (although I had to wait a little after configuring the CNI network, since for a minute or two I also had the 'etcd cluster is unavailable or misconfigured' error).
This is 100% reproducible (either with a fresh install or using the tear-down procedure described at http://kubernetes.io/docs/getting-started-guides/kubeadm/).
From @viroos on November 2, 2016 22:59
Additional info. In /var/log/syslog:
Nov 2 22:56:29 black-pearl kubelet[1668]: W1102 22:56:29.327317 1668 status_manager.go:450] Failed to update status for pod "_()": Operation cannot be fulfilled on pods "etcd-black-pearl": StorageError: invalid object, Code: 4, Key: /registry/pods/kube-system/etcd-black-pearl, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 0x15c6f130, UID in object meta:
From @nlamirault on November 3, 2016 7:52
I think I've got the same error. etcd, the scheduler, and the controller-manager are killed and restarted:
$ sudo kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2501624643-psrv9 1/1 Running 2 2d
kube-system etcd-jarvis 1/1 Running 298 2d
kube-system kube-apiserver-jarvis 1/1 Running 1 2d
kube-system kube-controller-manager-jarvis 1/1 Running 390 2d
kube-system kube-discovery-2202902116-zkjmy 1/1 Running 1 2d
kube-system kube-dns-2334855451-zlgnh 3/3 Running 85 2d
kube-system kube-proxy-c8rax 1/1 Running 1 2d
kube-system kube-scheduler-jarvis 1/1 Running 409 2d
kube-system kubernetes-dashboard-3628165297-692al 1/1 Running 0 2d
kube-system weave-net-60pr6 2/2 Running 16 2d
The etcd container logs:
$ docker logs -f 0bc2e8f42947
2016-11-02 18:16:41.581493 I | etcdmain: etcd Version: 2.2.5
2016-11-02 18:16:41.581813 I | etcdmain: Git SHA: bc9ddf2
2016-11-02 18:16:41.581884 I | etcdmain: Go Version: go1.6
2016-11-02 18:16:41.583009 I | etcdmain: Go OS/Arch: linux/arm
2016-11-02 18:16:41.583133 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2016-11-02 18:16:41.583598 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2016-11-02 18:16:41.593874 I | etcdmain: listening for peers on http://localhost:2380
2016-11-02 18:16:41.595057 I | etcdmain: listening for peers on http://localhost:7001
2016-11-02 18:16:41.596288 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-11-02 18:16:42.578106 I | etcdserver: recovered store from snapshot at index 380048
2016-11-02 18:16:42.578246 I | etcdserver: name = default
2016-11-02 18:16:42.578316 I | etcdserver: data dir = /var/etcd/data
2016-11-02 18:16:42.579009 I | etcdserver: member dir = /var/etcd/data/member
2016-11-02 18:16:42.579141 I | etcdserver: heartbeat = 100ms
2016-11-02 18:16:42.579230 I | etcdserver: election = 1000ms
2016-11-02 18:16:42.579419 I | etcdserver: snapshot count = 10000
2016-11-02 18:16:42.579862 I | etcdserver: advertise client URLs = http://127.0.0.1:2379
2016-11-02 18:16:42.580132 I | etcdserver: loaded cluster information from store: <nil>
2016-11-02 18:16:46.351154 I | etcdserver: restarting member ce2a822cea30bfca in cluster 7e27652122e8b2ae at commit index 389115
2016-11-02 18:16:46.354603 I | raft: ce2a822cea30bfca became follower at term 32
2016-11-02 18:16:46.354883 I | raft: newRaft ce2a822cea30bfca [peers: [ce2a822cea30bfca], term: 32, commit: 389115, applied: 380048, lastindex: 389115, lastterm: 32]
2016-11-02 18:16:46.411499 I | etcdserver: starting server... [version: 2.2.5, cluster version: 2.2]
2016-11-02 18:16:48.315589 I | raft: ce2a822cea30bfca is starting a new election at term 32
2016-11-02 18:16:48.316314 I | raft: ce2a822cea30bfca became candidate at term 33
2016-11-02 18:16:48.316721 I | raft: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 33
2016-11-02 18:16:48.317363 I | raft: ce2a822cea30bfca became leader at term 33
2016-11-02 18:16:48.317726 I | raft: raft.node: ce2a822cea30bfca elected leader ce2a822cea30bfca at term 33
2016-11-02 18:16:48.325990 I | etcdserver: published {Name:default ClientURLs:[http://127.0.0.1:2379]} to cluster 7e27652122e8b2ae
2016-11-02 18:17:01.082697 N | osutil: received terminated signal, shutting down...
2016-11-02 18:17:07.634768 E | etcdhttp: got unexpected response error (etcdserver: request timed out)
2016-11-02 18:17:07.643182 E | etcdhttp: got unexpected response error (etcdserver: request timed out)
2016-11-02 18:17:07.783677 E | etcdhttp: got unexpected response error (etcdserver: request timed out)
2016-11-02 18:17:09.218169 E | etcdhttp: got unexpected response error (etcdserver: request timed out)
2016-11-02 18:17:09.824865 E | etcdhttp: got unexpected response error (etcdserver: server stopped)
2016-11-02 18:17:09.825649 E | etcdhttp: got unexpected response error (etcdserver: server stopped)
From @viroos on November 3, 2016 22:41
I managed to get it working with Hypriot 1.1.0, K8s 1.4.3, and weave. In my case the issue was that the master and the node had the same hostname.
http://larmog.github.io/2016/10/28/installing-kubernetes-on-arm-with-kubeadm/ - this works.
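For others hitting the same thing: kubelet registers each machine under its hostname, so two machines with the same name clobber each other's node object. A quick sanity check before running kubeadm join (the name node-01 below is only a placeholder):

```shell
# The hostname must be unique across the cluster; kubelet uses it as the
# node name when registering with the API server.
hostname

# If it matches the master's name, change it before joining, e.g.:
#   sudo hostnamectl set-hostname node-01    # placeholder name
# (On Hypriot you can also edit /etc/hostname and reboot.)
```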
From @nlamirault on November 4, 2016 7:53
I've also got these logs in /var/log/syslog:
Nov 04 07:43:17 jarvis kubelet[395]: E1104 07:43:17.873889 395 event.go:199] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"etcd-jarvis.1482abde90f339e3", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"270343", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"etcd-jarvis", UID:"a19bb61234e539b2b8370ac597b940f0", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{etcd}"}, Reason:"Unhealthy", Message:"Liveness probe failed: HTTP probe failed with statuscode: 503", Source:api.EventSource{Component:"kubelet", Host:"jarvis"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63613529400, nsec:0, loc:(*time.Location)(0x3306808)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63613842190, nsec:749080856, loc:(*time.Location)(0x3306808)}}, Count:10093, Type:"Warning"}': 'client: etcd cluster is unavailable or misconfigured' (will not retry!)
Nov 04 07:43:17 jarvis kubelet[395]: E1104 07:43:17.934346 395 kubelet_node_status.go:301] Error updating node status, will retry: client: etcd cluster is unavailable or misconfigured
Nov 04 07:53:05 jarvis kubelet[395]: E1104 07:53:05.507143 395 kubelet_node_status.go:301] Error updating node status, will retry: client: etcd cluster is unavailable or misconfigured
Nov 04 07:53:07 jarvis kubelet[395]: E1104 07:53:07.556922 395 kubelet_node_status.go:301] Error updating node status, will retry: Operation cannot be fulfilled on nodes "jarvis": the object has been modified; please apply your changes to the latest version and try again
I'm using k8s with only a master; I've got no nodes.
From @nlamirault on November 4, 2016 16:16
The controller manager has errors like this:
I1104 15:31:25.389881 1 reflector.go:284] pkg/controller/volume/persistentvolume/controller_base.go:448: forcing resync
E1104 15:31:30.994090 1 leaderelection.go:317] err: client: etcd cluster is unavailable or misconfigured
E1104 15:31:31.086650 1 leaderelection.go:317] err: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again
I1104 15:31:31.090832 1 attach_detach_controller.go:520] processVolumesInUse for node "jarvis"
E1104 15:31:31.087267 1 event.go:258] Could not construct reference to: '&api.Endpoints{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"kube-controller-manager", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:""}, Subsets:[]api.EndpointSubset(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' '%v stopped leading' 'jarvis'
I1104 15:31:32.900385 1 leaderelection.go:232] failed to renew lease kube-system/kube-controller-manager
F1104 15:31:32.913336 1 controllermanager.go:195] leaderelection lost
From @brendandburns on November 14, 2016 6:29
I got these errors when I had a lousy SD card. Switching to a higher-performance card fixed things...
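etcd fsyncs its write-ahead log on every request, so a slow card surfaces exactly as these "request timed out" errors. As a rough, hypothetical check of the card's synchronous write speed (file path and sizes below are arbitrary):

```shell
# Write 32 MiB with oflag=dsync so every block is flushed to the card
# before dd reports throughput; single-digit MB/s suggests the card is
# too slow for etcd's fsync-heavy workload.
dd if=/dev/zero of=/tmp/dd-sync-test bs=1M count=32 oflag=dsync
rm -f /tmp/dd-sync-test
```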
From @nlamirault on November 16, 2016 12:40
@brendandburns i will try that.
I'm closing this, as everything we could do to provide kubeadm on ARM was done from the beginning (the initial v1.4 release), and it has been enhanced a lot since in the second revision, so it should be really smooth now.
From @derailed on October 26, 2016 17:23
Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):
No
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):
kubeadm arm
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Bug
Kubernetes version (use kubectl version):
Environment:
Raspberry Pi 3
HypriotOS v1.0
Kernel (uname -a): Linux m10 4.4.15-hypriotos-v7+ #1 SMP PREEMPT Mon Jul 25 08:46:52 UTC 2016 armv7l GNU/Linux
What happened:
Following the kubeadm installation docs, I installed the prereqs and proceeded with init as follows:
kubeadm init --use-kubernetes-version v1.4.1 --pod-network-cidr=10.244.0.0/16
The command is stuck on:
Looking at docker on this node, I expect to see images being pulled for etcd, the api-server, etc., but docker images reports nothing. Guessing kubeadm is somehow not able to connect to the local docker daemon? Or the default Hypriot docker configuration is not jiving with kubeadm??
What you expected to happen:
kubeadm init to complete successfully
How to reproduce it (as minimally and precisely as possible):
o Download Hypriot v1.0 image and flash on SD card
o Boot rpi with SD card
o curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
o cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
o apt-get update
o apt-get install -y kubelet kubeadm kubectl kubernetes-cni
o kubeadm init --use-kubernetes-version v1.4.1 --pod-network-cidr=10.244.0.0/16
Anything else we need to know:
Looking at /etc/kubernetes/manifests shows:
etcd.json kube-apiserver.json kube-controller-manager.json kube-scheduler.json
Inspecting journalctl on kubelet shows:
Copied from original issue: kubernetes/kubernetes#35643