rancher / quickstart


Vagrant setup does not work #153


daitangio commented 3 years ago

Vagrant master provisioning fails at the last step:

server-01: + docker run --rm --net=host appropriate/curl -s https://127.0.0.1/v3/clusterregistrationtoken -H 'content-type: application/json' -H 'Authorization: Bearer ' --data-binary '{"type":"clusterRegistrationToken","clusterId":""}' --insecure

The SSH command responded with a non-zero exit status. Vagrant assumes that this means the command failed. The output for this command should be in the log above. Please read the output to determine what went wrong.
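Note that both the Bearer token and the clusterId in the request above are empty, which suggests an earlier provisioning step already failed to fetch them. For reference, the same call can be retried by hand from server-01 once the server is healthy; a minimal sketch, where RANCHER_TOKEN and CLUSTER_ID are hypothetical placeholders for the values the provisioner should have obtained, not variables the quickstart sets:

# retry the cluster registration call by hand (RANCHER_TOKEN and CLUSTER_ID are hypothetical placeholders)
docker run --rm --net=host appropriate/curl -s \
  https://127.0.0.1/v3/clusterregistrationtoken \
  -H 'content-type: application/json' \
  -H "Authorization: Bearer ${RANCHER_TOKEN}" \
  --data-binary "{\"type\":\"clusterRegistrationToken\",\"clusterId\":\"${CLUSTER_ID}\"}" \
  --insecure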

On the server-01 side, the Rancher Docker instance (rancher/rancher:v2.5.3) fails with:

I0123 18:03:55.852637 22 request.go:621] Throttling request took 1.271102985s, request: GET:https://127.0.0.1:6444/apis/coordination.k8s.io/v1beta1?timeout=32s
F0123 18:03:56.248445 22 controllermanager.go:213] leaderelection lost
E0123 18:04:17.134339 7 leaderelection.go:357] Failed to update lock: Put "https://127.0.0.1:6443/api/v1/namespaces/kube-system/configmaps/cattle-controllers?timeout=15m0s": context deadline exceeded
I0123 18:04:17.134894 7 leaderelection.go:278] failed to renew lease kube-system/cattle-controllers: timed out waiting for the condition
E0123 18:04:17.135112 7 leaderelection.go:297] Failed to release lock: resource name may not be empty
2021/01/23 18:04:17 [FATAL] leaderelection lost for cattle-controllers

The problem seems even bigger, because the Docker instance on server-01 keeps failing with:

I0123 18:09:53.611369 23 trace.go:116] Trace[609173925]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2021-01-23 18:09:48.588528086 +0000 UTC m=+313.151141630) (total time: 5.022763358s):
Trace[609173925]: [2.69861839s] [2.698547532s] Transaction prepared
Trace[609173925]: [5.022677334s] [2.324058944s] Transaction committed
2021/01/23 18:09:59 [ERROR] Failed to install system chart fleet-crd: Post "https://127.0.0.1:6443/apis/rbac.authorization.k8s.io/v1/clusterroles?timeout=15m0s": read tcp 127.0.0.1:32992->127.0.0.1:6443: read: connection reset by peer
E0123 18:10:01.591515 7 leaderelection.go:357] Failed to update lock: Put "https://127.0.0.1:6443/api/v1/namespaces/kube-system/configmaps/cattle-controllers?timeout=15m0s": read tcp 127.0.0.1:33142->127.0.0.1:6443: read: connection reset by peer
2021/01/23 18:10:01 [FATAL] k3s exited with: exit status 255
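All of these leader-election timeouts point at the embedded k3s API server on 127.0.0.1:6443 being unresponsive inside the Rancher container. One way to watch it from the server-01 host while provisioning runs; a sketch, assuming the container was started from the rancher/rancher:v2.5.3 image mentioned above:

# find the Rancher container by image and follow its logs
docker logs --tail 100 --follow "$(docker ps --filter ancestor=rancher/rancher:v2.5.3 --format '{{.Names}}')"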

I also tried specifying k8s_version: "v1.19.4-rancher1-1" inside config.yaml. My machine could be a little slow, but it has plenty of RAM (Centrino with 8 GB). How can we fix it?
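For reference, that attempt amounts to a single line in the quickstart's config.yaml; a sketch, with the key name and value taken from the report above:

# config.yaml (relevant line only)
k8s_version: "v1.19.4-rancher1-1"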

vrubiolo commented 3 years ago

I have encountered the same issue as you and found that destroying and bringing up the machines again works around it.

jaroslav-muller commented 3 years ago

I had the same issue. I thought that maybe destroying and recreating just server-01 would be enough, but that didn't help. Then I did a complete vagrant destroy and up again, and now it seems to work.
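For reference, the full reset described above is just the standard Vagrant commands, run from the quickstart's vagrant directory:

# tear down all machines without prompting, then re-provision from scratch
vagrant destroy -f
vagrant up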