Closed pandalec closed 2 years ago
Hi @parsifallo, are you using the master-head version? v2.6.5 works fine for me; we only see this issue with master-head because the backend API changed the cloud-provider part, and it will be resolved with the upcoming UI changes.
Hi @guangbochen! I am using the Docker image "rancher/rancher:latest", but I see there's a new version. Will give it a try now.
The issue still exists in the latest version, but your original issue description says Rancher version: v2.6.5. I just want to make sure it wasn't Rancher v2.6.5, thanks.
Strange, I copied the version number from the lower left of the web GUI. So I explicitly tested the Docker image rancher/rancher:v2.6.6. I can add the Harvester cluster (shown as active), but when I try to create a cluster, Rancher shows this error message during creation: clusters.management.cattle.io "c-m-mpm4rd4l" not found. Same behavior with v2.6.5. I then started a fresh Rancher installation with v2.6.5 and added Harvester, but I get the same errors:
2022/07/01 07:23:57 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:23:57 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:23:57 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:23:57 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:23:57 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:23:57 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:23:59 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:24:05 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:20 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:20 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:20 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:20 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [INFO] [MachineProvision] Failed to create infrastructure fleet-default/cluster-pool1-303e2776-9dcqh for machine cluster-pool1-6568745bcb-w2txf, deleting and recreating...
2022/07/01 07:26:21 [INFO] [MachineProvision] Failed to create infrastructure fleet-default/cluster-pool1-303e2776-b6s6v for machine cluster-pool1-6568745bcb-4ph9s, deleting and recreating...
2022/07/01 07:26:21 [INFO] [MachineProvision] Failed to create infrastructure fleet-default/cluster-pool1-303e2776-vmxsq for machine cluster-pool1-6568745bcb-8wlb7, deleting and recreating...
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:21 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:21 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:22 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:22 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:23 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:26 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:31 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
2022/07/01 07:26:35 [INFO] [planner] rkecluster fleet-default/cluster: waiting: waiting for viable init node
2022/07/01 07:26:41 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-sqwqwv7m: ClusterUnavailable 503: cluster not found, requeuing
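Since most of the lines above repeat, it can help to collapse them before reading. A minimal sketch, assuming the log lines start with a `YYYY/MM/DD HH:MM:SS` timestamp as they do above; the function name `summarize_logs` is a placeholder for illustration, not something Rancher provides:

```shell
# Collapse repeated log lines: strip the leading timestamp, then count
# how often each remaining message occurs (most frequent first).
summarize_logs() {
  sed -E 's#^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ##' \
    | sort \
    | uniq -c \
    | sort -rn
}
```

It could be fed from the container logs, for example `docker logs <rancher-container> 2>&1 | summarize_logs`, where `<rancher-container>` stands for whatever name or ID your Rancher container has.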
Gonna try some other versions
Downgraded to 2.6.4: same error, but this time it is shown in the Rancher GUI:
provisioning bootstrap node(s) cluster1234-pool1-76b5d455d9-n8gtj: failed creating server (HarvesterMachine) in infrastructure provider: CreateError: Downloading driver from https://rancher.*/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
ls: cannot access 'docker-machine-driver-*': No such file or directory
downloaded file failed sha256 checksum
download of driver from https://rancher.*/assets/docker-machine-driver-harvester failed, waiting for agent to check in and apply initial plan
Deploying RKE1 still works
Edit: Same error on v2.6.7-rc1. I don't get it. If it is something like a network error, why am I able to deploy RKE1 clusters but not RKE2 clusters? Is it possible to change https://rancher.*/assets/docker-machine-driver-harvester to an internet endpoint that is reachable from Rancher?
From inside the rancher Docker container:
rancher:/var/lib/rancher # curl -k https://rancher.*/assets/docker-machine-driver-harvester --output docker-machine-driver-harvester
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 36.9M 100 36.9M 0 0 770M 0 --:--:-- --:--:-- --:--:-- 770M
rancher:/var/lib/rancher #
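The download itself succeeds here, while the provisioning error complains about a failed sha256 checksum, so one follow-up check is to hash the downloaded file and compare it with the checksum the driver is expected to have. A minimal sketch, assuming a Linux shell with `sha256sum`; the function name `verify_sha256` is a placeholder, and where the expected value comes from (for instance the checksum configured for the node driver in Rancher) is an assumption, not something confirmed in this thread:

```shell
# Compare a file's SHA-256 digest with an expected value.
# Returns 0 on match, 1 on mismatch (and prints both digests).
verify_sha256() {
  file="$1"
  expected="$2"
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
  else
    echo "checksum mismatch: expected $expected, got $actual"
    return 1
  fi
}
```

Usage would look like `verify_sha256 docker-machine-driver-harvester <expected-sha256>`, with `<expected-sha256>` replaced by the value you are comparing against.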
I guess I found (part of) the issue: I started Rancher with --network host instead of defining ports. But it still does not work with the current version; I needed to downgrade to 2.6.5 to get it working.
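For comparison, the two ways of starting the container mentioned here look roughly as follows. This is a sketch that only prints the command instead of running it; the helper name `rancher_run_cmd` is made up for illustration, the flags other than the networking part follow the usual single-node Docker install command, and the thread does not establish which variant is the working one:

```shell
# Print (not run) the docker command for a given networking mode.
# mode "ports" publishes 80/443; mode "host" shares the host network stack.
rancher_run_cmd() {
  mode="$1"
  tag="${2:-v2.6.5}"
  case "$mode" in
    ports) net="-p 80:80 -p 443:443" ;;
    host)  net="--network host" ;;
    *)     echo "usage: rancher_run_cmd ports|host [tag]" >&2; return 2 ;;
  esac
  echo "docker run -d --restart=unless-stopped $net --privileged rancher/rancher:$tag"
}
```

Calling `rancher_run_cmd ports v2.6.5` and `rancher_run_cmd host v2.6.5` prints the two variants side by side, so the only difference between test runs is the networking flag.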
I seem to have the same issue, but with the OpenStack driver rather than Harvester, so perhaps it is something more global (also RKE2, Rancher 2.6.6).
2022/07/13 11:18:06 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-m-trbkdb5g: ClusterUnavailable 503: cluster not found, failed to start user controllers for cluster c-m-tctdhvdb: ClusterUnavailable 503: cluster not found, requeuing
2022/07/13 11:18:17 [INFO] [MachineProvision] Failed to create infrastructure fleet-default/test-rancher-3-pool1-15362df3-29wq4 for machine test-rancher-3-pool1-6c8fdfddf-4wqk4, deleting and recreating...
2022/07/13 11:18:17 [INFO] [MachineProvision] Failed to create infrastructure fleet-default/test-rancher-3-pool1-15362df3-29wq4 for machine test-rancher-3-pool1-6c8fdfddf-4wqk4, deleting and recreating...
Hello! I encountered the same problem on Rancher v2.8.5, with Harvester as the driver. Did you manage to solve it?
I'm also experiencing this when trying to create an RKE2 cluster using my custom node driver while running Rancher locally on Docker Desktop. It seems to be an issue where download_driver.sh is failing because of the SSL cert (https://github.com/rancher/machine/blob/9183b3ff738e16ece4391a2e6bcc8ef88889e8ae/package/download_driver.sh#L15).
That didn't seem to help
Hello! As far as I know, Harvester ignores the absence of SSL. I was able to solve this problem for myself like this: when installing Rancher, specify the IP address, not the domain name. I don't know exactly why, but when accessing Rancher by IP, Harvester manages to log in successfully.
@PAzter1101 I'm actually not using Harvester, just my own custom node driver, but I can't figure out how to make download_driver.sh happy with the default self-signed SSL cert when running Rancher locally on Docker Desktop. I would patch the script and add -k just for local testing, but I don't know where I can do that, since it spins up a new container for the provisioning each time.
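One way to narrow this down is to repeat the driver download without `-k`, so that certificate verification stays on, and look at curl's exit code. Note that the earlier curl test in this thread used `-k`, which skips verification and therefore says nothing about whether the certificate is trusted. A minimal sketch; the function names are placeholders, and it assumes that a plain `curl` call fails in the same way the provisioning container's download does, which is not confirmed here:

```shell
# Map the curl exit codes most relevant here to a short explanation.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok: TLS verification passed" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "could not connect" ;;
    35) echo "TLS handshake failed" ;;
    60) echo "certificate not trusted" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Try the download without -k and report what curl's exit code means.
check_driver_url() {
  url="$1"
  curl -sS --max-time 15 -o /dev/null "$url"
  explain_curl_exit "$?"
}
```

Running `check_driver_url <your-rancher-url>/assets/<driver-file>` from a machine that does not have the self-signed CA installed would be expected to report an untrusted certificate if the cert is indeed the problem; both placeholders have to be filled in with your own values.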
Rancher Server Setup
Information about the Cluster
User Information
Describe the bug
To Reproduce
Result
Cluster management pane shows:
Message inside nodes:
Docker logs from rancher
Expected Result: If RKE1 deployment is successful, RKE2 should work too.
Screenshots
Additional context: Because of this error, I switched to a real certificate issued by Let's Encrypt; same behavior. The machine and DNS are available inside the network. Machines are not getting created on Harvester.