Open iTaybb opened 3 years ago
It seems that after Rancher is bootstrapped, the RKE images take some time to become ready. If you use Terraform to install the Rancher instance, bootstrap it, and then immediately try to create a cluster, the RKE images might not be ready yet.
Running curl -sSku $TOKEN https://$RANCHER_IP/v3/rkek8ssystemimages | jq -c '.pagination.total'
right after bootstrapping, I can see:
10:06:19 rancher2_bootstrap.admin (local-exec): 122
10:06:22 rancher2_bootstrap.admin (local-exec): 143
10:06:24 rancher2_bootstrap.admin (local-exec): 163
10:06:27 rancher2_bootstrap.admin (local-exec): 168
10:06:29 rancher2_bootstrap.admin (local-exec): 168
10:06:30 rancher2_bootstrap.admin (local-exec): 168
which shows that the images are still loading.
I suggest that rancher2_bootstrap should check through the API that all the rkek8ssystemimages are loaded.
As a workaround, you can probably run some hacky script like this:
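#!/bin/bash
# Exit once the RKE image count is non-zero and identical on three consecutive polls.
LAST_LAST_COUNT=-1
LAST_COUNT=-1
while true; do
  COUNT=$(curl -sSku "$TOKEN" "https://$RANCHER_IP/v3/rkek8ssystemimages" | jq -c '.pagination.total')
  echo "$COUNT RKE images loaded."
  # -gt compares numerically; a bare > inside [[ ]] compares strings, so "null" would pass.
  [[ "$COUNT" -gt 0 && "$COUNT" == "$LAST_COUNT" && "$COUNT" == "$LAST_LAST_COUNT" ]] && exit 0
  LAST_LAST_COUNT=$LAST_COUNT
  LAST_COUNT=$COUNT
  sleep 1
done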
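The loop above never gives up if the count keeps changing or the API stays unreachable, which would hang a Terraform local-exec step. A bounded variant might look like the sketch below. The function names fetch_count and wait_for_stable_count are made up for illustration, and the endpoint and jq filter are the ones quoted earlier in this thread.

```shell
#!/bin/bash
# fetch_count is a hypothetical helper: print the current RKE image count.
# It is a separate function so it can be replaced when testing without a server.
fetch_count() {
  curl -sSku "$TOKEN" "https://$RANCHER_IP/v3/rkek8ssystemimages" | jq -c '.pagination.total'
}

# wait_for_stable_count MAX_TRIES STABLE_NEEDED INTERVAL
# Prints the count and returns 0 once the same positive count has been seen
# STABLE_NEEDED times in a row; returns 1 after MAX_TRIES polls otherwise.
wait_for_stable_count() {
  local max_tries stable_needed interval last same count
  max_tries=$1; stable_needed=$2; interval=$3
  last=""; same=0
  while [ "$max_tries" -gt 0 ]; do
    count=$(fetch_count)
    case "$count" in
      ''|*[!0-9]*|0)
        # Empty, non-numeric (e.g. "null") or zero: not ready, reset the streak.
        same=0; last="" ;;
      *)
        if [ "$count" = "$last" ]; then same=$((same + 1)); else same=1; fi
        last=$count
        if [ "$same" -ge "$stable_needed" ]; then
          echo "$count"
          return 0
        fi
        ;;
    esac
    max_tries=$((max_tries - 1))
    sleep "$interval"
  done
  return 1
}
```

For example, wait_for_stable_count 120 3 1 || exit 1 would poll once per second for at most two minutes and fail the step instead of hanging. A stable count is still only a heuristic, since nothing here knows how many images Rancher intends to load.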
@iTaybb, yes, it looks like a race condition between the bootstrap finishing and the local cluster becoming active. A fix was added in PR #679: the rancher2_bootstrap resource will now wait until the local cluster is active.
PR https://github.com/rancher/terraform-provider-rancher2/pull/679 is already merged. The fix will be available in the next Terraform provider release.
Please reopen the issue if needed.
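For anyone on a provider version without that fix, the same idea can be approximated from a script. This is only a sketch: the thread does not show what PR #679 checks internally, and it assumes that GET /v3/clusters/local returns a JSON object whose state field reads "active" once the local cluster is ready. The function names are hypothetical.

```shell
#!/bin/bash
# fetch_local_cluster_state is a hypothetical helper.
# Assumption: /v3/clusters/local exposes a top-level "state" field.
fetch_local_cluster_state() {
  curl -sSku "$TOKEN" "https://$RANCHER_IP/v3/clusters/local" | jq -r '.state'
}

# wait_for_local_cluster MAX_TRIES INTERVAL
# Returns 0 as soon as the state is "active", 1 after MAX_TRIES polls.
wait_for_local_cluster() {
  local tries interval state
  tries=$1; interval=$2
  while [ "$tries" -gt 0 ]; do
    state=$(fetch_local_cluster_state)
    # Progress goes to stderr so stdout stays clean for callers.
    echo "local cluster state: ${state:-unknown}" >&2
    [ "$state" = "active" ] && return 0
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}
```

Called as wait_for_local_cluster 60 5, it would wait up to five minutes before giving up.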
@rawmind0 Unfortunately this is still/again happening, see https://github.com/rancher/quickstart/issues/196. I can also reproduce this every 10th time or so.
The issue is happening again with Rancher 2.6.3 and Terraform provider v1.22.2.
This may or may not work for you, but my fix was to do the following:
# Initialize Rancher server
resource "rancher2_bootstrap" "admin" {
  depends_on = [
    helm_release.rancher_server
  ]
  provider  = rancher2.bootstrap
  password  = var.admin_password
  telemetry = true
}

locals {
  rke_network_plugin  = "canal"
  rke_network_options = null
}
Then, add this:
resource "time_sleep" "wait_60_seconds" {
  depends_on      = [rancher2_bootstrap.admin]
  create_duration = "60s"
}
and on the resource declaration for the workload:
# Create custom managed cluster for amf
resource "rancher2_cluster" "amf_workload" {
  depends_on = [time_sleep.wait_60_seconds]
I'm trying to deploy RKE v1.20.6-rancher1-1 with Rancher v2.5.8, which should be supported according to the release notes. I'm getting the following error:
Weirdly enough, after re-running the terraform plan, it runs fine, so somehow the v1.20.6-rancher1-1 version becomes accepted after some time. Might this be a race condition of some kind? Maybe Rancher is not fully available yet?
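One way to test that hypothesis is to ask the API directly whether the specific version has been loaded yet, rather than watching the total count. The sketch below rests on an assumption: that each entry under .data in the rkek8ssystemimages response carries the version string in a name field, and that the whole list fits in one page. list_rke_versions and rke_version_available are hypothetical helper names.

```shell
#!/bin/bash
# list_rke_versions is a hypothetical helper: print one RKE version per line.
# Assumption: each item under .data has the version in its "name" field.
list_rke_versions() {
  curl -sSku "$TOKEN" "https://$RANCHER_IP/v3/rkek8ssystemimages" | jq -r '.data[].name'
}

# rke_version_available VERSION
# Succeeds only if VERSION matches a whole line of the list exactly
# (-F fixed string, -x whole line, -q quiet).
rke_version_available() {
  list_rke_versions | grep -Fxq -- "$1"
}
```

Running rke_version_available v1.20.6-rancher1-1 || echo "not loaded yet" right after bootstrap, and again a minute later, would show whether the version only appears after a delay.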