Closed PsychoSid closed 3 years ago
Interesting, looks like we might need to add a timeout between cluster creation and attempting to add to the hub. /cc @bharathkkb
Hi @PsychoSid What gcloud version are you on?
v305 which is the latest I believe
I looked at this again this morning when bringing up my cluster. It does seem to need a wait: the cluster is in "RECONCILING", and if I wait until it's in "RUNNING" before rerunning the apply, it goes through just fine.
Thanks.
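For anyone wanting to script the manual wait described above, here is a minimal sketch rather than anything the module itself does. It assumes a zonal cluster and that `gcloud container clusters describe` with `--format=value(status)` prints the status string. The helper names `gcloud_cluster_status` and `wait_until_running` are made up for illustration.

```python
import subprocess
import time
from typing import Callable


def gcloud_cluster_status(name: str, zone: str) -> str:
    """Return the cluster status (e.g. RUNNING, RECONCILING) as reported by gcloud.

    Assumption: a zonal cluster; a regional cluster would need --region instead.
    """
    result = subprocess.run(
        ["gcloud", "container", "clusters", "describe", name,
         "--zone", zone, "--format=value(status)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns RUNNING; give up after timeout_s."""
    waited = 0.0
    while True:
        if get_status() == "RUNNING":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Something like `wait_until_running(lambda: gcloud_cluster_status("my-cluster", "us-central1-a"))` before rerunning the apply would mirror the manual workaround; the cluster name and zone here are placeholders.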
@PsychoSid I think it makes sense to wait for the cluster to reconcile before we proceed. We can probably target this once we have https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/pull/611. I have also noticed that with smaller cluster sizes, ASM installs tend to force a master reconciliation, which might be why the cluster enters RECONCILING before creating the hub membership.
I tried an apply - destroy - apply cycle with this example, which seemed to work, but I'm happy to debug further if you can provide your config.
Thanks. It's 100% reproducible for me with my config/setup (it didn't happen with v0.10, although v0.11 fixed my destroy issue!). I haven't included the .tfvars or the backend type stuff here. issue626.txt
Thanks
@PsychoSid We encountered something similar with ACM today, where the master was unavailable for about a minute after the CRDs were applied, producing a very similar "dial tcp endpoint:443: connect: connection refused" error.
I think having some kind of precondition check to make sure the endpoint is available and, if not, a retry mechanism with a backoff might be the best approach. Happy to hear any thoughts or other ideas.
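As a rough illustration of the precondition-plus-backoff idea, here is a minimal sketch, not the module's actual implementation. It assumes that a successful TCP connection to port 443 is a good enough signal that the endpoint is available. The names `endpoint_reachable` and `retry_with_backoff` and the default delays are hypothetical.

```python
import socket
import time
from typing import Callable


def endpoint_reachable(host: str, port: int = 443, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and DNS failures.
        return False


def retry_with_backoff(
    check: Callable[[], bool],
    attempts: int = 6,
    base_delay_s: float = 2.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, doubling the delay between tries."""
    delay = base_delay_s
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
    return False
```

Calling `retry_with_backoff(lambda: endpoint_reachable(master_ip))` before the step that talks to the cluster would gate it on the endpoint answering; `master_ip` is a placeholder for the cluster endpoint address.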
Hi @PsychoSid, I wanted to follow up on this. We had a regression, fixed by https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/pull/669, where we were not waiting for the cluster to be ready, so I wanted to confirm whether you are still seeing this with the latest on main.
Hi @bharathkkb, I haven't tried it yet, as I tend to use the module registry paths for sources, but I will do. Many thanks.
Closing this out as it should be fixed by #669. Feel free to reopen if needed.
Every night I tear down my deployment and bring it up again the following day (the names remain the same). I mention this as it might be due to previous credentials.
Every day since the update to the v0.11 modules, the ASM module doesn't complete correctly.
The initial run fails with:-
An immediate attempt to re-apply also fails:-
If I then run gcloud..get-credentials and re-apply everything is good.
I'm pretty sure I followed the update doc. Any ideas please? Thanks.