rancher / quickstart

Error: Timeout, Rancher is not ready: <nil> #64

Closed rrigoni closed 4 years ago

rrigoni commented 4 years ago

Getting this issue when applying the Terraform quickstart for DO.

Any thoughts?

➜  do git:(master) ✗ terraform apply --auto-approve
digitalocean_ssh_key.quickstart_ssh_key: Refreshing state... [id=27022084]
digitalocean_droplet.rancher_server: Refreshing state... [id=187249495]
module.rancher_common.data.helm_repository.rancher_stable: Refreshing state...
module.rancher_common.data.helm_repository.rancher_latest: Refreshing state...
module.rancher_common.data.helm_repository.jetstack: Refreshing state...

Error: Timeout, Rancher is not ready: <nil>

  on ../rancher-common/provider.tf line 44, in provider "rancher2":
  44: provider "rancher2" {

nikkelma commented 4 years ago

What version of the rancher2 provider are you using? #61 shows there are some problems surrounding 1.8.x, and as a result I have PR #62 in place to lock that provider to 1.7.3 until all changes are made.

rrigoni commented 4 years ago

I'm using version = "~> 1.7". Any plans on merging that PR?
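
For context, here is a minimal sketch of how that constraint behaves, assuming Terraform 0.12-style provider pinning. The actual layout of the quickstart's provider configuration is not shown in this thread, so the block is illustrative only. A `~> 1.7` constraint accepts any 1.x release from 1.7 upward, which means it can still resolve to a 1.8.x release. An exact pin, as PR #62 is described as doing, avoids that.

```hcl
# Illustrative only: not copied from rancher-common/provider.tf.
# Uses the Terraform 0.12-style string form of required_providers.
terraform {
  required_providers {
    # "~> 1.7" means >= 1.7 and < 2.0, so 1.8.x releases remain eligible:
    # rancher2 = "~> 1.7"

    # An exact version pins the provider to the release reported to work.
    rancher2 = "1.7.3"
  }
}
```

After changing a version constraint, `terraform init` has to be re-run so that a matching provider build is downloaded.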

rrigoni commented 4 years ago

Ok, I pulled your changes and now there is an issue with Helm downloading rancher.

module.rancher_common.helm_release.cert_manager: Still creating... [1m30s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [1m40s elapsed]
module.rancher_common.helm_release.cert_manager: Creation complete after 1m50s [id=cert-manager]
module.rancher_common.helm_release.rancher_server: Creating...

Error: failed to download "rancher-latest/rancher" (hint: running `helm repo update` may help)

  on ../rancher-common/helm.tf line 19, in resource "helm_release" "rancher_server":
  19: resource "helm_release" "rancher_server" {

mathishaloy commented 4 years ago

Getting the same issue when applying the Terraform quickstart for AWS.

I can't reach the repositories provided in rancher-common/data.tf.

AlphaWong commented 4 years ago

Getting this issue when applying the Terraform quickstart for DO.

Any thoughts?

➜  do git:(master) ✗ terraform apply --auto-approve
digitalocean_ssh_key.quickstart_ssh_key: Refreshing state... [id=27022084]
digitalocean_droplet.rancher_server: Refreshing state... [id=187249495]
module.rancher_common.data.helm_repository.rancher_stable: Refreshing state...
module.rancher_common.data.helm_repository.rancher_latest: Refreshing state...
module.rancher_common.data.helm_repository.jetstack: Refreshing state...

Error: Timeout, Rancher is not ready: <nil>

  on ../rancher-common/provider.tf line 44, in provider "rancher2":
  44: provider "rancher2" {

Same here, even though I have already pulled the master branch.

Oh, the PR is still open. Let me pull your PR #62 branch first.

It works well. Thanks!

rawmind0 commented 4 years ago

The problem with the rancher2 provider is limited to v1.8.2, which has a low default number of retries. v1.8.1 works fine. In any case, v1.8.3 is being released today and should address the issue.

nikkelma commented 4 years ago

Has this been resolved? Want to make sure the PRs that have been merged for terraform provider versions fix the issues.

dennybritz commented 4 years ago

Not sure if this is the same issue, but I am still getting timeouts, this time for cert_manager:

module.rancher_common.helm_release.cert_manager: Still creating... [4m30s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [4m40s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [4m50s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [5m0s elapsed] 
module.rancher_common.helm_release.cert_manager: Still creating... [5m10s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [5m20s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [5m30s elapsed]
module.rancher_common.helm_release.cert_manager: Still creating... [5m40s elapsed]

Error: timed out waiting for the condition                                        

  on ../rancher-common/helm.tf line 4, in resource "helm_release" "cert_manager": 
   4: resource "helm_release" "cert_manager" {      

This is on GCE.

EDIT: rancher2 provider 1.7.3 worked, but 1.8.3 did not.

nikkelma commented 4 years ago

To confirm, running the same terraform script around roughly the same time where the only difference was 1.7.3 vs 1.8.3 for the rancher terraform provider saw success with 1.7.3 but this timeout for 1.8.3? I ask because we've been seeing some issues installing cert-manager due to its dependency on quay.io and some downtime experienced by that registry, so wanted to make sure that wasn't the root cause.

dennybritz commented 4 years ago

To confirm, running the same terraform script around roughly the same time where the only difference was 1.7.3 vs 1.8.3 for the rancher terraform provider saw success with 1.7.3 but this timeout for 1.8.3?

Yep, that's what happened. quay.io downtime could have been the cause, but it seems unlikely: I tried 3-4 times with 1.8.3, then switched to 1.7.3 right after, and it worked.

nikkelma commented 4 years ago

Very interesting, thanks for this info - I'll dig deeper into why exactly this is happening when using GCE.

nikkelma commented 4 years ago

In many testing runs with GCE I wasn't able to reproduce this, especially since we've moved to Terraform version 0.13 and rancher2 provider version 1.10. Closing this issue for now, but please reopen if similar issues are seen with Terraform 0.13 and rancher2 >=1.10.
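
As a rough sketch of the combination mentioned above, the constraints could be expressed with the Terraform 0.13 `required_providers` syntax. This is an assumption about how one might write it, not a copy of the quickstart's own configuration, and the exact constraints used in the repository may differ.

```hcl
# Illustrative only: constraints inferred from the comment above,
# not taken from the quickstart repository.
terraform {
  # Terraform 0.13 introduced the source/version object form used below.
  required_version = ">= 0.13"

  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = ">= 1.10.0"
    }
  }
}
```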

VishalSharma94 commented 3 years ago

@nikkelma I'm getting the same error @dennybritz was facing: a timeout for cert-manager.

This is on AWS, and the rancher2 version is 1.10.4.

Strangely, when I updated the version from 1.10.4 to 1.10.5, it worked.

todori438 commented 3 years ago

Had a similar issue, but it was failing on the certificate part.

module.rancher_common.helm_release.cert_manager: Creating...

Error: failed to download "https://charts.jetstack.io/charts/cert-manager-v1.0.4.tgz" (hint: running `helm repo update` may help)

  on ..\rancher-common\helm.tf line 4, in resource "helm_release" "cert_manager":
   4: resource "helm_release" "cert_manager" {

Got it sorted by updating Helm from 2.16 to 3.4.2 and updating the repository.