rancher / quickstart


Azure and GCP quick start fails after creating first VM #59

Closed JohnOnTheWeb closed 4 years ago

JohnOnTheWeb commented 4 years ago

I am attempting to follow the quick start guide at https://rancher.com/docs/rancher/v2.x/en/quick-start-guide/deployment/microsoft-azure-qs/ to deploy to an Azure or GCP environment.

Running terraform apply --auto-approve creates a resource group, IPs, and a VM, but the process then halts with the following:

azurerm_linux_virtual_machine.rancher_server: Creation complete after 1m49s [id=/subscriptions/2458118e-7ade-4a51-aa5e-c4bcaed7dc97/resourceGroups/quickstart-rancher-quickstart/providers/Microsoft.Compute/virtualMachines/quickstart-rancher-server]
module.rancher_common.rke_cluster.rancher_cluster: Creating...

Error: Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config

I was able to SSH into the VM that was created; no Docker images or failed containers were present.

The RKE outputs were the following:

============= RKE outputs ==============

[info] Tearing down Kubernetes cluster
[info] [dialer] Setup tunnel for host [40.71.88.240]
[warning] Failed to set up SSH tunneling for host [40.71.88.240]: Can't initiate NewClient: protocol not available
[warning] Removing host [40.71.88.240] from node lists
[info] Initiating Kubernetes cluster
[info] [dialer] Setup tunnel for host [40.71.88.240]
[warning] Failed to set up SSH tunneling for host [40.71.88.240]: Can't initiate NewClient: protocol not available
[warning] Removing host [40.71.88.240] from node lists
[warning] [state] can't fetch legacy cluster state from Kubernetes
[info] [certificates] Generating CA kubernetes certificates
[info] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
[info] [certificates] Generating Kubernetes API server proxy client certificates
[info] [certificates] Generating Kubernetes API server certificates
[info] [certificates] Generating Service account token key
[info] [certificates] Generating Kube Scheduler certificates
[info] [certificates] Generating Node certificate
[info] [certificates] Generating admin certificates and kubeconfig
[info] [certificates] Generating Kube Controller certificates
[info] [certificates] Generating Kube Proxy certificates
[info] Successfully Deployed state file at [C:\Projects\NTTData\Rancher\QuickStart\azure\terraform-provider-rke-223987894/cluster.rkestate]
[info] Building Kubernetes cluster

========================================

on ..\rancher-common\rke.tf line 4, in resource "rke_cluster" "rancher_cluster":
4: resource "rke_cluster" "rancher_cluster" {

nikkelma commented 4 years ago

A few questions to help figure out what went wrong ...

  1. What OS was used for terraform apply?
  2. Did you specify an SSH key file other than the default?
  3. If you didn't specify an SSH key file, is there an SSH key at ~/.ssh/id_rsa?

JohnOnTheWeb commented 4 years ago

  1. Windows 10.
  2. Yes, I created a key and modified the path.
  3. There is a key at that location.

I ran it both ways for both providers with the same result: once with a key named rancher1 that I created and placed in the same directory as terraform.tfvars, and once with the default key.

Is there a way I can SSH into the VM and restart the process?

nikkelma commented 4 years ago

Thank you for this info! This error is caused by SSH initiated through RKE failing, so there's no manual way to reconcile this.

I found a few relevant issues:

The main workaround I can see is running the commands inside a shell using Windows Subsystem for Linux (WSL); the Unix socket will succeed in that context, while it will fail in a pure Windows environment.
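As a rough sketch of that workaround, assuming the default WSL setup where Windows drives are mounted under /mnt, the project directory from the log above can be reached from a WSL shell and the same Terraform commands run from there. The helper name win_to_wsl_path is hypothetical and only illustrates the path mapping (inside WSL the built-in wslpath utility does the same job); it also assumes a Linux build of terraform is installed inside WSL and that the SSH key path in terraform.tfvars is readable from there.

```shell
# Hypothetical helper: map a Windows path such as C:\Projects\foo to the
# location where WSL mounts that drive by default (/mnt/c/Projects/foo).
win_to_wsl_path() {
  # First character is the drive letter; WSL mount points use lowercase.
  drive=$(printf '%s' "$1" | cut -c1 | tr '[:upper:]' '[:lower:]')
  # Drop the "C:" prefix and turn backslashes into forward slashes.
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
  printf '/mnt/%s%s\n' "$drive" "$rest"
}

# Usage inside a WSL shell (directory assumed from the state file path
# in the RKE output; adjust it to your own checkout):
#   cd "$(win_to_wsl_path 'C:\Projects\NTTData\Rancher\QuickStart\azure')"
#   terraform init
#   terraform apply --auto-approve
```

Running terraform init again is suggested here only because the Linux binary needs its own provider plugins; whether existing state from the failed Windows run should be destroyed first depends on how far that run got.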

JohnOnTheWeb commented 4 years ago

Running on WSL does seem to work...closing issue

nikkelma commented 4 years ago

Version 1.0.0 of terraform-provider-rke is now being used, and I've confirmed the quickstart can be successfully deployed from a native Windows environment - this should fix the root cause without a workaround!