[BUG] Outdated Ubuntu image is used for DigitalOcean node templates by default

felipe-colussi commented 10 months ago

Note: This issue has been updated to be specifically about changing the default DO image to match the one in the Rancher support matrix.

Rancher Server Setup

Rancher version: Any
Installation option (Docker install/Helm Chart):
- If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
Proxy/Cert Details:

Information about the Cluster

Kubernetes version: Any
Cluster Type (Local/Downstream): Downstream
- If downstream, what type of cluster? RKE
I'm not able to create an RKE cluster using the default DO image, because it uses Ubuntu 16.04.

To Reproduce

Create a DO RKE node template using the following config (note that `image = "ubuntu-22-04-x64"` is commented out, so the default image is used):
```hcl
resource "rancher2_node_template" "rancher2_node_template_do" {
  name = "do-agent-customization-felipe"
  digitalocean_config {
    access_token        = var.do_token
    backups             = false
    # image             = "ubuntu-22-04-x64"
    ipv6                = false
    monitoring          = false
    private_networking  = false
    region              = "nyc3"
    size                = "s-2vcpu-4gb-intel"
    ssh_key_fingerprint = ""
    ssh_port            = "22"
    ssh_user            = "root"
    tags                = ""
    userdata            = ""
  }
}
```
Actual Result

The provider attempts to use the Ubuntu 16.04 image, which no longer exists in DO, so cluster creation fails with:

`Error creating machine: Error in driver during machine creation: POST https://api.digitalocean.com/v2/droplets: 422 You specified an invalid image for Droplet creation.`

Attachment: [tf-do-image.zip](https://github.com/rancher/terraform-provider-rancher2/files/12486407/tf-do-image.zip)

Expected Result

The Ubuntu 22.04 image should be used by default, in line with the Rancher support matrix (e.g. https://www.suse.com/suse-rancher/support-matrix/all-supported-versions/rancher-v2-7-9/). Note that there is a known issue with provisioning on DO with Ubuntu 22.04 that is being looked into: https://github.com/rancher/rancher/issues/43586
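Until the default is changed, a possible workaround is to pin the image explicitly in the node template, which is what the commented-out `image` line in the repro config does. A minimal sketch, assuming the `do_token` variable from the repro; the resource and template names here are hypothetical placeholders, and the known Ubuntu 22.04 provisioning issue linked above may still apply:

```hcl
# Workaround sketch: set the DigitalOcean image slug explicitly so the
# provider's outdated default (Ubuntu 16.04) is never used.
variable "do_token" {
  type      = string
  sensitive = true
}

resource "rancher2_node_template" "do_pinned_image" {
  name = "do-pinned-ubuntu-22-04" # hypothetical placeholder name
  digitalocean_config {
    access_token = var.do_token
    image        = "ubuntu-22-04-x64" # explicit image instead of the default
    region       = "nyc3"
    size         = "s-2vcpu-4gb-intel"
    ssh_user     = "root"
  }
}
```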
  
Screenshots

Additional context

felipe-colussi commented 10 months ago

Validation Template

Root Cause

The default DigitalOcean image used by RKE node templates was outdated (Ubuntu 16.04).

What was fixed, or what changes have occurred

The default value of the DigitalOcean image was changed to `ubuntu-22-04-x64`.

Areas or cases that should be tested

Creation of an RKE cluster on DO.

What areas could experience regressions

None.

Are the repro steps accurate/minimal?

Adding the whole TF file to make this easier.
Note: this TF file is configured to use a locally built provider, so you will probably need to change the provider source.

do_test.zip

Josh-Diamond commented 8 months ago

Ticket #1215 - Test Results - :x: - RE-OPENED

Verified on Rancher v2.8-5b42ca50475398112661c80ac2b9070d5bacfb24-head with tfp-rancher2 v4.0.0-rc4:

| Scenario | Test Case | Result |
|---|---|---|
| 1 | Provision downstream DO RKE1 cluster, using default image in node template resource | :x: |

Scenario 1 -

1. Fresh install of Rancher v2.8-head.
2. Using the main.tf outlined below, provision a downstream DO RKE1 node driver cluster, using tfp-rancher2 v4.0.0-rc4.
3. Veri-FAILED: the cluster fails to provision. An `error creating machine` error is encountered, the instance is torn down and a new one is provisioned in its place, which then hits the same error; this repeats in a loop.


main.tf

```hcl
terraform {
  required_providers {
    rancher2 = {
      source  = "terraform.local/local/rancher2"
      version = "4.0.0-rc4"
    }
  }
}

provider "rancher2" {
  api_url   = "<REDACTED>"
  token_key = "<REDACTED>"
  insecure  = true
}

resource "rancher2_node_template" "rancher2_node_template" {
  name = "qa-testing-jkeslar-plz-deletee"
  digitalocean_config {
    access_token        = "<REDACTED>"
    ipv6                = false
    monitoring          = false
    private_networking  = false
    region              = "nyc3"
    size                = "s-2vcpu-4gb-intel"
    ssh_key_fingerprint = ""
    ssh_port            = "22"
    ssh_user            = "root"
    tags                = ""
    userdata            = ""
  }
}

resource "rancher2_cluster" "rancher2_cluster" {
  depends_on = [rancher2_node_template.rancher2_node_template]
  name       = "jkeslar-qa-testing-plz-deletee"
  rke_config {
    kubernetes_version = "v1.27.6-rancher1-1"
    network {
      plugin = "canal"
    }
  }
}

resource "rancher2_node_pool" "pool1" {
  depends_on       = [rancher2_cluster.rancher2_cluster]
  cluster_id       = rancher2_cluster.rancher2_cluster.id
  name             = "pool1"
  hostname_prefix  = "jkeslar-pool1-"
  node_template_id = rancher2_node_template.rancher2_node_template.id
  quantity         = 1
  control_plane    = true
  etcd             = true
  worker           = true
}
```

snasovich commented 7 months ago

Per offline discussions, this issue was updated to be specifically about switching the default Ubuntu image from 16.04 to 22.04, which is what should be validated here. Note that a known Rancher issue (https://github.com/rancher/rancher/issues/43586) will still cause provisioning to fail; however, the new behavior does not make matters any worse than before the default switch, and that issue will be addressed on the Rancher side without requiring any TF provider changes. Moving to "To Test". FYI @Josh-Diamond

Josh-Diamond commented 7 months ago

Ticket #1215 - Test Results - ✅

Verified w/ tfp-rancher2 v4.0.0-rc5:

terraform plan

Verified: the DigitalOcean node template image now defaults to `ubuntu-22-04-x64`, as expected.
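One way to check the default without provisioning any droplets is to expose the node template's image as an output and inspect it in the `terraform plan` output. A sketch, assuming the `rancher2_node_template.rancher2_node_template` resource from the main.tf above (with `image` left unset) and that the `digitalocean_config` block is exposed as a single-element list, as nested blocks in this provider usually are; the output name is a hypothetical choice:

```hcl
# Surfaces the image the node template will use. With `image` unset in
# digitalocean_config, this shows the provider's default value at plan time.
output "do_node_template_image" {
  value = rancher2_node_template.rancher2_node_template.digitalocean_config[0].image
}
```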
