rancher / terraform-provider-rancher2

Terraform Rancher2 provider
https://www.terraform.io/docs/providers/rancher2/
Mozilla Public License 2.0

Updating default_system_registry behavior at rancher2_app_v2 resource #1265

Closed nicholasSUSE closed 7 months ago

nicholasSUSE commented 7 months ago

Issue:

https://github.com/rancher/terraform-provider-rancher2/issues/1187

Problem

The Terraform resource rancher2_app_v2 had no argument for setting a custom system registry, so clients could only install an app from a previously configured SystemDefaultRegistry.

A computed attribute named default_system_registry was already in place.

Solution

Make the computed attribute default_system_registry also available as an optional argument on the resource.
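With the change, the registry can be set directly in configuration. A minimal sketch, assuming the argument name from this PR; the cluster ID and registry address are illustrative placeholders:

```hcl
resource "rancher2_app_v2" "example" {
  cluster_id = "c-m-xxxxxxxx" # placeholder cluster ID
  name       = "rancher-cis-benchmark"
  namespace  = "cis-operator-system"
  repo_name  = "rancher-charts"
  chart_name = "rancher-cis-benchmark"

  # Newly optional: point the app's images at a custom registry
  # instead of the globally configured SystemDefaultRegistry.
  system_default_registry = "registry.example.com:5000"
}
```

Omitting the argument keeps the previous behavior, falling back to the registry configured globally in Rancher.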

Testing

The reproduction steps in the GitHub issue are complex and take a long time to execute.

To test without having to set up all the infrastructure (especially the private registry with all its images):

Testing With Terraform:

  1. Apply 1 AWS Instance + Rancher
  2. Apply 1 AWS Instance + Import RKE downstream cluster
  3. Use the resource rancher2_app_v2 to create a new app installation, passing the newly added system_default_registry argument set to a random string standing in for the private registry DNS.
  4. Check that the chart begins installation under Apps.
  5. Check that the system image required to run the chart fails to pull because it points to the random string we passed.

Results when trying to install CIS Benchmark with a random string as the value: (screenshot omitted)


Engineering Testing

Manual Testing

Part of the terraform code to execute the tests:


resource "rancher2_cluster" "bug_cluster" {
    provider = rancher2.BUG

    depends_on = [
        aws_instance.bug_node
    ]

    name = "${var.prefix}-bug-cluster"
    enable_cluster_alerting = false
    enable_cluster_monitoring = false

    rke_config {
      network {
        plugin = "canal"
      }
      enable_cri_dockerd    = true
      kubernetes_version = "v1.25.9-rancher2-1"
      ignore_docker_version = true
    }
}

# Create Rancher-CIS-Benchmark App from Helm chart
resource "rancher2_app_v2" "cis_benchmark_bug_cluster" {
  provider = rancher2.BUG

  depends_on = [
    rancher2_cluster_sync.bug_sync
  ]

  cluster_id = rancher2_cluster_sync.bug_sync.cluster_id
  name = "rancher-cis-benchmark"
  namespace = "cis-operator-system"
  repo_name = "rancher-charts"
  chart_name = "rancher-cis-benchmark"
  cleanup_on_fail = true
  wait = false

  # comment/uncomment to reproduce/disable the bug
  # system_default_registry = "YYYYZ"
  timeouts {
    create = "5m"
    update = "5m"
    delete = "5m"
  }
}

Automated Testing

QA Testing Considerations

Regressions Considerations

nicholasSUSE commented 7 months ago

@Jono-SUSE-Rancher @snasovich

Hi, I need reviews on this PR; however, I don't have permission to add reviewers. Could you help with that?

Also, there is a failing automated test, but it has nothing to do with this PR's code changes, and I can't restart the Drone build. How should I proceed?

Jono-SUSE-Rancher commented 7 months ago

Hey @nicholasSUSE - I've added Andy, Sergey, and Bruno - whichever two get to take a look first.

nicholasSUSE commented 7 months ago

> I'd update the tf docs cluster_v2.md to reflect the new functionality of system_default_registry now and make sure to test it having an empty value/not being set which would cause tf to use the global system registry within Rancher and make sure that still works. Lgtm

Thank you for your review. I have updated the documentation in a new commit and have tested the empty value for system_default_registry.

nicholasSUSE commented 7 months ago

@rohitsakala

I have tested both scenarios:

Setting the system_default_registry variable

  system_default_registry = "${aws_instance.default_registry.public_dns}:5000"

And not setting it.

The resourceRancher2AppV2Read function returns the right value for system_default_registry. The d.Get("system_default_registry") call reads the Terraform state, not the actual configuration on Rancher or the downstream cluster.

This way, whatever override we apply through the new argument will output the right value.
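One quick way to inspect what the read function stores in state is a Terraform output block. This is a hypothetical sketch (the output name is illustrative, and it assumes the rancher2_app_v2 resource from the test config above):

```hcl
# Hypothetical: surface the value the provider reads back into state,
# so `terraform output` shows the effective registry after apply.
output "effective_system_default_registry" {
  value = rancher2_app_v2.cis_benchmark_bug_cluster.system_default_registry
}
```

With the argument set, this should print the override; with it unset, whatever the provider computes from Rancher's global setting.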