rancher / terraform-provider-rancher2

Terraform Rancher2 provider
https://www.terraform.io/docs/providers/rancher2/
Mozilla Public License 2.0

Terraform build fails with Rancher dependency v2.7 #1052

Closed HarrisonWAffel closed 1 year ago

HarrisonWAffel commented 1 year ago

The Terraform provider fails to build against Rancher commits from 2.7, but builds successfully against commits from 2.6. It seems some types were moved around in 2.7, breaking the following structure files:

structure_cluster_logging.go, structure_cluster_scan.go, and structure_logging_custom_target_config.go

Builds fail with the following error:

# github.com/rancher/terraform-provider-rancher2/rancher2
rancher2/structure_cluster_logging.go:12:73: undefined: client.ClusterLogging
rancher2/structure_cluster_logging.go:153:71: undefined: client.ClusterLogging
rancher2/structure_cluster_scan.go:10:55: undefined: client.CisScanConfig
rancher2/structure_cluster_scan.go:31:52: undefined: client.ClusterScanConfig
rancher2/structure_cluster_scan.go:42:70: undefined: client.ClusterScan
rancher2/structure_cluster_scan.go:78:68: undefined: client.CisScanConfig
rancher2/structure_cluster_scan.go:108:65: undefined: client.ClusterScanConfig
rancher2/structure_cluster_scan.go:121:67: undefined: client.ClusterScan
rancher2/structure_logging_custom_target_config.go:9:60: undefined: client.CustomTargetConfig
rancher2/structure_logging_custom_target_config.go:42:74: undefined: client.CustomTargetConfig
rancher2/structure_logging_custom_target_config.go:42:74: too many errors

This error seems to occur because the "github.com/rancher/rancher/pkg/client/generated/management/v3" package no longer contains the following types:

CustomTargetConfig, ClusterLogging, ClusterScan, CisScanConfig, and ClusterScanConfig

The resolution of this issue may be as simple as fixing import paths, but could also be more involved depending on the changes that these types have undergone in 2.7.
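To scope the fix, it can help to group the undefined identifiers by file. The following is a minimal sketch, not part of the provider: `MissingSymbols` and `undefinedRe` are hypothetical names, and the only assumption is that the compiler output follows the `file:line:col: undefined: name` shape shown in the error above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// undefinedRe matches compiler lines such as
// "rancher2/structure_cluster_scan.go:10:55: undefined: client.CisScanConfig".
var undefinedRe = regexp.MustCompile(`^(\S+\.go):\d+:\d+: undefined: (\S+)$`)

// MissingSymbols (hypothetical helper) groups the undefined identifiers
// found in go build output by source file, deduplicated and sorted.
func MissingSymbols(buildOutput string) map[string][]string {
	seen := map[string]map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(buildOutput))
	for sc.Scan() {
		m := undefinedRe.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue // package headers, "too many errors", and so on
		}
		if seen[m[1]] == nil {
			seen[m[1]] = map[string]bool{}
		}
		seen[m[1]][m[2]] = true
	}
	out := map[string][]string{}
	for file, syms := range seen {
		for s := range syms {
			out[file] = append(out[file], s)
		}
		sort.Strings(out[file])
	}
	return out
}

func main() {
	// A few lines copied from the build error in this issue.
	sample := `# github.com/rancher/terraform-provider-rancher2/rancher2
rancher2/structure_cluster_logging.go:12:73: undefined: client.ClusterLogging
rancher2/structure_cluster_scan.go:10:55: undefined: client.CisScanConfig
rancher2/structure_cluster_scan.go:31:52: undefined: client.ClusterScanConfig
rancher2/structure_cluster_scan.go:78:68: undefined: client.CisScanConfig`
	for file, syms := range MissingSymbols(sample) {
		fmt.Println(file, syms)
	}
}
```

Note that the compiler stops after ten errors ("too many errors"), so the list from a single build may be incomplete.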

HarrisonWAffel commented 1 year ago

Root causes:

https://github.com/rancher/rancher/pull/39656

https://github.com/rancher/rancher/issues/37318

a-blender commented 1 year ago

Root cause

The root cause of this issue is that support for Logging and CIS v1 scan was removed in Rancher 2.7 but not in the TF rancher2 provider, so the provider references types that no longer exist.

Design discussion summary

Rancher v2.6 still supports logging and CIS v1 scan, so removing that support from the TF provider to fix 2.7 would break 2.6 at the same time. This is a tech debt issue. We discussed some alternative options but concluded that we have to branch the TF provider to resolve this and prevent future debt.

For customers who are running both Rancher 2.6 and 2.7 instances, @MbolotSuse pointed out that you will need two separate TF directories, each with its own configuration and state files, one per instance, instead of just updating the TF provider version and re-running terraform init. The latter will cause state file conflicts. This may be confusing and should be added to the TF docs to help customers.

Version schema

Major Version alignment (2.7.x -> 3.x): Pros:

In comparison, Minor Version alignment (provider 2.0.x for Rancher 2.6.10, provider 2.1.0 for Rancher 2.7.2) does not work for 2.1.x (2.7). We will likely be introducing new features (PSA), which would require a minor version increase, and this schema option rules that out because the minor version is locked.

Minor Version alignment works for 2.6 but not for 2.7. Major Version alignment will require a compatibility matrix but works for both and has more flexibility for OOB releases.
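The compatibility matrix implied by Major Version alignment can be sketched as a small lookup. This is an illustration only: `ProviderMajorFor` and `providerMajorByRancherMinor` are hypothetical names, and the table holds just the two pairings stated in this thread (Rancher 2.6 with provider 2.x, Rancher 2.7 with provider 3.x). Any other Rancher line is deliberately reported as unknown rather than guessed.

```go
package main

import (
	"fmt"
	"strings"
)

// providerMajorByRancherMinor holds only the pairings named in this issue.
var providerMajorByRancherMinor = map[string]int{
	"2.6": 2, // release/v2, tagged 2.0.0 for Rancher v2.6.10
	"2.7": 3, // master, tagged 3.0.0 for Rancher v2.7.2
}

// ProviderMajorFor (hypothetical helper) returns the provider major version
// for a Rancher version such as "v2.7.2" or "2.6.10".
func ProviderMajorFor(rancherVersion string) (int, error) {
	v := strings.TrimPrefix(rancherVersion, "v")
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("cannot parse Rancher version %q", rancherVersion)
	}
	line := parts[0] + "." + parts[1]
	major, ok := providerMajorByRancherMinor[line]
	if !ok {
		return 0, fmt.Errorf("no provider release line recorded for Rancher %s", line)
	}
	return major, nil
}

func main() {
	for _, v := range []string{"v2.6.10", "v2.7.2", "v2.5.0"} {
		major, err := ProviderMajorFor(v)
		if err != nil {
			fmt.Println(v, "->", err)
			continue
		}
		fmt.Printf("%s -> provider %d.x\n", v, major)
	}
}
```

Because only the major version is pinned to the Rancher line, the provider's minor and patch numbers stay free for new features and OOB releases.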

Design plan

  1. Branch the TF rancher2 provider into separate release lines: release/v2 (cut before Harrison's changes) for Rancher 2.6, and master for Rancher 2.7.
  2. Tag 2.0.0 for rancher v2.6.10 on release/v2 and 3.0.0 on master for upcoming rancher v2.7.2. This uses semver and aligns the TF major version with the Rancher minor version while still allowing for OOB flexibility. No need to backport at this time.
  3. Submit PR to remove logging and cis v1 support from 2.7 so build errors are resolved (testing is unblocked)
  4. Submit PR to update terraform release notes + add compatibility matrix and instructions on using multiple TF versions to the README
  5. Inform team 3 that there are still controller API definitions in Rancher for logging and CIS scan support. Ask why those are still there and whether they should be removed, since leaving them may have been an oversight. Some of those files may be managed by other teams now; they would know.

Future PRs

a-blender commented 1 year ago

Also from @snasovich: I’ve got a soft “OK” from product on maintaining separate release lines for TF provider for minor Rancher release lines. We carried over some more high-level discussions about the whole “support story” for Rancher TF provider to next week, but it should not affect this decision.

a-blender commented 1 year ago

The TF rancher2 provider 2.0.0 and 3.0.0 releases are targeted to ship together with, or after, the upcoming Rancher feature releases.

a-blender commented 1 year ago

Testing template

Root cause

The root cause of this issue is that Logging and cis v1 scan code support were removed in rancher 2.7 but not in the TF rancher2 provider so the provider is trying to reference types that don't exist.

What was fixed, or what changes have occurred

Per this discussion, I have branched the TF provider into release/v2 for Rancher 2.6, with master remaining for Rancher 2.7. To fix the TF build, I am doing the following in this PR

Areas or cases that should be tested

TF rancher2 cluster or any cluster that can run k8s 1.25 and could have logging or cis v1 enabled previously (I chose RKE on EC2 nodes in this case).

Note for QA: This can only be tested when Terraform 1.25.x is released.

main.tf

```
terraform {
  required_providers {
    rancher2 = {
      # The rancher2 provider is published under the rancher namespace.
      source  = "rancher/rancher2"
      version = "1.25.0"
    }
  }
}

provider "rancher2" {
  api_url   = var.rancher_api_url
  token_key = var.rancher_admin_bearer_token
  insecure  = true
}

data "rancher2_cloud_credential" "rancher2_cloud_credential" {
  name = var.cloud_credential_name
}

resource "rancher2_cluster" "rancher2_cluster" {
  name = var.cluster_name
  rke_config {
    kubernetes_version = "v1.25.5-rancher1-1"
    network {
      plugin = var.network_plugin
    }
  }
}

resource "rancher2_node_template" "rancher2_node_template" {
  name = var.node_template_name
  amazonec2_config {
    access_key     = var.aws_access_key
    secret_key     = var.aws_secret_key
    region         = var.aws_region
    ami            = var.aws_ami
    security_group = [var.aws_security_group_name]
    subnet_id      = var.aws_subnet_id
    vpc_id         = var.aws_vpc_id
    zone           = var.aws_zone_letter
    root_size      = var.aws_root_size
    instance_type  = var.aws_instance_type
  }
}

# etcd node pool
resource "rancher2_node_pool" "pool1" {
  cluster_id       = rancher2_cluster.rancher2_cluster.id
  name             = "pool1"
  hostname_prefix  = "tf-pool1-"
  node_template_id = rancher2_node_template.rancher2_node_template.id
  quantity         = 1
  control_plane    = false
  etcd             = true
  worker           = false
}

# control plane node pool
resource "rancher2_node_pool" "pool2" {
  cluster_id       = rancher2_cluster.rancher2_cluster.id
  name             = "pool2"
  hostname_prefix  = "tf-pool2-"
  node_template_id = rancher2_node_template.rancher2_node_template.id
  quantity         = 1
  control_plane    = true
  etcd             = false
  worker           = false
}

# worker node pool
resource "rancher2_node_pool" "pool3" {
  cluster_id       = rancher2_cluster.rancher2_cluster.id
  name             = "pool3"
  hostname_prefix  = "tf-pool3-"
  node_template_id = rancher2_node_template.rancher2_node_template.id
  quantity         = 1
  control_plane    = false
  etcd             = false
  worker           = true
}
```

What areas could experience regressions?

I don't think regressions are likely here. Code that was blocking a TF build has been removed. Some of those types still exist in Rancher, but adding them back into the TF provider would not be a regression; it would be reinstating a feature.

Are the repro steps accurate/minimal?

Yes.

a-blender commented 1 year ago

Blocked -- waiting on Terraform 3.0.0 for Rancher v2.7.x.

a-blender commented 1 year ago

Ping for QA: This is ready to test using Terraform rancher2 v3.0.0-rc1. Please set up local testing on the rc version of the provider with this command:

./setup-provider.sh rancher2 3.0.0-rc1

sowmyav27 commented 1 year ago

@Sahota1225 @Anna-Blendermann If this is a tech-debt issue, does this need QA Validation?

a-blender commented 1 year ago

I'd say probably not at this point. If testing any of the other TF issues with 3.0.0-rc1 fails with build errors related to logging or CIS v1 scan, this can be reopened.