rancher / terraform-provider-rancher2

Terraform Rancher2 provider
https://www.terraform.io/docs/providers/rancher2/
Mozilla Public License 2.0

eks-config-operator fails during eks node group provisioning and node template is empty #693

Closed — wojtasool closed this issue 2 years ago

wojtasool commented 3 years ago

During creation of a new node_group with a launch template, Rancher does not show the launch template in the UI and the eks-operator fails. When I edit the cluster and manually point the new node group at the specific launch_template, it works. Otherwise it is empty.

I define the environment using Terraform provider 1.15.1.

The plan also shows that the launch_template will be created and used within the EKS node_group.

This is the code I use:

```hcl
resource "rancher2_cluster" "EKS-custom" {
  depends_on  = [var.rancher_dns, var.rancher_route_record_depends, var.rancher2_token, aws_security_group.eks_sg_allowall, aws_launch_template.eks_launch_template]
  name        = var.cluster-name
  description = "EKS Kubernetes cluster"

  eks_config_v2 {
    cloud_credential_id = rancher2_cloud_credential.eks_credential.id
    kubernetes_version  = var.kubernetes_version
    region              = var.region

    dynamic "node_groups" {
      for_each = var.node_groups
      content {
        name                   = node_groups.value.name
        instance_type          = try(node_groups.value.spot, false) ? "" : node_groups.value.instance_type
        desired_size           = node_groups.value.desired_size
        max_size               = node_groups.value.max_size
        min_size               = node_groups.value.min_size
        ec2_ssh_key            = aws_key_pair.eks_key_pair.key_name
        request_spot_instances = try(node_groups.value.spot, "false")
        spot_instance_types    = [try(node_groups.value.spot, false) ? node_groups.value.instance_type : ""]
        gpu                    = try(node_groups.value.gpu, "false")
        disk_size              = (try(node_groups.value.disk_size, 0) > 0) ? node_groups.value.disk_size : "60"
        dynamic "launch_template" {
          for_each = (try(node_groups.value.longhorn_disk_size, 0) > 0) ? [1] : []
          content {
            id      = aws_launch_template.eks_launch_template[node_groups.value.name].id
            version = aws_launch_template.eks_launch_template[node_groups.value.name].latest_version
            name    = aws_launch_template.eks_launch_template[node_groups.value.name].name
          }
        }
        resource_tags = merge(
          {
            Creator = "Rancher-Terraform"
          },
          var.tags,
        )
        tags = merge(
          {
            Name    = "${var.cluster-name}-node"
            Creator = "Terraform"
          },
          var.tags,
        )
      }
    }

    public_access   = false
    private_access  = true
    security_groups = [aws_security_group.eks_sg_allowall.id]
    subnets         = var.subnets
    tags = merge(
      {
        Name    = "${var.cluster-name}"
        Creator = "Terraform"
      },
      var.tags,
    )
  }

  lifecycle {
    ignore_changes = [
      eks_config_v2[0].public_access_sources,
    ]
  }
}
```

The eks-operator log:

```
E0624 08:47:16.789014       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 296 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x142c380, 0x1ffc440)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/runtime/runtime.go:74 +0xa6
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/runtime/runtime.go:48 +0x86
panic(0x142c380, 0x1ffc440)
    /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/rancher/eks-operator/controller.createNodeGroup(0xc000c76580, 0xc000712a6b, 0xc0003b44a0, 0xc0003b4520, 0xc000712a60, 0xc0003b44e0, 0xc000c08180, 0xc0003b4460, 0xc000712a50, 0xc000712ad0, ...)
    /go/src/github.com/rancher/eks-operator/controller/nodegroup.go:267 +0x758
github.com/rancher/eks-operator/controller.(*Handler).updateUpstreamClusterState(0xc000422340, 0xc0002ce000, 0xc000c76580, 0xc000b0aff0, 0x42, 0xc000521700, 0xc0000b2750, 0xc0000b2790, 0xc0000b27a8, 0x1, ...)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:1071 +0x28e8
github.com/rancher/eks-operator/controller.(*Handler).checkAndUpdate(0xc000422340, 0xc00037a2c0, 0xc0000b2750, 0xc0000b2790, 0xc0000b27a8, 0xc0000b27a8, 0xc000712ba0, 0xc)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:312 +0xcfb
github.com/rancher/eks-operator/controller.(*Handler).OnEksConfigChanged(0xc000422340, 0xc000bbe3c0, 0x1a, 0xc00037a2c0, 0x0, 0x44842c, 0xc000432058)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:93 +0x211
github.com/rancher/eks-operator/controller.(*Handler).recordError.func1(0xc000bbe3c0, 0x1a, 0xc00037a2c0, 0xc0003d4000, 0xc000339a4e, 0x0)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:105 +0x67
github.com/rancher/eks-operator/pkg/generated/controllers/eks.cattle.io/v1.FromEKSClusterConfigHandlerToHandler.func1(0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0x104e, 0x403bcb, 0xc0003b4840, 0x413c5d)
    /go/src/github.com/rancher/eks-operator/pkg/generated/controllers/eks.cattle.io/v1/eksclusterconfig.go:105 +0x6b
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0xc00003a780, 0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0x2, 0xc00036f468, 0x40409a, 0xc00036f440)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/sharedcontroller.go:29 +0x4e
github.com/rancher/lasso/pkg/controller.(*sharedHandler).OnChange(0xc000409680, 0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0xc000d7b201, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/sharedhandler.go:65 +0x115
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0004380b0, 0xc000bbe3c0, 0x1a, 0xc000d7b318, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:210 +0xd1
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0004380b0, 0x13da240, 0xc0003b4840, 0x0, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:192 +0xe7
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0004380b0, 0x203001)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:169 +0x54
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:158
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000d4c0f0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000d4c0f0, 0x17c2f00, 0xc0004a5bf0, 0x1, 0xc0004321e0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000d4c0f0, 0x3b9aca00, 0x0, 0xc000b0e801, 0xc0004321e0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc000d4c0f0, 0x3b9aca00, 0xc0004321e0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:90 +0x4d
created by github.com/rancher/lasso/pkg/controller.(*controller).run
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:129 +0x33b
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1300478]

goroutine 296 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/runtime/runtime.go:55 +0x109
panic(0x142c380, 0x1ffc440)
    /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/rancher/eks-operator/controller.createNodeGroup(0xc000c76580, 0xc000712a6b, 0xc0003b44a0, 0xc0003b4520, 0xc000712a60, 0xc0003b44e0, 0xc000c08180, 0xc0003b4460, 0xc000712a50, 0xc000712ad0, ...)
    /go/src/github.com/rancher/eks-operator/controller/nodegroup.go:267 +0x758
github.com/rancher/eks-operator/controller.(*Handler).updateUpstreamClusterState(0xc000422340, 0xc0002ce000, 0xc000c76580, 0xc000b0aff0, 0x42, 0xc000521700, 0xc0000b2750, 0xc0000b2790, 0xc0000b27a8, 0x1, ...)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:1071 +0x28e8
github.com/rancher/eks-operator/controller.(*Handler).checkAndUpdate(0xc000422340, 0xc00037a2c0, 0xc0000b2750, 0xc0000b2790, 0xc0000b27a8, 0xc0000b27a8, 0xc000712ba0, 0xc)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:312 +0xcfb
github.com/rancher/eks-operator/controller.(*Handler).OnEksConfigChanged(0xc000422340, 0xc000bbe3c0, 0x1a, 0xc00037a2c0, 0x0, 0x44842c, 0xc000432058)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:93 +0x211
github.com/rancher/eks-operator/controller.(*Handler).recordError.func1(0xc000bbe3c0, 0x1a, 0xc00037a2c0, 0xc0003d4000, 0xc000339a4e, 0x0)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:105 +0x67
github.com/rancher/eks-operator/pkg/generated/controllers/eks.cattle.io/v1.FromEKSClusterConfigHandlerToHandler.func1(0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0x104e, 0x403bcb, 0xc0003b4840, 0x413c5d)
    /go/src/github.com/rancher/eks-operator/pkg/generated/controllers/eks.cattle.io/v1/eksclusterconfig.go:105 +0x6b
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0xc00003a780, 0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0x2, 0xc00036f468, 0x40409a, 0xc00036f440)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/sharedcontroller.go:29 +0x4e
github.com/rancher/lasso/pkg/controller.(*sharedHandler).OnChange(0xc000409680, 0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0xc000d7b201, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/sharedhandler.go:65 +0x115
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0004380b0, 0xc000bbe3c0, 0x1a, 0xc000d7b318, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:210 +0xd1
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0004380b0, 0x13da240, 0xc0003b4840, 0x0, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:192 +0xe7
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0004380b0, 0x203001)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:169 +0x54
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:158
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000d4c0f0)
```

rawmind0 commented 3 years ago

Is the same EKS config working fine from the Rancher UI? It seems to be an issue with the eks-operator. The issue should be opened at https://github.com/rancher/rancher

wojtasool commented 3 years ago

From the Rancher UI I can see that the config has been created, but the launch_template field was empty and the eks-operator was failing. Once I picked the launch_template from the UI and saved it, it proceeded.

I don't know whether this is a problem with the provider or with the eks-operator.

rawmind0 commented 3 years ago

The dump seems to come from the eks-operator code, so the issue seems related to it. The issue should be opened at https://github.com/rancher/rancher

wojtasool commented 3 years ago

https://github.com/rancher/rancher/issues/33361

wojtasool commented 3 years ago

For some reason launch_template_id is null:

```json
"launchTemplate": {
  "id": null,
  "name": "staging-node_group1",
  "type": "/v3/schemas/launchTemplate",
```

wojtasool commented 3 years ago

So I guess for some reason it was passed with a null value by the provider.

rawmind0 commented 3 years ago

The provider should pass what is configured. The launch_template values are passed through vars in your config. Are you passing a nil id for any reason? Are these vars getting proper values? That's why I asked whether the same config works for you from the UI.

wojtasool commented 3 years ago

Yes, it is working fine from the UI. How can I see what's been passed to the eks-operator during the apply phase? In the plan I can only see "(known after apply)" for the launch_template id field:

```
  + launch_template {
      + id      = (known after apply)
      + name    = "staging-node_group1"
      + version = (known after apply)
    }
```

This is the corresponding block in my config:

```hcl
dynamic "launch_template" {
  for_each = (try(node_groups.value.longhorn_disk_size, 0) > 0) ? [1] : []
  content {
    id      = data.aws_launch_template.launch_template[node_groups.value.name].id # aws_launch_template.eks_launch_template[node_groups.value.name].id
    version = aws_launch_template.eks_launch_template[node_groups.value.name].latest_version
    name    = aws_launch_template.eks_launch_template[node_groups.value.name].name
  }
}
```

wojtasool commented 3 years ago

Also, the Terraform state shows that a launch_template id is provided:

```json
"desired_size": 4,
"disk_size": 250,
"ec2_ssh_key": "EKS-nodes-key20210309080032355000000001",
"gpu": false,
"image_id": "",
"instance_type": "c5.2xlarge",
"labels": null,
"launch_template": [
  {
    "id": "lt-0053cefbfa9673a99",
    "name": "staging-node_group1",
    "version": 1
  }
],
```

rawmind0 commented 3 years ago

It seems that something may not be working properly in your tf module. If id = (known after apply), then the id value is not being set by the config; it's calculated during the execution. The documented [eks_config_v2](https://registry.terraform.io/providers/rancher/rancher2/latest/docs/resources/cluster#creating-eks-cluster-from-rancher-v2-using-eks_config_v2-and-launch-template-for-rancher-v256-or-above) example was tested and should work fine. Have you tried to hardcode the launch_template id value just to test whether it works?

wojtasool commented 3 years ago

Will try it tomorrow at work. But I assume id = (known after apply) appears because this launch template is created within the same Terraform run, hence not known yet; the value should be provided later, once the template is actually created, before rancher2_cluster starts.

There is also an explicit depends_on = [var.rancher_dns, var.rancher_route_record_depends, var.rancher2_token, aws_security_group.eks_sg_allowall, aws_launch_template.eks_launch_template]

So it should be created after the dependent resources are created.
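To illustrate the distinction (a minimal sketch, not from this issue; resource names mirror the config above, field values are placeholders): depends_on only orders resource creation, it does not make computed attributes known at plan time.

```hcl
# Sketch, assuming launch templates keyed by node group name as in the config above.
resource "aws_launch_template" "eks_launch_template" {
  for_each = var.node_groups
  name     = "${each.value.name}-lt" # illustrative naming scheme
  # ...
}

resource "rancher2_cluster" "EKS-custom" {
  # Guarantees ordering: the templates exist before the cluster is created...
  depends_on = [aws_launch_template.eks_launch_template]
  # ...but this id is still (known after apply) at plan time and is only
  # resolved during apply, after the template has actually been created:
  #   id = aws_launch_template.eks_launch_template["node_group1"].id
  # ...
}
```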

wojtasool commented 3 years ago

The same thing happens when I pass the id of an already existing launch_template. The field is empty.

The plan shows the id, but the resulting cluster in the Rancher UI has this field empty:

```
  + launch_template {
      + id      = "lt-0053cefbfa9673a99"
      + name    = "staging-node_group1"
      + version = (known after apply)
    }
```

wojtasool commented 3 years ago

Any ideas why it might happen?

rawmind0 commented 3 years ago

I was able to reproduce the issue. The use of launch_template is not working properly: the operator is not fulfilling the node group image_id, instance_type, etc. Working on a fix.

wojtasool commented 3 years ago

So is it problem with provider or operator itself?

wojtasool commented 3 years ago

Is there any progress?

wojtasool commented 3 years ago

Up

dcardellino commented 3 years ago

Having same issue... something new @rawmind0 ???

seatonpear commented 3 years ago

I am having this same issue. Any update ?

rawmind0 commented 3 years ago

The issue is related to the use of launch_template. The eks-operator is not fulfilling the node group image_id, instance_type, etc. From the UI, this is done by an extra call to the AWS API that fetches and fills in the data before sending it to the eks-operator. This behaviour is complicated to replicate here, because the cluster resource doesn't have the AWS credentials (it only references the cloud credential id). We are working to find a good solution.

In the meantime, as a workaround, not using launch_template, or manually filling in the launch template data within the resource (node group image_id, instance_type, etc.), should work properly.
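For instance, the second workaround could look roughly like this (a sketch only; the values are illustrative, echoing the node group seen earlier in this thread):

```hcl
node_groups {
  name          = "staging-node_group1"
  # Fill in directly the data the operator would otherwise
  # have to read from the launch template:
  image_id      = "ami-0123456789abcdef0" # illustrative AMI id
  instance_type = "c5.2xlarge"
  disk_size     = 250
  desired_size  = 4
  max_size      = 4
  min_size      = 1
}
```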

dcardellino commented 3 years ago

@rawmind0 Any update on this issue?

ahmedfourti commented 3 years ago

Same here, any update please ?

ahmedfourti commented 3 years ago

Anyone knows if this will be resolved soon please ?

rawmind0 commented 2 years ago

Submitted PR https://github.com/rancher/terraform-provider-rancher2/pull/819 to fix this issue.

dcardellino commented 2 years ago

> Submitted PR #819 to fix this issue.

We built the provider with the merged fix locally, but had no success building EKS clusters with Rancher. Currently I'm testing with provider 1.17.1, which was mentioned in https://github.com/rancher/terraform-provider-rancher2/issues/820.

Currently it looks like it is working.

rawmind0 commented 2 years ago

@dcardellino, could you please be more explicit about the test you did with PR #819 applied? What Rancher version are you using?

brendarearden commented 2 years ago

Available to test in v2.6.3-rc7

thedadams commented 2 years ago

Not ready to test yet.

rawmind0 commented 2 years ago

The fix is included in the latest tfp release, v1.22.0.

ahmedfourti commented 2 years ago

Many thanks, it works !

dcardellino commented 2 years ago

> Many thanks, it works !

I confirm this. Works for me too! Thanks for solving this problem.

rawmind0 commented 2 years ago

@ahmedfourti and @dcardellino, thanks for the feedback!

Closing the issue. Please, reopen if required.

ahmedfourti commented 2 years ago

Solving this might have created a new issue, I think.

I created a new cluster with 1 node_group and 1 launch template. Then I added another node_group and launch template and applied.

I got this error:

```
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.user_clusters.rancher2_cluster.this["cluster-tools"] to include new values learned so far during apply, provider "registry.terraform.io/rancher/rancher2" produced an invalid
│ new value for .eks_config_v2[0].node_groups[1].launch_template[0].id: was cty.StringVal(""), but now cty.StringVal("lt-0540bb32cdc432165").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.user_clusters.rancher2_cluster.this["cluster-tools"] to include new values learned so far during apply, provider "registry.terraform.io/rancher/rancher2" produced an invalid
│ new value for .eks_config_v2[0].node_groups[1].launch_template[0].version: was cty.NumberIntVal(0), but now cty.NumberIntVal(1).
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
```

Applying again worked, because the launch template had been created during the first apply. My Terraform module creates one launch template for each node_group; here is my code:

```hcl
locals {
  node_groups = flatten([
    for cluster in var.clusters : [
      for node_group in coalesce(cluster.node_groups, []) : {
        cluster_name  = cluster.cluster_name
        name          = node_group.name
        instance_type = node_group.instance_type
        desired_size  = node_group.desired_size
        max_size      = node_group.max_size
        disk_size     = node_group.disk_size
        subnets       = node_group.subnets
        ec2_ssh_key   = node_group.ec2_ssh_key
        labels        = node_group.labels
        tags          = node_group.tags
        user_data     = node_group.user_data
      }
    ]
  ])
}
```

```hcl
resource "aws_launch_template" "this" {
  for_each = {
    for node_group in local.node_groups : node_group.name => node_group
  }
  name_prefix   = "${each.value.name}-"
  instance_type = each.value.instance_type == "" ? "t3.medium" : each.value.instance_type
  key_name      = each.value.ec2_ssh_key

  monitoring {
    enabled = true
  }
  user_data              = base64encode(each.value.user_data)
  vpc_security_group_ids = [aws_security_group.this[each.value.cluster_name].id]

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      encrypted   = true
      volume_type = "gp2"
      volume_size = each.value.disk_size == "" ? 150 : each.value.disk_size
    }
  }

  tag_specifications {
    resource_type = "instance"
    tags          = merge(local.common_tags, { "Name" = each.value.name }, each.value.tags)
  }

  tags = merge(local.common_tags, { "Name" = each.value.name }, each.value.tags)
}
```

```hcl
resource "rancher2_cluster" "this" {
  for_each = {
    for cluster in var.clusters : cluster.cluster_name => cluster
  }
  name                                    = each.value.cluster_name
  description                             = each.value.description
  enable_network_policy                   = each.value.enable_network_policy
  default_pod_security_policy_template_id = each.value.default_pod_security_policy_template_id
  enable_cluster_monitoring               = each.value.enable_cluster_monitoring

  dynamic "cluster_monitoring_input" {
    for_each = each.value.enable_cluster_monitoring ? [{}] : []
    content {
      answers = each.value.cluster_monitoring_input
    }
  }

  eks_config_v2 {
    name                = each.value.cluster_name
    cloud_credential_id = each.value.cloud_credential_id
    imported            = false
    kms_key             = each.value.kms_key == "" ? null : each.value.kms_key
    region              = each.value.region
    kubernetes_version  = each.value.eks_version
    logging_types       = each.value.logging_types
    subnets             = each.value.eks_subnets
    security_groups     = [aws_security_group.this[each.key].id]
    service_role        = aws_iam_role.this[each.key].name
    tags                = merge(local.common_tags, { "Name" = each.value.cluster_name })
    private_access      = each.value.private_access
    public_access       = each.value.public_access

    dynamic "node_groups" {
      for_each = each.value.node_groups != null ? each.value.node_groups : []
      content {
        name         = node_groups.value.name
        desired_size = node_groups.value.desired_size == "" ? 3 : node_groups.value.desired_size
        max_size     = node_groups.value.max_size == "" ? 3 : node_groups.value.max_size
        subnets      = node_groups.value.subnets
        labels       = node_groups.value.labels == "" ? null : node_groups.value.labels
        launch_template {
          id      = aws_launch_template.this[node_groups.value.name].id
          name    = aws_launch_template.this[node_groups.value.name].name
          version = aws_launch_template.this[node_groups.value.name].latest_version
        }
        resource_tags = merge(local.common_tags, { "Name" = each.value.cluster_name }, node_groups.value.tags)
      }
    }
  }
}
```

I can't reopen this issue @rawmind0 :/

rawmind0 commented 2 years ago

I see what you mean, @ahmedfourti, but at some point it'd be expected. The plan is failing because the launch template doesn't exist yet and it should. As you mentioned, applying again works because by then the launch template has been created and its values can be used by the rancher2_cluster resource.

To avoid this issue at plan time, you may try to split your tf definition/state, or do a 2-step deployment: the AWS resources on one side and the Rancher deployment on the other.
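A sketch of the split variant (illustrative only; the data source lookup and names are assumptions, not from this thread): create the launch templates in a separate state or earlier run, then look them up so the id is known at plan time:

```hcl
# Assumes the launch template already exists (created by a separate
# Terraform state or an earlier run), so its attributes are known at plan time.
data "aws_launch_template" "node_group" {
  name = "staging-node_group1" # illustrative template name
}

# Then, inside eks_config_v2 -> node_groups:
# launch_template {
#   id      = data.aws_launch_template.node_group.id
#   version = data.aws_launch_template.node_group.latest_version
# }
```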

ahmedfourti commented 2 years ago

It is not the plan that is failing, @rawmind0, but the apply. The plan works fine, by the way:

```
  + launch_template {
      + id      = (known after apply)
      + name    = (known after apply)
      + version = (known after apply)
    }
```

rawmind0 commented 2 years ago

I meant that the apply of the plan is failing, not the plan calculation (and I explained why). Is a direct terraform apply (without a previous plan) also failing?

ahmedfourti commented 2 years ago

A direct apply is also failing. My steps:

1. terraform apply (everything looks fine)
2. yes to confirm
3. the launch template is created, but when trying to create the node_group, it fails with that error
4. terraform apply again works, since the launch_template was created before
rawmind0 commented 2 years ago

The steps are not clear to me, @ahmedfourti. The error you attached is related to a Terraform plan: Error: Provider produced inconsistent final plan. What are you trying to apply, a previously calculated plan or a direct terraform apply?

ahmedfourti commented 2 years ago

Sorry for the misunderstanding, I am applying directly. I also just noticed this: when adding a new group to an existing cluster, the version of the launch_template is set to 0 when it should be (known after apply), and the id doesn't appear at all.

```
  + launch_template {
      + name    = (known after apply)
      + version = 0
    }
```

a-blender commented 1 year ago

There have been several bugs with EKS launch templates and a lot of work has been done on the operator since the last comment on this thread, 1 year ago - Jan 3, 2022. The original bug where the eks operator is not completely fulfilling all the node group data appears to be fixed based on this comment so this issue can remain closed. @ahmedfourti I believe the error you mentioned here is already being tracked here #835 and is currently in progress. I am exploring a fix for the next Terraform provider release.