Closed. wojtasool closed this issue 2 years ago.
Is the same EKS config working fine from the Rancher UI? It seems to be an issue with the eks-operator. The issue should be opened at https://github.com/rancher/rancher
From the Rancher UI I can see that the config has been created, but the launch_template field was empty and the eks-operator was failing. I could pick the launch_template from the UI and save it, and it then proceeded.
I don't know whether this is a problem with the provider or with the eks-operator.
The dump seems to come from eks-operator code, so the issue seems related to it. The issue should be opened at https://github.com/rancher/rancher
For some reason launch_template_id is null:
"launchTemplate": {
"id": null,
"name": "staging-node_group1",
"type": "/v3/schemas/launchTemplate",
So I guess for some reason it was passed with a null value by the provider.
The provider should pass what is configured. launch_template values are passed through vars in your config. Are you passing a nil id for any reason? Are these vars getting proper values? That's why I asked whether the same config was working for you from the UI.
Yes, it is working fine from the UI. How can I see what's being passed to the eks-operator during the apply phase? In the plan I can only see "known after apply" in the launch_template_id field:
+ launch_template {
+ id = (known after apply)
+ name = "staging-node_group1"
+ version = (known after apply)
}
}
dynamic launch_template {
for_each = (try(node_groups.value.longhorn_disk_size, 0) > 0) ? [1] : []
content {
id = data.aws_launch_template.launch_template[node_groups.value.name].id #aws_launch_template.eks_launch_template[node_groups.value.name].id
version = aws_launch_template.eks_launch_template[node_groups.value.name].latest_version
name = aws_launch_template.eks_launch_template[node_groups.value.name].name
}
}
Also, the Terraform state shows that the launch_template id is provided:
"desired_size": 4,
"disk_size": 250,
"ec2_ssh_key": "EKS-nodes-key20210309080032355000000001",
"gpu": false,
"image_id": "",
"instance_type": "c5.2xlarge",
"labels": null,
"launch_template": [
{
"id": "lt-0053cefbfa9673a99",
"name": "staging-node_group1",
"version": 1
}
],
It seems that something may not be working properly in your tf module. If id = (known after apply), then the id value is not being set by the config; it is calculated during execution. The documentation example for [eks_config_v2](https://registry.terraform.io/providers/rancher/rancher2/latest/docs/resources/cluster#creating-eks-cluster-from-rancher-v2-using-eks_config_v2-and-launch-template-for-rancher-v256-or-above) was tested and should work fine. Have you tried to hardcode the launch_template id value just to test if it works?
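For reference, a minimal sketch of what such a hardcoded test could look like. The lt-... id, sizes, and var.eks_subnets below are placeholders for illustration, not values confirmed in this thread:
node_groups {
  name         = "staging-node_group1"
  desired_size = 4
  max_size     = 4
  subnets      = var.eks_subnets           # hypothetical variable, replace with your subnets
  launch_template {
    # Hardcoded values instead of references to an aws_launch_template resource,
    # so nothing is "(known after apply)" at plan time.
    id      = "lt-0123456789abcdef0"       # placeholder id of an existing launch template
    name    = "staging-node_group1"
    version = 1
  }
}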
Will try it tomorrow at work. But I assume id = (known after apply) is shown because this launch template is created within the same Terraform run and hence is not known yet; it should be provided later, once it is actually created, before rancher2_cluster starts.
There is also an explicit depends_on = [var.rancher_dns, var.rancher_route_record_depends, var.rancher2_token, aws_security_group.eks_sg_allowall, aws_launch_template.eks_launch_template]
So it should be created after the dependent resources are created.
The same thing happens when I pass the id of an already existing launch_template: the field is empty.
The plan shows the id, but the resulting cluster in the Rancher UI has this field empty:
+ launch_template {
+ id = "lt-0053cefbfa9673a99"
+ name = "staging-node_group1"
+ version = (known after apply)
}
Any ideas why it might happen?
I was able to reproduce the issue. The use of launch_template is not working properly. The operator is not filling in the node group image_id, instance_type, etc. Working on a fix.
So is it problem with provider or operator itself?
Is there any progress?
Up
Having the same issue... anything new, @rawmind0?
I am having this same issue. Any update?
The issue is related to the use of launch_template. The eks-operator is not filling in the node group data (image_id, instance_type, etc.). Using the UI, this is done by an extra call to the AWS API that gets and fills in the data before sending it to the eks-operator. This behaviour is complicated to replicate here, because the cluster resource doesn't have the AWS credentials (it only references the cloud credential id). We are working to find a good solution.
In the meanwhile, as a workaround, either not using launch_template or manually filling in the launch template data within the resource (node group image_id, instance_type, etc.) should work properly.
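As an illustration of the first option (dropping launch_template and setting the instance data directly on the node group), a minimal sketch; the values are placeholders, not taken from this thread:
node_groups {
  # No launch_template block: the node group carries its own settings,
  # so the operator has nothing left to resolve from a launch template.
  name          = "staging-node_group1"
  desired_size  = 4
  max_size      = 4
  instance_type = "c5.2xlarge"
  disk_size     = 250
  ec2_ssh_key   = "my-keypair"             # placeholder EC2 key pair name
  subnets       = var.eks_subnets          # hypothetical variable, replace with your subnets
}
The second option (keeping launch_template while also setting image_id, instance_type, etc. explicitly on the node group) follows the same pattern.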
@rawmind0 Any update on this issue?
Same here, any update please ?
Does anyone know if this will be resolved soon, please?
Submitted PR https://github.com/rancher/terraform-provider-rancher2/pull/819 to fix this issue.
We built the provider with the merged fix locally, but had no success building EKS clusters with Rancher. Currently I'm testing it with provider 1.17.1, which was mentioned in https://github.com/rancher/terraform-provider-rancher2/issues/820.
So far it looks like it is working.
@dcardellino, could you please be more explicit about the test you did with PR #819 applied? What Rancher version are you using?
Available to test in v2.6.3-rc7
Not ready to test yet.
The fix is already included in the latest tfp release, v1.22.0.
Many thanks, it works !
Many thanks, it works !
Confirm this. Works for me too! Thanks for solving this problem
@ahmedfourti and @dcardellino, thanks for the feedback!
Closing the issue. Please, reopen if required.
Solving this might create a new issue, I think.
I created a new cluster with 1 node_group and 1 launch template. Then I added another node_group and launch template and applied.
I got this error:
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.user_clusters.rancher2_cluster.this["cluster-tools"] to include new values learned so far during apply, provider "registry.terraform.io/rancher/rancher2" produced an invalid
│ new value for .eks_config_v2[0].node_groups[1].launch_template[0].id: was cty.StringVal(""), but now cty.StringVal("lt-0540bb32cdc432165").
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│
│ When expanding the plan for module.user_clusters.rancher2_cluster.this["cluster-tools"] to include new values learned so far during apply, provider "registry.terraform.io/rancher/rancher2" produced an invalid
│ new value for .eks_config_v2[0].node_groups[1].launch_template[0].version: was cty.NumberIntVal(0), but now cty.NumberIntVal(1).
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
After applying again it worked; this is because the launch template was created during the first apply. My Terraform module creates one launch template for each node_group; here is my code:
locals {
node_groups = flatten([
for cluster in var.clusters : [
for node_group in coalesce(cluster.node_groups, []) : {
cluster_name = cluster.cluster_name
name = node_group.name
instance_type = node_group.instance_type
desired_size = node_group.desired_size
max_size = node_group.max_size
disk_size = node_group.disk_size
subnets = node_group.subnets
ec2_ssh_key = node_group.ec2_ssh_key
labels = node_group.labels
tags = node_group.tags
user_data = node_group.user_data
}
]
])
}
resource "aws_launch_template" "this" {
for_each = {
for node_group in local.node_groups : node_group.name => node_group
}
name_prefix = "${each.value.name}-"
instance_type = each.value.instance_type == "" ? "t3.medium" : each.value.instance_type
key_name = each.value.ec2_ssh_key
monitoring {
enabled = true
}
user_data = base64encode(each.value.user_data)
vpc_security_group_ids = [aws_security_group.this[each.value.cluster_name].id]
block_device_mappings {
device_name = "/dev/xvda"
ebs {
encrypted = true
volume_type = "gp2"
volume_size = each.value.disk_size == "" ? 150 : each.value.disk_size
}
}
tag_specifications {
resource_type = "instance"
tags = merge(local.common_tags, { "Name" = each.value.name }, each.value.tags)
}
tags = merge(local.common_tags, { "Name" = each.value.name }, each.value.tags)
}
resource "rancher2_cluster" "this" {
for_each = {
for cluster in var.clusters : cluster.cluster_name => cluster
}
name = each.value.cluster_name
description = each.value.description
enable_network_policy = each.value.enable_network_policy
default_pod_security_policy_template_id = each.value.default_pod_security_policy_template_id
enable_cluster_monitoring = each.value.enable_cluster_monitoring
dynamic "cluster_monitoring_input" {
for_each = each.value.enable_cluster_monitoring ? [{}] : []
content {
answers = each.value.cluster_monitoring_input
}
}
eks_config_v2 {
name = each.value.cluster_name
cloud_credential_id = each.value.cloud_credential_id
imported = false
kms_key = each.value.kms_key == "" ? null : each.value.kms_key
region = each.value.region
kubernetes_version = each.value.eks_version
logging_types = each.value.logging_types
subnets = each.value.eks_subnets
security_groups = [aws_security_group.this[each.key].id]
service_role = aws_iam_role.this[each.key].name
tags = merge(local.common_tags, { "Name" = each.value.cluster_name })
private_access = each.value.private_access
public_access = each.value.public_access
dynamic "node_groups" {
for_each = each.value.node_groups != null ? each.value.node_groups : []
content {
name = node_groups.value.name
desired_size = node_groups.value.desired_size == "" ? 3 : node_groups.value.desired_size
max_size = node_groups.value.max_size == "" ? 3 : node_groups.value.max_size
subnets = node_groups.value.subnets
labels = node_groups.value.labels == "" ? null : node_groups.value.labels
launch_template {
id = aws_launch_template.this[node_groups.value.name].id
name = aws_launch_template.this[node_groups.value.name].name
version = aws_launch_template.this[node_groups.value.name].latest_version
}
resource_tags = merge(local.common_tags, { "Name" = each.value.cluster_name }, node_groups.value.tags)
}
}
}
}
I can't reopen this issue @rawmind0 :/
I see what you mean @ahmedfourti, but at some point it'd be expected. The plan is failing because the launch template doesn't exist yet, and it should. As you mentioned, the apply works fine once the launch template has been created and its values can be used by the rancher2_cluster resource.
To avoid this issue at plan time you may try to split your tf definition/state, or do a 2-step plan, with the AWS deployment on one side and the Rancher deployment on the other.
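One way to approximate that 2-step approach without splitting states is a targeted first apply. This is only a sketch using Terraform's standard -target flag; the resource address is taken from the module shown earlier in this thread and may differ in your setup:
# Step 1: create the launch templates (and their dependencies) first.
terraform apply -target='module.user_clusters.aws_launch_template.this'
# Step 2: apply the rest once the launch template ids and versions are known.
terraform apply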
It is not the plan that is failing, @rawmind0, but the apply. The plan works fine, by the way:
+ launch_template {
+ id = (known after apply)
+ name = (known after apply)
+ version = (known after apply)
}
I meant that it's failing at plan apply, not at plan calculation (and explained why). Is a direct terraform apply (without a previous plan) also failing?
Direct apply is also failing. My steps:
The steps are not clear to me, @ahmedfourti. The error you attached is related to a terraform plan: Error: Provider produced inconsistent final plan. What are you trying to apply, a previously calculated plan or a direct terraform apply?
Sorry for the misunderstanding, I am applying directly. Also, I just noticed this: when adding a new node group to an existing cluster, the version of launch_template is set to 0 when it should be (known after apply), and the id doesn't even appear:
+ launch_template {
+ name = (known after apply)
+ version = 0
}
There have been several bugs with EKS launch templates, and a lot of work has been done on the operator since the last comment on this thread, 1 year ago (Jan 3, 2022). The original bug, where the eks-operator is not completely filling in all the node group data, appears to be fixed based on this comment, so this issue can remain closed. @ahmedfourti, I believe the error you mentioned here is already being tracked in #835 and is currently in progress. I am exploring a fix for the next Terraform provider release.
During creation of a new node_group with a launch template, Rancher does not see the launch template in the UI and the eks-operator fails. When I edit the cluster and manually point the new node group to a specific launch_template, it works; otherwise it is empty. I define the environment using terraform provider 1.15.1.
The plan also shows that the launch_template will be created and used within the EKS node_group.
This is the code I use:
resource "rancher2_cluster" "EKS-custom" {
  depends_on  = [var.rancher_dns, var.rancher_route_record_depends, var.rancher2_token, aws_security_group.eks_sg_allowall, aws_launch_template.eks_launch_template]
  name        = var.cluster-name
  description = "EKS Kubernetes cluster"

  eks_config_v2 {
    cloud_credential_id = rancher2_cloud_credential.eks_credential.id
    kubernetes_version  = var.kubernetes_version
    region              = var.region
  }

  lifecycle {
    ignore_changes = [
      eks_config_v2[0].public_access_sources,
    ]
  }
}
E0624 08:47:16.789014 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 296 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x142c380, 0x1ffc440)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/runtime/runtime.go:74 +0xa6
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/runtime/runtime.go:48 +0x86
panic(0x142c380, 0x1ffc440)
    /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/rancher/eks-operator/controller.createNodeGroup(0xc000c76580, 0xc000712a6b, 0xc0003b44a0, 0xc0003b4520, 0xc000712a60, 0xc0003b44e0, 0xc000c08180, 0xc0003b4460, 0xc000712a50, 0xc000712ad0, ...)
    /go/src/github.com/rancher/eks-operator/controller/nodegroup.go:267 +0x758
github.com/rancher/eks-operator/controller.(*Handler).updateUpstreamClusterState(0xc000422340, 0xc0002ce000, 0xc000c76580, 0xc000b0aff0, 0x42, 0xc000521700, 0xc0000b2750, 0xc0000b2790, 0xc0000b27a8, 0x1, ...)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:1071 +0x28e8
github.com/rancher/eks-operator/controller.(*Handler).checkAndUpdate(0xc000422340, 0xc00037a2c0, 0xc0000b2750, 0xc0000b2790, 0xc0000b27a8, 0xc0000b27a8, 0xc000712ba0, 0xc)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:312 +0xcfb
github.com/rancher/eks-operator/controller.(*Handler).OnEksConfigChanged(0xc000422340, 0xc000bbe3c0, 0x1a, 0xc00037a2c0, 0x0, 0x44842c, 0xc000432058)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:93 +0x211
github.com/rancher/eks-operator/controller.(*Handler).recordError.func1(0xc000bbe3c0, 0x1a, 0xc00037a2c0, 0xc0003d4000, 0xc000339a4e, 0x0)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:105 +0x67
github.com/rancher/eks-operator/pkg/generated/controllers/eks.cattle.io/v1.FromEKSClusterConfigHandlerToHandler.func1(0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0x104e, 0x403bcb, 0xc0003b4840, 0x413c5d)
    /go/src/github.com/rancher/eks-operator/pkg/generated/controllers/eks.cattle.io/v1/eksclusterconfig.go:105 +0x6b
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0xc00003a780, 0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0x2, 0xc00036f468, 0x40409a, 0xc00036f440)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/sharedcontroller.go:29 +0x4e
github.com/rancher/lasso/pkg/controller.(*sharedHandler).OnChange(0xc000409680, 0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0xc000d7b201, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/sharedhandler.go:65 +0x115
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0004380b0, 0xc000bbe3c0, 0x1a, 0xc000d7b318, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:210 +0xd1
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0004380b0, 0x13da240, 0xc0003b4840, 0x0, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:192 +0xe7
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0004380b0, 0x203001)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:169 +0x54
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:158
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000d4c0f0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000d4c0f0, 0x17c2f00, 0xc0004a5bf0, 0x1, 0xc0004321e0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000d4c0f0, 0x3b9aca00, 0x0, 0xc000b0e801, 0xc0004321e0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc000d4c0f0, 0x3b9aca00, 0xc0004321e0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/wait/wait.go:90 +0x4d
created by github.com/rancher/lasso/pkg/controller.(*controller).run
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:129 +0x33b

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1300478]

goroutine 296 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.18.8/pkg/util/runtime/runtime.go:55 +0x109
panic(0x142c380, 0x1ffc440)
    /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/rancher/eks-operator/controller.createNodeGroup(0xc000c76580, 0xc000712a6b, 0xc0003b44a0, 0xc0003b4520, 0xc000712a60, 0xc0003b44e0, 0xc000c08180, 0xc0003b4460, 0xc000712a50, 0xc000712ad0, ...)
    /go/src/github.com/rancher/eks-operator/controller/nodegroup.go:267 +0x758
github.com/rancher/eks-operator/controller.(*Handler).updateUpstreamClusterState(0xc000422340, 0xc0002ce000, 0xc000c76580, 0xc000b0aff0, 0x42, 0xc000521700, 0xc0000b2750, 0xc0000b2790, 0xc0000b27a8, 0x1, ...)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:1071 +0x28e8
github.com/rancher/eks-operator/controller.(*Handler).checkAndUpdate(0xc000422340, 0xc00037a2c0, 0xc0000b2750, 0xc0000b2790, 0xc0000b27a8, 0xc0000b27a8, 0xc000712ba0, 0xc)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:312 +0xcfb
github.com/rancher/eks-operator/controller.(*Handler).OnEksConfigChanged(0xc000422340, 0xc000bbe3c0, 0x1a, 0xc00037a2c0, 0x0, 0x44842c, 0xc000432058)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:93 +0x211
github.com/rancher/eks-operator/controller.(*Handler).recordError.func1(0xc000bbe3c0, 0x1a, 0xc00037a2c0, 0xc0003d4000, 0xc000339a4e, 0x0)
    /go/src/github.com/rancher/eks-operator/controller/eks-cluster-config-handler.go:105 +0x67
github.com/rancher/eks-operator/pkg/generated/controllers/eks.cattle.io/v1.FromEKSClusterConfigHandlerToHandler.func1(0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0x104e, 0x403bcb, 0xc0003b4840, 0x413c5d)
    /go/src/github.com/rancher/eks-operator/pkg/generated/controllers/eks.cattle.io/v1/eksclusterconfig.go:105 +0x6b
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0xc00003a780, 0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0x2, 0xc00036f468, 0x40409a, 0xc00036f440)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/sharedcontroller.go:29 +0x4e
github.com/rancher/lasso/pkg/controller.(*sharedHandler).OnChange(0xc000409680, 0xc000bbe3c0, 0x1a, 0x17c9250, 0xc00037a2c0, 0xc000d7b201, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/sharedhandler.go:65 +0x115
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0004380b0, 0xc000bbe3c0, 0x1a, 0xc000d7b318, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:210 +0xd1
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0004380b0, 0x13da240, 0xc0003b4840, 0x0, 0x0)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:192 +0xe7
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0004380b0, 0x203001)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:169 +0x54
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
    /go/pkg/mod/github.com/rancher/lasso@v0.0.0-20200905045615-7fcb07d6a20b/pkg/controller/controller.go:158
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000d4c0f0)