Closed PrateekKhatri closed 5 months ago
Hey @PrateekKhatri 👋 Thank you for taking the time to raise this. So that we have all of the necessary information to answer the questions raised, can you update the issue description to include the information requested in the bug report template?
Hi,
Thanks for your reply.
I have added the Terraform provider and Terraform versions. Let me know if you need more information.
Hey @PrateekKhatri 👋 Thank you for the additional updates. As a note, it may be a bit difficult for us to provide a full investigation into what's going on here, as the Terraform configuration supplied does not include the configurations for the modules that you're calling, and without debug logs, we're not able to see exactly what was occurring at the time. With that said, I'll do my best to answer the questions you posed with the information we have available.
The first three questions with regards to the amount of time the apply took are quite difficult to answer without debug logs, as we're not able to see the timestamps of when each step of the apply occurred. That said, something that may impact the amount of time taken comes down to Terraform needing to wait for the resources to be fully created and report back their status so that the information around the resource may be saved to the state file. In the AWS console, on the other hand, these operations can happen in the background while you move on to other tasks. I'm not certain that this is what's happening in this case, but it very well may be. As far as reducing this time, that's again hard to say without further details around the configuration itself and the debug logs.
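If it helps with a future run, debug logs (including per-step timestamps) can be captured by setting the `TF_LOG` and `TF_LOG_PATH` environment variables before running the apply, for example:

```shell
# Capture timestamped debug output to a file for later analysis
TF_LOG=DEBUG TF_LOG_PATH=./terraform-debug.log terraform apply
```

The resulting log makes it much easier to see which resource operations consumed the 90 minutes.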
Further, since we were trying to create a nodegroup with the same name as an existing nodegroup (created manually), the terraform apply failed after 90 minutes with the error "resource already exists". Here we need to understand why terraform plan did not warn about this issue, or why this error did not come up at the start of the deployment.
This is a result of how Terraform behaves in general. Terraform does not automatically read all current resources within AWS to determine whether a given resource already exists before attempting to create it. Instead, it reads the configuration files, then reads the state file, in order to determine what should be created. If there are resources defined within the Terraform configuration that already exist in reality but are not yet in the state file, those resources should be imported into the state file prior to running a `terraform apply` in order to prevent errors such as this. As far as why it took 90 minutes before this was reported, that's another thing that's unfortunately hard to determine without the full configuration and debug logs.
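As an illustration, an existing managed node group can usually be brought into state with `terraform import` before the apply. The resource address and the `cluster_name:node_group_name` ID below are illustrative (the cluster name here is hypothetical); please check the provider documentation for the exact import ID format for your provider version:

```shell
# Illustrative: import the manually created node group into state before applying
terraform import 'aws_eks_node_group.eks_cluster[0]' demo-cluster:demo-system-ng
```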
Is there any recommended tool through which we can achieve this before terraform apply?
I'm not sure I fully understand this part of your questions. Do you mean to ask if there's a tool to determine whether resources exist in reality that are not currently in the state, but are defined in the configuration?
Also, we observed that if we try to update the nodegroup role (add/remove permission policies), Terraform tries to re-deploy the entire cluster nodegroup instead of just updating the IAM role. Is this the expected behavior from Terraform? Because we have the functionality to update IAM role policies from the AWS Console.
I believe you're referencing a change to the `node_role_arn`, correct? That argument is set to `ForceNew` in the provider. On a brief glance, this appears to be due to the underlying function in the AWS Go SDK (`eks.UpdateNodegroupConfig`) not accepting a `NodeRole` argument in its input. Because there's no function in the underlying SDK that allows updating the `node_role_arn` on an existing resource, Terraform must replace the resource.
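For context, in provider schema conventions an argument like this is typically marked along these lines (a simplified sketch of the pattern, not the provider's exact source):

```go
// Simplified sketch: a ForceNew schema attribute in a Terraform provider.
"node_role_arn": {
    Type:     schema.TypeString,
    Required: true,
    ForceNew: true, // any change forces a destroy/create of the resource
},
```

Because the attribute is `ForceNew`, any diff in its value produces a replacement plan rather than an in-place update.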
That said, I would expect that if you were attempting to simply add/remove policy permissions on the role, the ARN of the role would not change. Knowing how that role ARN is being passed to the `aws_eks_node_group` resource may help to provide a configuration change that would prevent this from occurring, if you happen to be able to provide that.
If you have any additional questions or need further clarification, please do let me know and I'll do my best to continue to help.
Hi,
Thanks for your reply.
I'm not sure I fully understand this part of your questions. Do you mean to ask if there's a tool to determine whether resources exist in reality that are not currently in the state, but are defined in the configuration?
Yes, we would like to know if there is a tool to determine whether resources exist in reality that are not currently in the state but are defined in the configuration.
That said, I would expect that if you were attempting to simply add/remove policy permissions to the role, that the ARN of the role would not change. Knowing how that role ARN is being passed to the aws_eks_node_group resource may help to provide a configuration change that would prevent this from occurring, if you happen to be able to provide that.
We are only updating the policy attached to the existing role on the EKS node. Role definition example:
```hcl
resource "aws_iam_role" "node_group" {
  name = "${var.env}_node_group_role"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "AmazonSSMPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM"
  role       = aws_iam_role.node_group.name
}

resource "aws_eks_node_group" "eks_cluster" {
  count           = length(var.launch_template_ids)
  cluster_name    = aws_eks_cluster.cluster.name
  node_group_name = "${var.env}-${var.eks_nodegroup[count.index]}-node-group"
  node_role_arn   = aws_iam_role.node_group.arn
  subnet_ids      = var.private_subnets

  scaling_config {
    desired_size = var.eks_desired_node_size[count.index]
    max_size     = var.eks_max_node_size[count.index]
    min_size     = var.eks_min_node_size[count.index]
  }

  # Custom launch template.
  launch_template {
    id      = var.launch_template_ids[count.index]
    version = var.launch_template_versions[count.index]
  }

  tags = {
    Name        = "${var.env}-${var.eks_nodegroup[count.index]}-node-group"
    Environment = "${var.env}"
    Department  = "${var.env}"
  }
}
```
Hey @PrateekKhatri 👋 I'm not personally aware of any tooling that attempts to read the Terraform configuration and state, and then attempts to determine if any of the defined resources exist in reality prior to running a `terraform apply`.
As far as the node groups being replaced when updating the policy, I didn't see anything in the provided configuration that would trigger this kind of redeployment. That said, if you look at the plan log, whatever argument is triggering the recreation will have a note next to it that says `# forces replacement`. Can you look for that note in the plan and let me know what argument(s) are triggering the replacement?
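For reference, the note looks something like this in plan output (a hypothetical example, with the ARNs elided):

```
  # aws_eks_node_group.eks_cluster[0] must be replaced
-/+ resource "aws_eks_node_group" "eks_cluster" {
      ~ node_role_arn = "arn:aws:iam::...:role/old" -> "arn:aws:iam::...:role/new" # forces replacement
```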
Edit: After posting this comment, I happened across terraformer, which may help with the task of creating Terraform configurations based on existing infrastructure. I have not personally used this, but felt it was worth bringing it to your attention so you could evaluate it.
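As a rough illustration of terraformer's CLI (I have not verified this against your environment, and the flags and supported resource names may vary by version):

```shell
# Illustrative: generate HCL and state files for existing EKS resources in a region
terraformer import aws --resources=eks --regions=us-east-1
```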
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.
If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Hi Team,
Terraform CLI and Terraform AWS Provider Version
Terraform AWS provider: 3.63
Terraform version: 0.15.3
Affected Resource(s)
We have set up our AWS infrastructure with Terraform. Please find the Terraform infrastructure module below for reference.
Terraform Configuration Files
First, we deployed the infrastructure with the parameters below.
In the meantime, we manually added one nodegroup (e.g. demo-system-ng).
After requirement changes, we updated the parameters as below:
Also, incidentally, the new nodegroup we were trying to add (demo-system-ng) has the same name as the manually deployed nodegroup.
Below are the queries we have and issues we faced:
After re-deploying the infrastructure, it took around 90 minutes. As you can see, we were only trying to update the nodegroup configuration, yet it still took more than 90 minutes.
We need to understand why it took so much time to deploy the infrastructure, whereas when we make the same change through the AWS Console it takes only around 20-30 minutes.
Also, is there any workaround through which we can reduce this time, since our production environment may have clusters with node counts close to 100?
Further, since we were trying to create a nodegroup with the same name as an existing nodegroup (created manually), the terraform apply failed after 90 minutes with the error "resource already exists".
Here we need to understand why terraform plan did not warn about this issue, or why this error did not come up at the start of the deployment.
Is there any recommended tool through which we can achieve this before terraform apply?
Also, we observed that if we try to update the nodegroup role (add/remove permission policies), Terraform tries to re-deploy the entire cluster nodegroup instead of just updating the IAM role.
Is this the expected behavior from Terraform? Because we have the functionality to update IAM role policies from the AWS Console.