hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

eks_node_group unable to update version of workers #12675

Closed · llamahunter closed this issue 4 years ago

llamahunter commented 4 years ago

Terraform Version

0.11.14

Affected Resource(s)

aws_eks_node_group

Terraform Configuration Files


resource "aws_eks_node_group" "worker" {
  cluster_name    = "${var.datacenter}"
  node_group_name = "${var.worker_group_name}"
  node_role_arn   = "${data.aws_iam_role.worker.arn}"
  subnet_ids      = [ "${var.subnet_ids}" ]

  scaling_config {
    desired_size = "${var.worker_desired_count}"
    max_size     = "${var.worker_max_count}"
    min_size     = "${var.worker_min_count}"
  }

  instance_types = [ "${var.worker_instance_type}" ]
  disk_size = "${var.worker_volume_size}"
  version = "${var.eks_version}"

  remote_access {
    ec2_ssh_key = "${var.ssh_key_name}"
  }
}

Debug Output

Error: Error applying plan:

1 error occurred:
    * module.eks-worker.aws_eks_node_group.worker: 1 error occurred:
2020-04-03T23:59:02.142-0700 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.1_x4: 2020/04/03 23:59:02 [ERR] plugin: plugin server: accept unix /var/folders/cc/7_jb5tld1kvd5jmv8r0dxp0c0000gn/T/plugin583841734: use of closed network connection
2020-04-03T23:59:02.142-0700 [DEBUG] plugin.terraform-provider-aws_v2.56.0_x4: 2020/04/03 23:59:02 [ERR] plugin: plugin server: accept unix /var/folders/cc/7_jb5tld1kvd5jmv8r0dxp0c0000gn/T/plugin491393577: use of closed network connection
    * aws_eks_node_group.worker: error updating EKS Node Group (tek2:general) version: InvalidParameterException: Requested Nodegroup release version 1.14.7-20190927 is invalid. Allowed release version is 1.15.10-20200228
{
  ClusterName: "tek2",
  Message_: "Requested Nodegroup release version 1.14.7-20190927 is invalid. Allowed release version is 1.15.10-20200228",
  NodegroupName: "general"
}

Expected Behavior

Terraform should have performed a rolling update of the worker nodes to the matching AMI for 1.15, respecting pod disruption budgets.

Actual Behavior

Terraform failed to update the nodes because the provider automatically recorded the old 1.14.7-20190927 AMI in the release_version attribute in the Terraform state when the cluster was deployed with Kubernetes 1.14.

Steps to Reproduce

  1. Create an EKS cluster at 1.14.
  2. Create matching 1.14 managed workers.
  3. Update the control plane version to 1.15.
  4. Update the worker version to 1.15 (a minimal sketch of this step follows the list).
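
A minimal sketch of step 4, assuming the configuration above: the node group's version comes from var.eks_version, so the upgrade is just a variable bump followed by terraform apply. The variable name is taken from the original configuration; the values are illustrative.

variable "eks_version" {
  # Was "1.14" when the cluster and workers were first deployed;
  # bumping it is what triggers the UpdateNodegroupVersion call.
  default = "1.15"
}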

Important Factoids

Cluster was previously deployed using managed workers at 1.14 without setting a 'version' or 'release_version' attribute.

References

bflad commented 4 years ago

Hi folks 👋 Is this issue still reproducible? I just tried replicating it today between 1.15 and 1.16, where Terraform submitted the following request, and oddly enough the upgrade went through just fine:

2020/05/19 15:40:44 [DEBUG] [aws-sdk-go] DEBUG: Request eks/UpdateNodegroupVersion Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST /clusters/tf-acc-test-3848944511257360101/node-groups/tf-acc-test-3848944511257360101/update-version HTTP/1.1
Host: eks.us-west-2.amazonaws.com
User-Agent: aws-sdk-go/1.31.0 (go1.14.2; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.12.7-sdk (+https://www.terraform.io)
Content-Length: 128
Authorization: AWS4-HMAC-SHA256 Credential=--OMITTED--/20200519/us-west-2/eks/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-date, Signature=3200ded1f843ffd69c4977c8a4e2687d805e3e6d2eba0896fa6688bf137fe838
Content-Type: application/json
X-Amz-Date: 20200519T194044Z
Accept-Encoding: gzip

{"clientRequestToken":"terraform-20200519194044260400000007","force":false,"releaseVersion":"1.15.11-20200507","version":"1.16"}
-----------------------------------------------------
2020/05/19 15:40:44 [DEBUG] [aws-sdk-go] DEBUG: Response eks/UpdateNodegroupVersion Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: close
Content-Length: 237
Content-Type: application/json
Date: Tue, 19 May 2020 19:40:44 GMT
X-Amz-Apigw-Id: My1peF08PHcFTpw=
X-Amzn-Requestid: e9d91eea-3b1a-41b5-9717-8293c63ad790
X-Amzn-Trace-Id: Root=1-5ec4363c-3ea74b1a22329f8ee8800522

-----------------------------------------------------
2020/05/19 15:40:44 [DEBUG] [aws-sdk-go] {"update":{"id":"79f19070-4c92-3f02-a838-692d739dbc28","status":"InProgress","type":"VersionUpdate","params":[{"type":"Version","value":"1.16"},{"type":"ReleaseVersion","value":"1.16.8-20200507"}],"createdAt":1589917244.665,"errors":[]}}

Note that the EKS API automatically fixed the ReleaseVersion to be compatible in its response and in the subsequent update of the EKS Node Group. I'm still going to submit a change to only include ReleaseVersion in the request when its configuration has changed, just to prevent any odd behavior in the future, but the upgrade may already be working.
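
For anyone who prefers to manage the AMI explicitly rather than let the API pick it, a hedged sketch of pinning both attributes; the release version string is just the example value from the log above and will differ by Kubernetes version, date, and region:

resource "aws_eks_node_group" "worker" {
  # ... other arguments as in the original configuration ...

  # version and release_version must be compatible with each other;
  # omit release_version entirely to let EKS choose the latest
  # matching AMI release for the requested Kubernetes version.
  version         = "1.16"
  release_version = "1.16.8-20200507" # example value from the log above
}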

bflad commented 4 years ago

As mentioned above, the EKS API may be allowing the previously incorrect behavior of the resource, but we have now also merged the fix to only submit the ReleaseVersion during the UpdateNodegroupVersion API call when its value has changed. This fix will be released in version 2.63.0 of the Terraform AWS Provider, likely tomorrow. 👍

ghost commented 4 years ago

This has been released in version 2.63.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
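
As a sketch, one way to pick up the fix is to constrain the provider version in configuration; the syntax below works for both Terraform 0.11 and 0.12, and the region is only an example:

provider "aws" {
  version = ">= 2.63.0" # includes the release_version fix
  region  = "us-west-2" # example region
}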

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

Nuru commented 4 years ago

~I am still having this problem in AWS provider 2.64.0~

I thought I had this issue in 2.64.0, but it turned out I had 2.60.0 cached, and that was where I experienced the issue. I was able to upgrade fine with 2.64.0.

rohitgabriel commented 4 years ago

AWS provider 2.64.0 also has the same issue.

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!