hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

aws_eks_addon_version: add platform_version to mitigate error during apply #24149

Closed awoimbee closed 1 month ago

awoimbee commented 2 years ago

Description

Please add a platform_version argument to aws_eks_addon_version.

The aws_eks_addon_version data source doesn't support a platform_version argument, so the version it returns can be rejected by the cluster, and I get this error on apply:

│ Error: error updating EKS Add-On (xxxxxx:aws-ebs-csi-driver): InvalidParameterException: Addon version specified is not supported for the cluster platform version
│ {
│   RespMetadata: {
│     StatusCode: 400,
│     RequestID: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
│   },
│   AddonName: "aws-ebs-csi-driver",
│   ClusterName: "xxxxxx",
│   Message_: "Addon version specified is not supported for the cluster platform version"
│ }
│
│   with module.eks.aws_eks_addon.addons["aws-ebs-csi-driver"],
│   on ../../modules/aws_eks/cluster.tf line 76, in resource "aws_eks_addon" "addons":
│   76: resource "aws_eks_addon" "addons" {

See:

New or Affected Resource(s)

References

awoimbee commented 2 years ago

This is an issue again when updating the ebs-csi addon from "v1.8.0-eksbuild.0" to "v1.9.0-eksbuild.1". When I check the required platform version using

aws eks describe-addon-versions --kubernetes-version=1.22 --addon-name=aws-ebs-csi-driver --query='addons[0].addonVersions[0].compatibilities'

I get clusterVersion: 1.22, platformVersion: eks.4+. My cluster is on 1.22 eks.2 (two platform versions behind!?)

spr-mweber3 commented 2 years ago

Haunting us as well.

Error: error updating EKS Add-On (eks1:aws-ebs-csi-driver): InvalidParameterException: Addon version specified is not supported for the cluster platform version

Apparently platform version eks.4 was released on 21st July.

We're getting the latest version of the addon through

data "aws_eks_addon_version" "latest" {
  addon_name         = "aws-ebs-csi-driver"
  kubernetes_version = aws_eks_cluster.eks1.version
  most_recent        = true
}

as described here.

Because the data source lacks a platform_version argument, the latest version of an EKS addon that it returns might be incompatible with the platform version of the EKS cluster.

spr-mweber3 commented 2 years ago

I was able to circumvent the issue for the time being by utilizing an External Data Source. But this is ugly as hell. The provider needs to support selecting the latest version of an EKS-addon based on EKS version AND platform version.

bryantbiggs commented 2 years ago

I was able to circumvent the issue for the time being by utilizing an External Data Source. But this is ugly as hell. The provider needs to support selecting the latest version of an EKS-addon based on EKS version AND platform version.

The provider largely reflects the underlying AWS API https://docs.aws.amazon.com/cli/latest/reference/eks/describe-addon-versions.html

spr-mweber3 commented 2 years ago

Right. The API call returns a list of compatible EKS platform versions through addons[0].addonVersions[0].compatibilities. This is how I was able to make it work with an External Data Source. An argument platform_version next to kubernetes_version on aws_eks_addon_version would be neat.
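The snippet itself was not posted in this thread. As a rough illustration only, a workaround of this kind might look like the sketch below. It assumes the hashicorp/external provider and a hypothetical helper script, scripts/compatible_addon_version.sh, which would call describe-addon-versions, keep only the versions whose compatibilities match the cluster's platform version, and print a JSON object such as {"version": "..."}. The script name, the query keys, and the resource name ebs_csi are assumptions made for the example, not anything the provider ships.

```hcl
# Sketch only: the helper script below is hypothetical and must be written separately.
data "external" "ebs_csi_version" {
  program = ["bash", "${path.module}/scripts/compatible_addon_version.sh"]

  # Query values must be strings; the program receives them as a JSON object on stdin.
  query = {
    addon_name         = "aws-ebs-csi-driver"
    kubernetes_version = aws_eks_cluster.eks1.version
    platform_version   = aws_eks_cluster.eks1.platform_version
  }
}

resource "aws_eks_addon" "ebs_csi" {
  cluster_name = aws_eks_cluster.eks1.name
  addon_name   = "aws-ebs-csi-driver"

  # The program must print a JSON object of string values; "version" is an assumed key.
  addon_version = data.external.ebs_csi_version.result.version
}
```

The drawback is that the compatibility check lives in a script outside Terraform, which is why a native platform_version argument would be preferable.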

mijndert commented 1 year ago

@spr-mweber3 can you share your external datasource snippet maybe?

bryantbiggs commented 1 year ago

Why is the platform version relevant? There is the addon version, which incorporates the platform version, but this is not a relevant detail to users. We use the data source in the EKS module; users can either rely on the default behavior, which uses the addon's default version for the given Kubernetes version, or opt in to the addon's latest version for the given Kubernetes version:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "my-cluster"
  cluster_version = "1.27"

  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      # Will use the addon's default version
    }
    vpc-cni = {
      most_recent = true
    }
  }

  ... truncated for brevity

bryantbiggs commented 1 month ago

@awoimbee is this issue still relevant or can we close it out now?

awoimbee commented 1 month ago

This issue has not bitten me again, but fundamentally the bug is still there. When AWS releases an EKS addon that requires a certain platform_version that has not rolled out to everyone, some terraform stacks will randomly break.

Note that your example above is false ("There is the addon version which incorporates the platform version" is false). I personally deploy my addons like so:


data "aws_eks_addon_version" "v" {
  for_each           = local.eks_addons
  addon_name         = each.key
  kubernetes_version = aws_eks_cluster.default.version
  most_recent        = true
}

resource "aws_eks_addon" "addons" {
  for_each = local.eks_addons

  cluster_name  = aws_eks_cluster.default.name
  addon_name    = each.key
  addon_version = data.aws_eks_addon_version.v[each.key].version
  tags          = local.default_tags
  depends_on    = [aws_eks_cluster.default]

  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "OVERWRITE"

  service_account_role_arn = lookup(each.value, "role_arn", null)
  configuration_values     = lookup(each.value, "configuration", null)
}

bryantbiggs commented 1 month ago

Note that your example above is false ("There is the addon version which incorporates the platform version" is false).

What is false? We use the data source for the addon version similar to what you provide, but we mostly only concern ourselves with the addon default version or latest version for a given Kubernetes version. https://github.com/terraform-aws-modules/terraform-aws-eks/blob/c60b70fbc80606eb4ed8cf47063ac6ed0d8dd435/main.tf#L499-L557

This is recommended to ensure that when you upgrade, your addons are upgraded in lockstep, which avoids situations where an addon is quite a bit behind because you forgot to update it.

This pattern is used in both the EKS module and the EKS blueprints addons, so it's used quite extensively without issue.

bryantbiggs commented 1 month ago

Going back to your original issue though, I don't see this as a bug in the provider but as an issue with the implementation. You cannot define an addon version that is not valid for an addon or region, whether that's due to delays in rollouts or something else. The API is the source of truth; that's why we provide the DescribeAddonVersions API.

And further to clarify - there isn't a platform version like there is for the EKS control plane. There is a build version for addons which can be thought of as the equivalent of the platform version. But EKS does not expose this in the API as an argument so it stands to reason that Terraform shouldn't either - the data source should be used to return the list of versions that are supported for the given addon, Kubernetes version, and region.
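For reference, a minimal sketch of the two lookups the data source supports, assuming a cluster resource named aws_eks_cluster.default and the aws-ebs-csi-driver addon (both names are chosen for illustration). Each lookup returns a single version string for the addon and Kubernetes version in the provider's configured region; neither takes the platform version into account.

```hcl
# Default addon version for the cluster's Kubernetes version
# (most_recent is omitted, so it defaults to false).
data "aws_eks_addon_version" "default" {
  addon_name         = "aws-ebs-csi-driver"
  kubernetes_version = aws_eks_cluster.default.version
}

# Latest addon version for the cluster's Kubernetes version.
data "aws_eks_addon_version" "latest" {
  addon_name         = "aws-ebs-csi-driver"
  kubernetes_version = aws_eks_cluster.default.version
  most_recent        = true
}
```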

awoimbee commented 1 month ago

Ok, seems like some internal organization stuff changed in how EKS addons are managed (from this comment). When I run aws eks describe-addon-versions --addon-name aws-ebs-csi-driver, platformVersions is now always ["*"].

Since platformVersions is now effectively unused, this issue should not happen anymore.

Note that this "deprecation" was not announced and that platformVersions is still returned by the API.

PS: Bryant, you really did not help, did you read the original issue ?

github-actions[bot] commented 1 month ago

[!WARNING] This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

bryantbiggs commented 1 month ago

PS: Bryant, you really did not help, did you read the original issue ?

Yes, I did. What was not helpful? Asking if this two-year-old issue was still relevant, or pointing out how this situation has been solved today by reference implementations using the data source?

github-actions[bot] commented 2 days ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.