hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Node Group cannot be created due to regex rule (cluster_name variable validation) #25038

Open kamilporwitaccenture opened 2 years ago

kamilporwitaccenture commented 2 years ago


Terraform CLI and Terraform AWS Provider Version

Terraform version: 1.2.1
AWS Provider version: ~> 4.15.1

Affected Resource(s)

Terraform Configuration Files

The cluster (IoCluster) and all resources required for node group creation (VPC, subnets, role, security group) already exist, and all needed ARNs/IDs are provided either through data sources or as fixed values.

node_group.tf

resource "aws_eks_node_group" "eks_managed_node_group_worker" {
  cluster_name    = data.aws_eks_cluster.io-ocean-cluster.id
  node_group_name = "testnodegroup"
  node_role_arn   = var.worker_node_role_arn
  subnet_ids      = data.aws_subnets.default_subnets.ids

  scaling_config {
    min_size     = var.eks_worker_node_group_min_size
    max_size     = var.eks_worker_node_group_max_size
    desired_size = var.eks_worker_node_group_desired_size
  }

  remote_access {
    source_security_group_ids = [var.worker_node_security_group_id]
  }

  ami_type       = var.eks_worker_node_group_ami_type
  disk_size      = var.eks_worker_node_group_disk_size
  instance_types = var.eks_worker_node_group_instances
  capacity_type  = "ON_DEMAND"

  labels = var.eks_worker_node_group_labels
  tags   = var.eks_worker_node_group_tags
  taint {
    key    = lookup(var.eks_worker_node_group_taints[0], "key", "nodetaint/notready")
    value  = lookup(var.eks_worker_node_group_taints[0], "value", "")
    effect = lookup(var.eks_worker_node_group_taints[0], "effect", "NO_SCHEDULE")
  }
}

data.tf

data "aws_vpcs" "main_vpc" {
  tags = {
    Name = var.main_vpc_name
  }
}

data "aws_subnets" "default_subnets" {
  filter {
    name   = "tag:Name"
    values = var.subnet_names
  }
}

data "aws_subnets" "eks_pod_subnets" {
  filter {
    name   = "tag:Name"
    values = var.eks_pods_subnet_names
  }
}

data "aws_eks_cluster" "io-ocean-cluster" {
  name = var.eks_cluster_name
}

variables.tf

variable "eks_cluster_name" {
  description = "Name of the EKS cluster"
  default = "IoCluster"
}
variable "subnet_names" {
  type        = list(string)
  description = "List of subnet names"
  default = ["SUB-01-AZ01", "SUB-02-AZ02"]
}
variable "main_vpc_name" {
  description = "Main VPC name where the EKS cluster will be build"
  default = "VPC-01"
}

#----------------------------------------
#       WORKER GROUP CONFIGURATION
#----------------------------------------
variable "eks_worker_node_group_instances" {
  type = list(string)
  default = ["t3.medium"]
}
variable "eks_worker_node_group_disk_size" {
  description = "Disk size of the worker nodes"
  default = 60
}
variable "eks_worker_node_group_ami_type" {
  description = "AMI type of worker nodes"
  default = "AL2_x86_64"
}
variable "eks_worker_node_group_min_size" {
  description = "Minimal number of worker nodes"
  default = 0
}
variable "eks_worker_node_group_max_size" {
  description = "Maximal number of worker nodes"
  default = 10
}
variable "eks_worker_node_group_desired_size" {
  description = "Desired number of worker nodes"
  default = 1
}
variable "eks_worker_node_group_labels" {
  type        = map(string)
  description = "A map of labels to add to worker nodes"
  default = { "APP" = "App", "k8s.amazonaws.com/eniConfig" = "" }
}
variable "eks_worker_node_group_taints" {
  type        = list(map(string))
  description = "A list of maps which contains taints for worker nodes"
  default = [ { "key" = "nodetaint/notready", "value" = "", "effect" = "NO_SCHEDULE" } ]
}
variable "eks_worker_node_group_tags" {
  type        = map(string)
  description = "A map of tags to add to all resources"
  default = { "App" = "App", "NodeGroup" = "Worker", "Terraform" = "True" }
}
variable "worker_node_security_group_id" {
  default = "sg-**********************"  #<-Existing resource
}
variable "worker_node_role_arn" {
  default = "arn:aws:iam::***********:role/EKS-CLUSTER-WORKER-NODES-ROLE"  #<-Existing resource
}

Debug Output

https://gist.github.com/kamilporwitaccenture/ebf73fc90a31b413ac8067657e63cadc

Expected Behavior

Node group should be created and attached to the cluster provided in the data resources.

Actual Behavior

Error appears:

Error: error creating EKS Node Group (IoCluster:testnodegroup): InvalidParameterException: Invalid value:  : field must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')
│ {
│   RespMetadata: {
│     StatusCode: 400,
│     RequestID: "5a166bc1-576e-4729-a9fa-59c56659691d"
│   },
│   ClusterName: "IoCluster",
│   Message_: "Invalid value:  : field must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')",
│   NodegroupName: "testnodegroup"
│ }
│
│   with aws_eks_node_group.eks_managed_node_group_worker,
│   on node_groups.tf line 1, in resource "aws_eks_node_group" "eks_managed_node_group_worker":
│    1: resource "aws_eks_node_group" "eks_managed_node_group_worker" {
│
╵

which (we think) suggests that cluster_name (IoCluster) is invalid because it does not match the given regex. That is not true: IoCluster does match it. If the error refers to some other argument, it would be helpful for the message to name the exact argument that is missing or invalid.

Steps to Reproduce

  1. EKS cluster creation.
  2. IAM role and SG for node groups creation.
  3. Setting up variables to point out the existing resources.
  4. terraform apply --auto-approve.

justinretzolk commented 2 years ago

Hey @kamilporwitaccenture 👋 It looks like the error you're seeing is coming from the AWS API response, rather than from the regex validation that Terraform does. I did notice something a bit odd in the error:

"Invalid value:  : field must consist of alphanumeric characters

Specifically, I would expect there to be a value after value: and before : field. So that we can get a better look, is it possible to provide debug logs as well (redacted as needed)?

I'd be remiss if I didn't mention that the id attribute of data.aws_eks_cluster will be the same as the name, so you could switch to using data.aws_eks_cluster.io-ocean-cluster.name, or even passing var.eks_cluster_name directly (since that's what you're interpolating for the data source). I presume you may have a reason for wanting to use the id, but figured I'd mention it, in case it helps at all!
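
The empty value between value: and : field matters because the pattern quoted in the error requires at least one alphanumeric character. A minimal sketch of that check, assuming the API applies the pattern to the whole value (the ^ and $ anchors, the local name, and the output names are added here for illustration and do not come from the provider):

```hcl
locals {
  # Pattern copied from the error message; the ^ and $ anchors are an assumption.
  eks_value_pattern = "^([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]$"
}

# can() returns false when regex() fails to find a match.
output "cluster_name_matches" {
  value = can(regex(local.eks_value_pattern, "IoCluster")) # true
}

output "empty_string_matches" {
  value = can(regex(local.eks_value_pattern, "")) # false
}
```

Under that assumption, the cluster name passes, and only an empty string somewhere in the request would fail the check.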

kamilporwitaccenture commented 2 years ago

Hi @justinretzolk, yes, I forgot to add the debug output (I will edit the issue and add it there as well): https://gist.github.com/kamilporwitaccenture/ebf73fc90a31b413ac8067657e63cadc

At first we tried var.eks_cluster_name, but that did not work and returned the same error as above. That's why we switched to the data source (the name should not be malformed there).

kamilporwitaccenture commented 2 years ago

Hi @justinretzolk Any update on this issue?

One more thing: in the debug output I also found this response from the API:

2022-05-25T12:01:23.041Z [DEBUG] provider.terraform-provider-aws_v4.15.1_x5: [aws-sdk-go] {
  "clusterName" : "IoCluster",
  "nodegroupName" : "testnodegroup",
  "fargateProfileName" : null,
  "addonName" : null,
  "message" : "Invalid value:  : field must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')"
}: timestamp=2022-05-25T12:01:23.041Z

This may indicate that addonName is empty. Unfortunately, I cannot find anything about that variable in the documentation (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group). If it really is the culprit here, could you add it to the documentation?

justinretzolk commented 2 years ago

Hey @kamilporwitaccenture 👋 I took another look at this now that we have the debug logs and I'm beginning to wonder if the issue is what you have defined for the aws_eks_node_group.eks_managed_node_group_worker.taint.value argument. Based on the Node taints on managed node groups AWS document:

The value is optional and must begin with a letter or number. It can contain letters, numbers, hyphens (-), periods (.), and underscores (_). It can be up to 63 characters long.

Do you experience the same behavior if you pass null here, rather than "" (an empty string)? Apologies; I'd generally just try to run this test myself, but wanted to get back to you quicker than it would have taken me to spin all of the surrounding resources up.
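
The null-instead-of-empty-string test could be set up roughly as follows. This is only a sketch: the local names are hypothetical, and the thread does not confirm whether the provider treats null differently from "" for this argument.

```hcl
locals {
  # First taint definition, as used in node_group.tf.
  first_taint = var.eks_worker_node_group_taints[0]

  # Raw value, falling back to "" when the "value" key is missing.
  first_taint_raw_value = lookup(local.first_taint, "value", "")

  # Hypothetical helper: null when the value is missing or empty, so the
  # argument is left unset instead of being passed as "".
  first_taint_value = local.first_taint_raw_value == "" ? null : local.first_taint_raw_value
}
```

Inside the taint block, the argument would then read value = local.first_taint_value.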

mattburgess commented 1 year ago

I had a cluster up and running this evening so gave @justinretzolk's idea a shot. It doesn't matter whether "" is passed in as the taint value or not; Terraform strips that out of the map (see https://gist.github.com/kamilporwitaccenture/ebf73fc90a31b413ac8067657e63cadc#file-out-L883-L885). In addition, if I do give the taint value an invalid string, I get the following back, which is close, but doesn't exactly match the original report:

Message_: "invalid taint spec:   , a qualified name must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')",

I was, however, able to reproduce the error by adding the labels that show up in that debug output. Specifically, it's the last label that causes the issue:

  labels = {
    "APP"    = "SNAKEMAKE"
    "GITLAB" = "K8S-WORKER"
    "KIAM"   = "AGENT"
    "k8s.amazonaws.com/eniConfig" = ""
  }

Error: error updating EKS Node Group (example:example) config: InvalidParameterException: Invalid value:  : field must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')
│ {
│   RespMetadata: {
│     StatusCode: 400,
│     RequestID: "f7776fc9-d2f1-4435-8036-bb178a279909"
│   },
│   ClusterName: "example",
│   Message_: "Invalid value:  : field must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')",
│   NodegroupName: "example"
│ }

@kamilporwitaccenture - you might need to go back to AWS support to see how you're expected to set up custom networking for EKS using the API. It would appear that it doesn't let you set Annotations at all, and it doesn't let you set a label with the name of the default label that the CNI is configured to use. The docs state that you can override that default label name by getting the node to set an environment variable, so that might be a way out.
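
If the empty label value is the trigger, one way to avoid the API error is to drop empty-valued labels before they reach the node group. A minimal sketch, assuming the eks_worker_node_group_labels variable from the original report; the local name is hypothetical:

```hcl
locals {
  # Keep only labels whose value is a non-empty string.
  non_empty_node_labels = {
    for key, value in var.eks_worker_node_group_labels : key => value
    if value != ""
  }
}
```

The resource would then use labels = local.non_empty_node_labels. Note that this also removes the k8s.amazonaws.com/eniConfig label, so it only sidesteps the error and does not by itself provide custom networking.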

Spritekin commented 1 year ago

Not sure if this is helpful, but I had my aws_eks_node_groups working in my environments and suddenly they stopped working and started raising the same error as the OP. Because the error message prints clustername:groupname, I spent a long time thinking it was a problem with the node group provider and how it was passing an id with a colon that failed the regex, until I realised there was a bug in my code and one of the labels was being set to an empty value, like this:

labels = {
  userid      = "1"
  serverid    = "1"
  size        = "" # <<<<<<<<<< HERE
  environment = "dev"
}

This is not unexpected, as I have seen this problem before with tags in other resources. Apparently AWS does not accept empty tag or label values, so I wouldn't be surprised if it doesn't accept your taint.value="" either.

So I fixed the bug and set my label to the proper value, size = "10", and everything worked.

So my suggestion is to change your code to: value = lookup(var.eks_worker_node_group_taints[0], "value", "undefined")
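
A bug like the empty size label could also be caught at plan time with a variable validation block. The following is a sketch only: it reuses the variable name from the original report, the default is shortened so that it passes the check, and the ^ and $ anchors on the pattern from the error message are an assumption.

```hcl
variable "eks_worker_node_group_labels" {
  type        = map(string)
  description = "A map of labels to add to worker nodes"
  default     = { "APP" = "App" }

  validation {
    # Every label value must match the pattern quoted in the API error;
    # an empty string does not match because one character is required.
    condition = alltrue([
      for value in values(var.eks_worker_node_group_labels) :
      can(regex("^([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]$", value))
    ])
    error_message = "Label values must be non-empty, start and end with an alphanumeric character, and contain only alphanumerics, '-', '_' or '.'."
  }
}
```

With this in place, terraform plan would fail with the custom message before any request is sent to the EKS API.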

superbrothers commented 1 year ago

Due to the error message which prints clustername:groupname I spent a long time thinking it was a problem with the node group provider and how it was passing an id with a colon which failed the regex, until I realised there was a bug in my code and one of the labels was being set as empty like:

I have the same problem and cannot set the label value to empty, which should not be a problem since Kubernetes allows empty label values. I am not sure whether this is caused by EKS node groups themselves or by this Terraform provider.