jx3-gitops-repositories / jx3-terraform-eks

Jenkins X 3.x Infrastructure Git Template for Terraform and EKS for managing cloud resources
Apache License 2.0

Errors when creating a new EKS build #41

Closed tgelpi-bot closed 2 months ago

tgelpi-bot commented 2 months ago

I'm trying to build a new EKS environment using the doc https://jenkins-x.io/v3/admin/platforms/eks/ and get different errors when trying to build the infra.

terraform init

There are conflicting constraints for provider hashicorp/aws:

│ Error: Failed to query available provider packages
│
│ Could not retrieve the list of available versions for provider hashicorp/aws: no available releases match the given
│ constraints >= 2.23.0, >= 2.70.0, >= 3.56.0, > 4.0.0, >= 4.0.0, >= 4.33.0, < 5.0.0, >= 5.30.0, >= 5.58.0

To correct the issue I modified required_providers to allow versions up to 6.0:

required_providers {
  aws = "> 4.0, < 6.0"
}
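
In context, using the map syntax, the constraint ends up roughly like this (just a sketch; any other providers in the block are unchanged):

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "> 4.0, < 6.0"   # relaxed upper bound so newer 5.x releases can be selected
    }
  }
}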

terraform plan

│ Error: Missing required argument
│
│   on main.tf line 85, in module "eks-jx":
│   85: module "eks-jx" {
│
│ The argument "cluster_version" is required, but no definition was found.

To correct the issue, I modified main.tf to include cluster_version.
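
Roughly like this (a sketch; the other arguments of the module block are left as they were):

module "eks-jx" {
  # ...existing arguments unchanged...
  cluster_version = var.cluster_version   # or a literal such as "1.30"
}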

terraform plan

│ Error: Reference to undeclared resource
│
│   on main.tf line 25, in module "vpc":
│   25:   azs = data.aws_availability_zones.available.names
│
│ A data resource "aws_availability_zones" "available" has not been declared in the root module.

At this stage I stopped making changes.
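
For reference, the data source that error refers to would be declared in the root module roughly like this (I did not apply this, just noting it):

data "aws_availability_zones" "available" {
  state = "available"
}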

I did manage to build an EKS/Vault environment from a previous commit of https://github.com/jx3-gitops-repositories/jx3-terraform-eks (fca5806) using https://github.com/jx3rocks/jx3-terraform-eks-fca5806.

nohierhassan commented 2 months ago

I am facing the same issues

msvticket commented 2 months ago

Do try now. I have made changes in this repo as well and removed kuberhealthy from https://github.com/jx3-gitops-repositories/jx3-eks-vault and https://github.com/jx3-gitops-repositories/jx3-eks-asm.

nohierhassan commented 2 months ago

The above errors are solved now, thanks @msvticket

However, I faced the below error when applying the resources for the first time.

│ Error: reading EKS Cluster (jx): couldn't find resource
│ 
│   with module.eks-jx.module.cluster.data.aws_eks_cluster.cluster,
│   on .terraform/modules/eks-jx/modules/cluster/main.tf line 4, in data "aws_eks_cluster" "cluster":
│    4: data "aws_eks_cluster" "cluster" {
│ 

Is this related to the statement quoted below? @msvticket please let me know if I need to open a new issue for this.

From version 3.0.0 this module creates neither the EKS cluster nor the VPC.

msvticket commented 2 months ago
│ Error: reading EKS Cluster (jx): couldn't find resource
│ 
│   with module.eks-jx.module.cluster.data.aws_eks_cluster.cluster,
│   on .terraform/modules/eks-jx/modules/cluster/main.tf line 4, in data "aws_eks_cluster" "cluster":
│    4: data "aws_eks_cluster" "cluster" {
│ 

Weird, it works for me. Maybe it's a race condition. Does it work if you run terraform apply again?

nohierhassan commented 2 months ago

No, it does not. Would it make a difference if I provide the 'cluster_name' as a variable?

msvticket commented 2 months ago

No, it does not. Would it make a difference if I provide the 'cluster_name' as a variable?

No, I didn't do that when I tried. I was actually about to ask if you had set the variable...

What version of terraform are you using?

nohierhassan commented 2 months ago

I'm using Terraform 1.4.6 and below is the values.auto.tfvars

#cluster_name = "jx"
cluster_version = "1.30"
region = "us-east-1"
jx_git_url = "https://github.com/JX-Investigation/jx-3-cluster"
jx_bot_username = "jx-3-test-bot"
use_asm = true

When I removed cluster_name from the values file it worked, but then an error occurred while running the apply command. Now when I run the destroy or apply commands again, I face the 'EKS cluster not found' error.

nohierhassan commented 2 months ago

I had to delete the state file (or delete the cluster from the state) so I could apply again.
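
For reference, deleting just the cluster from the state would be something like the command below (the resource address is only an example; terraform state list shows the real one):

terraform state rm 'module.eks.aws_eks_cluster.this[0]'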

I'd say the line below is the problem, in modules/eks-jx/modules/cluster/main.tf line 4, where it always tries to fetch the created cluster rather than checking whether it has been created or not.

data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}
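
Something along these lines is what I mean, just as a sketch (the create_eks_cluster flag is hypothetical and does not exist in the module today):

data "aws_eks_cluster" "cluster" {
  count = var.create_eks_cluster ? 1 : 0   # hypothetical flag to skip the lookup until the cluster exists
  name  = var.cluster_name
}
# downstream references would then need to use data.aws_eks_cluster.cluster[0]
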
tgelpi-bot commented 2 months ago

This update appears to work. I am having performance issues with the current config, which has only defaults (no TLS, no DNS). It is very sluggish and problematic. I just wrote a complete description and it was lost during editing when I inadvertently clicked on a link and couldn't get back to the edit. I'll try to recap at another time.

msvticket commented 2 months ago

I had to delete the state file (or delete the cluster from the state) so I can apply the file again.

Did you use an old state file when you tried first? That might be the problem.

I 'd say the below line is the problem in modules/eks-jx/modules/cluster/main.tf line 4 where it always tries to fetch the created cluster, rather than check if it is created or not.

data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

Well, no, the cluster should be created at this point (since this module gets the value from the eks module, which shouldn't return it before the cluster is created) and for me it always is. But I see that you use asm, which I didn't try now since Ted reported that there were problems with vault. I'll try with asm and see if I can reproduce the problem.

msvticket commented 2 months ago

For me it works fine with asm as well. I use terraform version 1.5.7.

nohierhassan commented 2 months ago

Thanks @msvticket :pray:

tgelpi-bot commented 2 months ago

I'm getting the previous issue when I try to set a cluster name in my values.auto.tfvars file.

Error: reading EKS Cluster (jx): couldn't find resource

  with module.eks-jx.module.cluster.data.aws_eks_cluster.cluster,
  on .terraform/modules/eks-jx/modules/cluster/main.tf line 4, in data "aws_eks_cluster" "cluster":
   4: data "aws_eks_cluster" "cluster" {

When I remove it, it works. I'm using Terraform v1.6.4. At the time of this message it is creating the infrastructure. It is a minimal build but using ASM. This is my values.auto.tfvars:

use_asm = true
jx_git_url = "https://github.com/jx3rocks/jx3-eks-asm.dfl.git"
cluster_version = "1.29"
force_destroy = true
region = "us-west-2"

When I tried setting the aws profile variable in my tfvars file it claimed an unknown variable. I noticed it is no longer in variables.tf.

tgelpi-bot commented 2 months ago

@msvticket I am building a new ASM/DNS/TLS environment with the latest repos. It's a brand new environment. I'm having some issues, primarily that I can't set the cluster_name variable. I am also uncertain whether I'm configuring the environment correctly. Here is what was done. values.auto.tfvars:

apex_domain = "my.com" cluster_name = "jx3a30" cluster_version = "1.30" create_and_configure_subdomain = true create_asm_role = true enable_external_dns = true enable_tls = true force_destroy = true jx_git_url = "https://github.com/jx3rocks/jx3-eks-asm.a30.git" manage_apex_domain = true manage_subdomain = true production_letsencrypt = true profile = "awsprofile" region = "us-west-2" subdomain = "a30" tls_email = "mymail@gmail.com" use_asm = true

Not sure if it is required, but to reduce the warnings I added the following variables to variables.tf (a sketch of a couple of the declarations follows the list):

apex_domain
create_and_configure_subdomain
create_asm_role
enable_external_dns
enable_tls
manage_apex_domain
manage_subdomain
production_letsencrypt
profile
subdomain
tls_email
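
A couple of them, as I declared them (a sketch; the types and defaults are my own guesses, not taken from the module):

variable "profile" {
  description = "AWS profile to use for the provider"
  type        = string
  default     = ""
}

variable "subdomain" {
  description = "Subdomain to create under the apex domain"
  type        = string
  default     = ""
}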

I also noticed under Migrating to current version of module from a version prior to 3.0.0 that main.tf needs the following tweaks.

prefix_separator = "" iam_role_name = local.cluster_name cluster_security_group_name = local.cluster_name cluster_security_group_description = "EKS cluster security group."

It isn't a migration, but these weren't in the current main.tf, so I added them in the eks module following enable_irsa:

enable_irsa                         = true
prefix_separator                    = ""
iam_role_name                       = local.cluster_name
cluster_security_group_name         = local.cluster_name
cluster_security_group_description  = "EKS cluster security group."

A terraform plan produced an error and a warning when the cluster_name variable was set.

Warning: Argument is deprecated

with module.eks.aws_iam_role.this[0],
  on .terraform/modules/eks/main.tf line 394, in resource "aws_iam_role" "this":
 394: resource "aws_iam_role" "this" {

Use the aws_iam_role_policy resource instead. If Terraform should exclusively manage all inline policy associations (the current behavior of this argument), use the aws_iam_role_policies_exclusive resource as well.

Error: reading EKS Cluster (jx3a30): couldn't find resource

with module.eks-jx.module.cluster.data.aws_eks_cluster.cluster,
  on .terraform/modules/eks-jx/modules/cluster/main.tf line 4, in data "aws_eks_cluster" "cluster":
   4: data "aws_eks_cluster" "cluster" {

Removing the cluster_name variable setting cleared up the warning and error and the infra was built.

I tried building a new environment without adding anything to main.tf or variables.tf, only setting variables in values.auto.tfvars that were available in variables.tf. It eventually was created but had severe performance issues; the dashboard took over 30 seconds to display. Does a minimum build require additional changes to main.tf and variables.tf?

I haven't tested my new cluster yet because at the time of this writing it was still building. I just wanted to confirm these findings before I lose them. Will keep you posted.

tgelpi-bot commented 2 months ago

It finished building but I still receive the deprecated argument warning.

╷
│ Warning: Argument is deprecated
│
│   with module.eks.aws_iam_role.this[0],
│   on .terraform/modules/eks/main.tf line 394, in resource "aws_iam_role" "this":
│  394: resource "aws_iam_role" "this" {
│
│ Use the aws_iam_role_policy resource instead. If Terraform should exclusively manage all inline policy
│ associations (the current behavior of this argument), use the aws_iam_role_policies_exclusive resource as well.
╵

tgelpi-bot commented 2 months ago

There is something influencing the performance when using this new ASM configuration. Accessing the dashboard is painfully slow. The first clicks take well over 30 seconds to render a display, and on subsequent clicks I get a 'server not found'. When I try to deploy a new node application (with QuickStart) it does an import to the repo, but I don't see a verify pipeline step.

I was reviewing the docs. Is there only one cluster template to use? In the past I used jx3-gitops-repositories/jx3-eks-asm when using ASM and jx3-gitops-repositories/jx3-eks-vault when using Vault. Has this been changed to use only Vault?

tgelpi-bot commented 2 months ago

It looks like this performance issue may be caused by load balancer health checks. I will try using an ALL rule in the node group security group as a workaround for now.
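
The workaround I have in mind is roughly the rule below (a sketch; the security group reference and CIDR are placeholders for my setup):

resource "aws_security_group_rule" "node_group_allow_all" {
  type              = "ingress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"                              # all protocols and ports
  cidr_blocks       = ["10.0.0.0/16"]                   # VPC CIDR, adjust as needed
  security_group_id = module.eks.node_security_group_id # placeholder for the node group security group
}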

[Screenshot attached: Screen Shot 2024-09-24 at 10 18 40 AM]