marcellodesales commented 4 years ago

Hi there,

Thank you for the awesome tutorial at https://learnk8s.io/terraform-eks#you-can-provision-an-eks-cluster-with-terraform-too... Very useful as I was looking for an example to get different clusters per environment... I just need 2... really appreciated your work!!!

Just got an error creating the cluster using the step 6. I had updated a couple of properties shown below, but here's the error...

Error

I'm getting the following error:

module.prd_cluster.module.eks.aws_iam_role_policy_attachment.workers_AmazonEKS_CNI_Policy[0]: Refreshing state... [id=eks-prd-super-cash-example-com20201018045153077400000007-2020101804515980130000000a]
module.prd_cluster.module.eks.aws_iam_role_policy_attachment.workers_AmazonEC2ContainerRegistryReadOnly[0]: Refreshing state... [id=eks-prd-super-cash-example-com20201018045153077400000007-20201018045159789200000008]
module.prd_cluster.module.eks.aws_iam_role_policy_attachment.workers_AmazonEKSWorkerNodePolicy[0]: Refreshing state... [id=eks-prd-super-cash-example-com20201018045153077400000007-2020101804515988710000000b]
module.prd_cluster.module.eks.aws_iam_role_policy_attachment.workers_additional_policies[0]: Refreshing state... [id=eks-prd-super-cash-example-com20201018045153077400000007-20201018045159794400000009]

Error: Kubernetes cluster unreachable: Get https://44C5045D2C00520DBF55914A260A17C8.
   gr7.sa-east-1.eks.amazonaws.com/version?timeout=32s: dial tcp: lookup 
   44C5045D2C00520DBF55914A260A17C8.gr7.sa-east-1.eks.amazonaws.com on 192.168.1.1:53: 
   read udp 192.168.1.35:54700->192.168.1.1:53: i/o timeout

At this point, I know I can ping amazonaws.com... But maybe we are missing a security group? The cluster got created...

Environment

$ terraform version
Terraform v0.13.4
+ provider registry.terraform.io/hashicorp/aws v3.11.0
+ provider registry.terraform.io/hashicorp/helm v1.3.1
+ provider registry.terraform.io/hashicorp/kubernetes v1.13.2
+ provider registry.terraform.io/hashicorp/local v2.0.0
+ provider registry.terraform.io/hashicorp/null v3.0.0
+ provider registry.terraform.io/hashicorp/random v3.0.0
+ provider registry.terraform.io/hashicorp/template v2.2.0

Setup

The UI lists the clusters

I can also list them from the CMD

$ aws eks list-clusters
{
    "clusters": [
        "eks-prd-super-cash-example-com",
        "eks-ppd-super-cash-example-com"
    ]
}

Missing sep to install the authenticator

ATTENTION: The article doesn't mention the creation of the aws-iam-authenticator

All the kubeconfig files were created with the authenticator dependency

$ kubectl get pods --all-namespaces
Unable to connect to the server: getting credentials: exec: exec: "aws-iam-authenticator": executable file not found in $PATH

$ brew install aws-iam-authenticator

Just got the list of files

$ ls -la kubeconfig_eks-p*
-rw-r--r--  1 marcellodesales  staff  2056 Oct 18 01:52 kubeconfig_eks-ppd-super-cash-example-com
-rw-r--r--  1 marcellodesales  staff  2056 Oct 18 01:51 kubeconfig_eks-prd-super-cash-example-com

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE
default       ingress-aws-alb-ingress-controller-6ccd59df99-8lsvh   0/1     Pending   0          29m
kube-system   coredns-59dcf49c5-5wkkf                               0/1     Pending   0          32m
kube-system   coredns-59dcf49c5-hbqtl                               0/1     Pending   0          32m

Other changes made to the original

Changed Kubernetes version from 1.17 to 1.18
Changed the subnets to have odd and even octets per subnet type... Not sure if that would affect the access...

  private_subnets      = ["172.16.1.0/24", "172.16.3.0/24", "172.16.5.0/24"]
  public_subnets       = ["172.16.2.0/24", "172.16.4.0/24", "172.16.6.0/24"]

API server SSL certs might be wrong

I'm not sure if the problem is related to the certs... Even though it says unreachable, I can see that the certs are incorrect...

$ curl -v  https://DCF5F17BFF0ACDC562845DA97F3B171F.sk1.sa-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps
*   Trying 54.207.147.62...
* TCP_NODELAY set
* Connected to DCF5F17BFF0ACDC562845DA97F3B171F.sk1.sa-east-1.eks.amazonaws.com (54.207.147.62) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Thank you Marcello

marcellodesales commented 4 years ago

`t2.micro` problems - 0/1 nodes are available: 1 Too many pods.

Creating a dev cluster I got 0/1 nodes are available: 1 Too many pods.... Even though there's an autoscale group for the cluster... Not sure the reason... I changed to t2.medium and resolved...

$ kubectl get pods
NAME                                                READY   STATUS    RESTARTS   AGE
ingress-aws-alb-ingress-controller-66f95d8d-v9n6m   0/1     Pending   0          114s

 ☸️  kubectl@1.18.6 📛 kustomize@v3.8.1     🧾 terraform@v0.13.4
provider
⎈ default 🔐 eks_eks-ppd-super-cash-example-com
~/dev/github.com/k-mitevski/terraform-k8s/06_terraform_envs_customised/environments/ppd on  master! ⌚ 13:53:13
$ kubectl describe pod ingress-aws-alb-ingress-controller-66f95d8d-v9n6m
Name:           ingress-aws-alb-ingress-controller-66f95d8d-v9n6m
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/instance=ingress
                app.kubernetes.io/name=aws-alb-ingress-controller
                pod-template-hash=66f95d8d
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/ingress-aws-alb-ingress-controller-66f95d8d
Containers:
  aws-alb-ingress-controller:
    Image:      docker.io/amazon/aws-alb-ingress-controller:v1.1.8
    Port:       10254/TCP
    Host Port:  0/TCP
    Args:
      --cluster-name=eks-ppd-super-cash-example-com
      --ingress-class=alb
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from ingress-aws-alb-ingress-controller-token-bgv6p (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  ingress-aws-alb-ingress-controller-token-bgv6p:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-aws-alb-ingress-controller-token-bgv6p
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  35s (x5 over 2m6s)  default-scheduler  0/1 nodes are available: 1 Too many pods.

Missing the install of the `cluster autoscaler`

Missing the auto-scaler, so the dev environment does not work since new hosts can't be created
- EKS requires 4 pods
- Micro profile only allows 4 pods (https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt)
Requires 2 pieces:
- AIM roles
- Helm chart for autoscaler

marcellodesales commented 4 years ago

Error when Re-running

Just got this error after re-running:

Error: error creating EKS Node Group (eks-ppd-super-cash-example-com:eks-ppd-super-cash-example-com-first-grand-primate): InvalidParameterException: Subnets are not tagged with the required tag. Please tag all subnets with Key: kubernetes.io/cluster/eks-ppd-super-cash-example-com Value: shared
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: "249ff5ae-e506-40aa-a56f-ecc3441e856e"
  },
  ClusterName: "eks-ppd-super-cash-example-com",
  Message_: "Subnets are not tagged with the required tag. Please tag all subnets with Key: kubernetes.io/cluster/eks-ppd-super-cash-example-com Value: shared",
  NodegroupName: "eks-ppd-super-cash-example-com-first-grand-primate"
}

I noticed that the subgroups are not prefixed with eks, which is the name of the cluster...

FROM

  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }

TO

  public_subnet_tags = {
    "kubernetes.io/cluster/eks-${local.env_domain}" = "shared"
    "kubernetes.io/role/elb"                        = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/eks-${local.env_domain}" = "shared"
    "kubernetes.io/role/internal-elb"               = "1"
  }

k-mitevski / terraform-k8s

Terraform Creation failing following https://learnk8s.io/terraform-eks: Error: Kubernetes cluster unreachable: Get #2

Error

Environment

Setup

Missing sep to install the authenticator

Other changes made to the original

API server SSL certs might be wrong

`t2.micro` problems - 0/1 nodes are available: 1 Too many pods.

Missing the install of the `cluster autoscaler`

Error when Re-running

FROM

TO

k-mitevski / terraform-k8s

Terraform Creation failing following https://learnk8s.io/terraform-eks: Error: Kubernetes cluster unreachable: Get #2

Error

Environment

Setup

Missing sep to install the authenticator

Other changes made to the original

API server SSL certs might be wrong

t2.micro problems - 0/1 nodes are available: 1 Too many pods.

Missing the install of the cluster autoscaler

Error when Re-running

FROM

TO

`t2.micro` problems - 0/1 nodes are available: 1 Too many pods.

Missing the install of the `cluster autoscaler`