scholzj / terraform-aws-kubernetes

Terraform module for Kubernetes setup on AWS
Apache License 2.0
202 stars 129 forks

Launch configuration is invalid #7

Closed joelchrist closed 6 years ago

joelchrist commented 6 years ago


Failed to launch a new EC2 instance. User data for the launch configuration is also very strange:

(gzip-compressed binary user data; unreadable as plain text)

Using your module as follows:

data "aws_availability_zones" "available" {}

resource "aws_vpc" "default" {
  cidr_block = "${var.cidr}"
  instance_tenancy = "dedicated"

  tags {
    Name = "vpc-${var.name}"
  }
}

resource "aws_internet_gateway" "default" {
  vpc_id = "${aws_vpc.default.id}"
}

resource "aws_route" "internet_access" {
  route_table_id = "${aws_vpc.default.main_route_table_id}"
  destination_cidr_block = "0.0.0.0/0"
  gateway_id = "${aws_internet_gateway.default.id}"
}

resource "aws_subnet" "default" {
  count = 2
  vpc_id = "${aws_vpc.default.id}"
  cidr_block = "10.0.${count.index}.0/24"
  map_public_ip_on_launch = true
  availability_zone = "${data.aws_availability_zones.available.names[count.index]}"
}

resource "aws_route_table_association" "default" {
  count = 2
  route_table_id = "${aws_vpc.default.default_route_table_id}"
  subnet_id = "${aws_subnet.default.*.id[count.index]}"
}

module "kubernetes" {
  source = "scholzj/kubernetes/aws"

  aws_region    = "eu-central-1"
  cluster_name  = "aws-kubernetes"
  master_instance_type = "t2.medium"
  worker_instance_type = "t2.medium"
  ssh_public_key = "~/.ssh/id_rsa.pub"
  ssh_access_cidr = ["0.0.0.0/0"]
  api_access_cidr = ["0.0.0.0/0"]
  min_worker_count = 1
  max_worker_count = 2
  hosted_zone = "cluster.moreapp.com"
  hosted_zone_private = false

  master_subnet_id = "${aws_subnet.default.*.id[0]}"
  worker_subnet_ids = "${aws_subnet.default.*.id}"

  # Tags
  tags = {
    Application = "AWS-Kubernetes"
  }

  # Tags in a different format for Auto Scaling Group
  tags2 = [
    {
      key                 = "Application"
      value               = "AWS-Kubernetes"
      propagate_at_launch = true
    }
  ]

  addons = [
    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/storage-class.yaml",
    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/heapster.yaml",
    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/dashboard.yaml",
    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/external-dns.yaml",
    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/autoscaler.yaml"
  ]
}
scholzj commented 6 years ago

Thanks for raising this issue. The cloud-init configuration is gzipped ... so you would probably need to unpack it to make some sense of it. I will have a look at this and see if I can reproduce it.
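For reference, the unpacking can be reproduced locally with nothing but standard `base64` and `gzip` tooling. A minimal sketch, using a stand-in script rather than the real user data (which the AWS API returns base64-encoded on top of the gzip compression):

```shell
#!/bin/sh
# Round-trip a stand-in script the way the launch configuration stores it:
# gzip-compressed by cloud-init, then base64-encoded by the AWS API.
encoded=$(printf '#!/bin/bash\necho hello\n' | gzip -c | base64)

# Decoding recovers the readable script.
# GNU coreutils: base64 -d; macOS: base64 -D (as used later in this thread).
printf '%s\n' "$encoded" | base64 -d | gunzip
```

Without the decode step, the raw bytes look exactly like the binary noise shown above.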

joelchrist commented 6 years ago

Upon further inspection of the response from the AWS API, I get this:

{
  "LaunchConfigurations": [
    {
      "AssociatePublicIpAddress": true,
      "AutoScalingBlockDeviceMappings": [
        {
          "AutoScalingDeviceName": "/dev/sda1",
          "AutoScalingEbs": {
            "AutoScalingDeleteOnTermination": true,
            "AutoScalingVolumeSize": 50,
            "AutoScalingVolumeType": "gp2"
          }
        }
      ],
      "ClassicLinkVPCSecurityGroups": [],
      "CreatedTime": 1.518099809055E9,
      "EbsOptimized": false,
      "IamInstanceProfile": "aws-kubernetes-node",
      "ImageId": "ami-337be65c",
      "InstanceMonitoring": {
        "Enabled": true
      },
      "InstanceType": "t2.medium",
      "KernelId": "",
      "KeyName": "aws-kubernetes",
      "LaunchConfigurationARN": "arn:aws:autoscaling:eu-central-1:644621087594:launchConfiguration:9e0990f2-da63-43ed-9cfe-d6a017181a76:launchConfigurationName/aws-kubernetes-nodes",
      "LaunchConfigurationName": "aws-kubernetes-nodes",
      "RamdiskId": "",
      "SecurityGroups": [
        "sg-86b85aeb"
      ],
      "UserData": "H4sIAAAAAAAA/6RWbW8bNxL+voD/w5xiwMkBXFo5n3NxogOcWE6D2FZh2SnaohUocnbFiEsS5KwiFfnxBfdFVmwXMFp9ITh6Zp4Zztu+d5bQErvZeDyBqjakvQjEK71G9QbmrrZKhM1ocPnxcvxucnt1dnr98yBLN/YZQ9TOnsAwP9zL9jLGdkF7WW/7TEfvoqYGK4iEXFRo6Q0U2qAVFY4G2mpi4mtky3qOwSJhZNYpzONicGfoJggbCwxsbKVT2pYn8GquaQfQREG4Jr5mcYHGRBm0p73sUlf4wOFn/+JzbflcxEWWRSRgDlYY5i5if8UQcK2pv3rtsRDaZBmuvQsEn27fjU/PLmc3k0/jq9HaUblY5u3h9bzA4RdzdLhFn11NZ1enl+PR96Hm0tSRMOSVCyi8z6Wrdgmur8Y34+ns8/h6+nFyNRoM89f5y0GWPYMpEtBCR3DWbEAUhAEiEmlbAi0QFBaiNhR7/62rbURKqj8hkINKkFw00IWLlJIBuPYoCRXMN5B8FKoCYb/H1PHuf4OUnd9eXMx+mExvmugG+89lHQywCAsif8L58Ph1/vK/R3l3ciMII/EKSTAlSPDe8osmrEuxxPRYYNxXDFJEzLZPt/8c5cLBYL+XDOAbUICDU/bLARwI9sfBi2Tjo40kjAHl5BJDtqkr0J2IbWBTV6wmbSIoXGmJrBLeY2A+lUhsiin5BWZVvUy6TDpb6JJVwooSAzAmlGIBvWtCjCecK/fVGidU3jKmJHKjbb3mEi25yFs5k5gnvcajSixRCrlAKESk+05uFXbj+bQtG5Cu8s6iTfmtlQMpCN6+HU/O4f/AkSTf1FXDFXPFd8qtof/1TvBb1nThneVsLiLWwYz62LyQS1E2lepqlZfOlQabCDd1xRuKHQKG5hVb/+94dnyUoRVzg2o0zEpfygXK5WiYJYXZzr305RI3T2RTrgmMLXGTl77MoPs9XTv4inWorZXx5Dw1CdrCBYlweD8XXaWz/YcN2XfJX/7XvYq0OiXyXK97a9BWVR1EGo5Zaiqm4SByxmQZXO2ZCnqFYRQ3kbBSD+TtrYi8PGgz3gPbk3c8ecSQqjxXfHjIOnfzRL7l5GO70sHZNJlHgxTIxfhm9v7D9eT2x9np9Ycp1/A45GJye9YgRow1T858cCutMKQxN/injkW+/5g3+w/54VHg336ZNF5JBIIOErNWVZKBtqr72XInj43CA3EH76flfXwv7wZ6N8AjWETVztmzrcWkZZHyedCqxO5gtmBSGMO0p8QUR8OnYI+34K4sJQbShZZpOjfbGZyF92hpMs100cyX5i3/HdCgiAjfoAzo4fdmMve3VgHepJVhm/YMFbBQdHmIhieeyKVgu4S5DMQbuPSPQOe1VamPAz3BTlboLOuXV8C08vrbF6ctMEZuibatmO3+BtZ8dLBmxe1/t9Ngu25Ojo+O/gOMKR2lW2HYtKZYbaMokMWl9kwKtsLQOpQa+/63EWN72Z8BAAD//043+CR6CQAA"
    }
  ]
}

If I copy the user data and run `pbpaste | base64 -D | gunzip`, I get the following:

Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0

--MIMEBOUNDARY
Content-Disposition: attachment; filename="init-aws-kubernetes-node.sh"
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

#!/bin/bash

set -o verbose
set -o errexit
set -o pipefail

export KUBEADM_TOKEN=xotghk.xotghkpibfe1jl40

export DNS_NAME=aws-kubernetes.cluster.moreapp.com
export KUBERNETES_VERSION="1.9.2"

# Set this only after setting the defaults
set -o nounset

# We need to match the hostname expected by kubeadm and the hostname used by kubelet
FULL_HOSTNAME="$(curl -s http://169.254.169.254/latest/meta-data/hostname)"

# Make DNS lowercase
DNS_NAME=$(echo "$DNS_NAME" | tr 'A-Z' 'a-z')

# Install docker
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum makecache fast
yum install -y docker-ce

# Install Kubernetes components
sudo cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
        https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
setenforce 0
yum install -y kubelet-$KUBERNETES_VERSION kubeadm-$KUBERNETES_VERSION kubernetes-cni

# Fix kubelet configuration
sed -i 's/--cgroup-driver=systemd/--cgroup-driver=cgroupfs/g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sed -i '/Environment="KUBELET_CGROUP_ARGS/i Environment="KUBELET_CLOUD_ARGS=--cloud-provider=aws"' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sed -i 's/$KUBELET_CGROUP_ARGS/$KUBELET_CLOUD_ARGS $KUBELET_CGROUP_ARGS/g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

# Start services
systemctl enable docker
systemctl start docker
systemctl enable kubelet
systemctl start kubelet

# Set settings needed by Docker
sysctl net.bridge.bridge-nf-call-iptables=1
sysctl net.bridge.bridge-nf-call-ip6tables=1

# Fix certificates file on CentOS
if cat /etc/*release | grep ^NAME= | grep CentOS ; then
    rm -rf /etc/ssl/certs/ca-certificates.crt/
    cp /etc/ssl/certs/ca-bundle.crt /etc/ssl/certs/ca-certificates.crt
fi

kubeadm reset
kubeadm join --token $KUBEADM_TOKEN --node-name $FULL_HOSTNAME $DNS_NAME:6443 --discovery-token-unsafe-skip-ca-verification

--MIMEBOUNDARY--
scholzj commented 6 years ago

Hi,

I tried to reproduce this, but it seems to work fine for me. Is the problem reproducible for you? If so, could you send me the output generated by terraform apply?

Thanks & Regards Jakub

joelchrist commented 6 years ago

@scholzj See the output attached below: terraform_output.log

I'm using your module as follows:

data "aws_availability_zones" "available" {}

resource "aws_vpc" "default" {
  cidr_block = "${var.cidr}"
  instance_tenancy = "dedicated"

  tags {
    Name = "vpc-${var.name}"
  }
}

resource "aws_internet_gateway" "default" {
  vpc_id = "${aws_vpc.default.id}"
}

resource "aws_route" "internet_access" {
  route_table_id = "${aws_vpc.default.main_route_table_id}"
  destination_cidr_block = "0.0.0.0/0"
  gateway_id = "${aws_internet_gateway.default.id}"
}

resource "aws_subnet" "default" {
  count = 3
  vpc_id = "${aws_vpc.default.id}"
  cidr_block = "10.0.${count.index}.0/24"
  map_public_ip_on_launch = true
  availability_zone = "${data.aws_availability_zones.available.names[count.index]}"
}

resource "aws_route_table_association" "default" {
  count = 3
  route_table_id = "${aws_vpc.default.default_route_table_id}"
  subnet_id = "${aws_subnet.default.*.id[count.index]}"
}

module "kubernetes" {
  source = "scholzj/kubernetes/aws"

  aws_region    = "eu-central-1"
  cluster_name  = "aws-kubernetes"
  master_instance_type = "t2.medium"
  worker_instance_type = "t2.medium"
  ssh_public_key = "~/.ssh/id_rsa.pub"
  ssh_access_cidr = ["0.0.0.0/0"]
  api_access_cidr = ["0.0.0.0/0"]
  min_worker_count = 1
  max_worker_count = 2
  hosted_zone = "cluster.moreapp.com"
  hosted_zone_private = false

  master_subnet_id = "${aws_subnet.default.*.id[0]}"
  worker_subnet_ids = "${aws_subnet.default.*.id}"

  # Tags
  tags = {
    Application = "AWS-Kubernetes"
  }

  # Tags in a different format for Auto Scaling Group
  tags2 = [
    {
      key                 = "Application"
      value               = "AWS-Kubernetes"
      propagate_at_launch = true
    }
  ]

  addons = [
//    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/storage-class.yaml",
//    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/heapster.yaml",
    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/dashboard.yaml",
    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/external-dns.yaml",
    "https://raw.githubusercontent.com/scholzj/terraform-aws-kubernetes/master/addons/autoscaler.yaml"
  ]
}
scholzj commented 6 years ago

Thanks for providing the configuration and log. The problem is that you create your VPC with dedicated tenancy. That means it can be used only with dedicated hosts, and the autoscaling group my tooling creates doesn't currently work with dedicated hosts. When you remove the dedicated tenancy from your VPC configuration (instance_tenancy = "dedicated"), it should start working.
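A minimal sketch of the corrected VPC block, assuming the rest of the configuration stays as posted (instance_tenancy defaults to "default", i.e. shared tenancy, so the line can simply be dropped):

```hcl
resource "aws_vpc" "default" {
  # instance_tenancy is omitted; it defaults to "default" (shared tenancy).
  # "dedicated" forces dedicated hosts, which the module's autoscaling
  # group does not support.
  cidr_block = "${var.cidr}"

  tags {
    Name = "vpc-${var.name}"
  }
}
```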

The question is ... are you using dedicated tenancy on purpose? I have never used dedicated instances, but I can have a look at how to add some support for them (no promises: I will first need to look at what it actually means, but I can look). If you didn't use it on purpose, just remove the line from your Terraform file and it should start working.

joelchrist commented 6 years ago

Thanks for the response! Removing dedicated tenancy fixed the issue. It seems I didn't need it; it was just careless copy-pasting. I don't know if you want to add support for people who do use dedicated tenancy; if not, you can close this issue. Thanks again!

scholzj commented 6 years ago

Thanks for letting me know it works now.

Unless you actually need it, I will not add it right now. I have personally never used dedicated hosts, so I do not need them myself. I was thinking more about supporting Spot instances to save some money :-). But that would be a different issue.