heptio / aws-quickstart

AWS Kubernetes cluster via CloudFormation and kubeadm
Apache License 2.0
223 stars 134 forks

AWS Quickstart Failed initializing the control plane #223

Open rbankston opened 5 years ago

rbankston commented 5 years ago

What steps did you take and what happened:

I used "Launch Latest Quickstart Now" in two different regions. Both us-east-2 and us-west-2 failed with the following error in the cloud-init output log:

+ /usr/local/bin/cfn-init --verbose --stack Heptio-Kubernetes-K8sStack-I54YC612LLPR --region us-west-2 --resource K8sMasterInstance --configsets master-setup
Error occurred during build: Command 04-master-setup failed

Running /tmp/setup-k8s-master.sh by hand on a node that failed produces an error at this command:

# Initialize master node
kubeadm init --config /tmp/kubeadm.yaml
unable to decode config from bytes: couldn't unmarshal YAML: yaml: line 9: could not find expected ':'

What did you expect to happen: The master to be initialized.

Environment:

timothysc commented 5 years ago

/assign @vincepri
/cc @chuckha

rbankston commented 5 years ago

Did some further testing. When I copy and paste the same YAML into the same file by hand, kubeadm init works without issue. When it is called from the script, it fails with that error. I tested three times, all with the same result.

vincepri commented 5 years ago

I just tried to launch a cluster in us-east-1 and the bootstrap was successful, so I'm not sure how to reproduce this :/ Could you paste the contents of /tmp/kubeadm.yaml and the output you get when you run the script?

rbankston commented 5 years ago

kubeadm.yml

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.2
api:
  controlPlaneEndpoint: heptio-ku-apiloadb-blzq3ml8uy4a-875862043.us-east-2.elb.amazonaws.com:443
apiServerCertSANs:
- 18.220.182.151
apiServerExtraArgs:
  cloud-provider: aws
nodeRegistration:
  name: ip-10-0-16-215.us-east-2.compute.internal
  kubeletExtraArgs:
    cloud-provider: aws
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: tag0aq.ytwrr5hz2xediuhk
  ttl: 0s
  usages:
  - signing
  - authentication
controllerManagerExtraArgs:
  cloud-provider: aws
  allocate-node-cidrs: "false"
featureGates:
  CoreDNS: True
networking:
  podSubnet: 192.168.0.0/16

The YAML is valid, but kubeadm init gives that error when it is called from the bash script. If you run kubeadm init by hand, it works without issue.

rbankston commented 5 years ago

Tried to launch in US-EAST-1 as well and received the same failure.

detiber commented 5 years ago

We really should be quoting any strings that may include : in them
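
The quoting suggestion can be sketched as a small shell helper. This is a minimal sketch, not code from the quickstart scripts: `yaml_quote` is a hypothetical name, the hostname is a placeholder, and the helper only handles single-line values.

```shell
# Hypothetical helper (not part of setup-k8s-master.sh): wrap a single-line
# value in YAML double quotes, escaping backslashes and embedded double quotes.
yaml_quote() {
  printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/^/"/' -e 's/$/"/'
}

# Example: emit a key whose value contains a colon (placeholder hostname).
printf '  controlPlaneEndpoint: %s\n' "$(yaml_quote 'example.elb.amazonaws.com:443')"
```

Quoting at the point where the template is written keeps the generated file unambiguous regardless of what the substituted value contains.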

rbankston commented 5 years ago

Found the issue I was hitting:

dig +short heptio-ku-apiloadb-11rt6apgartli-575084224.us-west-2.elb.amazonaws.com
50.112.147.90
52.24.28.86

This lookup is called from https://github.com/heptio/aws-quickstart/blob/master/scripts/setup-k8s-master.sh.in#L49 and its output is placed into the apiServerCertSANs field. When it returns two IP addresses, the second one does not get the - list marker. Should we be using the DNS name in that field instead? The IP addresses of a classic load balancer are ephemeral and can change at any time without notice.

---
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.2
api:
  controlPlaneEndpoint: heptio-ku-apiloadb-11rt6apgartli-575084224.us-west-2.elb.amazonaws.com:443
apiServerCertSANs:
- 52.24.28.86
50.112.147.90
apiServerExtraArgs:
  cloud-provider: aws
nodeRegistration:
  name: ip-10-0-0-112.us-west-2.compute.internal
  kubeletExtraArgs:
    cloud-provider: aws
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: zp0ezi.4nd1kitogs21u7aa
  ttl: 0s
  usages:
  - signing
  - authentication
controllerManagerExtraArgs:
  cloud-provider: aws
  allocate-node-cidrs: "false"
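
The malformed list can be reproduced offline. The sketch below assumes the template writes a single `- ` marker in front of a variable that holds the whole lookup result; the variable name `LB_IPV4` and the function `render_naive` are illustrative, and the addresses are the two shown in the file above rather than a live `dig` call.

```shell
# Stand-in for the output of `dig +short <elb-name>`: two A records, one per line.
LB_IPV4='52.24.28.86
50.112.147.90'

# Naive rendering (assumed behaviour): one "- " marker in front of the whole
# value, so only the first line of a multi-line value becomes a list item.
render_naive() {
  printf 'apiServerCertSANs:\n- %s\n' "$1"
}

render_naive "$LB_IPV4"
```

The second address comes out as a bare line with no list marker. In the file above that bare address is line 9, counting the leading `---`, which is consistent with the `line 9` in the reported parser error.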

rbankston commented 5 years ago

I have been able to reproduce this issue by setting the Admin ingress location to something other than 0.0.0.0/0.

vincepri commented 5 years ago

@rbankston I think part of this should be fixed in https://github.com/heptio/aws-quickstart/pull/224

Regarding the apiServerCertSANs, are you suggesting that we use the ELB DNS name instead?

vincepri commented 5 years ago

I think the ELB IP was chosen because this https://github.com/heptio/aws-quickstart/blob/master/scripts/setup-k8s-master.sh.in#L129 was erroring when we specified a DNS name. @chuckha @detiber any suggestions on that?

ahren1234 commented 5 years ago

This is still an issue, and it has to do with what was discussed earlier. One correction could be to turn this variable into an array and print its elements one by one. An example is below (without proper validation/sanitization):

LB_IPV4=($(dig +short example.amazonaws.com))
....
certSANs:
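
A fuller version of that idea might look like the sketch below. It is a variation, not the snippet above completed: it reads the addresses line by line instead of using a bash array, so it also runs under plain `sh`. The function name `render_sans` is hypothetical, the key name `apiServerCertSANs` is taken from the config earlier in the thread, and fixed input stands in for the `dig` call. Like the original suggestion, it does no validation of the addresses.

```shell
# Hypothetical helper: read addresses (one per line) on stdin and emit a YAML
# list under the key given as $1, with one "- " item per non-empty line.
render_sans() {
  printf '%s:\n' "$1"
  while IFS= read -r addr || [ -n "$addr" ]; do
    if [ -n "$addr" ]; then
      printf '%s\n' "- $addr"
    fi
  done
  return 0
}

# In the real script the input would come from the DNS lookup; fixed input
# keeps this sketch runnable offline.
printf '52.24.28.86\n50.112.147.90\n' | render_sans apiServerCertSANs
```

Each address now gets its own list marker, however many records the lookup returns.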