heptio / aws-quickstart

AWS Kubernetes cluster via CloudFormation and kubeadm
Apache License 2.0

Unable to register node "ip-x-x-x-x.ec2.internal" with API server: nodes "ip-x-x-x-x.ec2.internal" is forbidden: node "ip-x-x-x-x" cannot modify node "ip-x-x-x-x.ec2.internal" #239

Open wiquan opened 5 years ago

wiquan commented 5 years ago

What steps did you take and what happened: Stack creation fails when using the QS(5042) Kubernetes AWS CloudFormation Template: Create a Kubernetes.

I used this template https://aws-quickstart.s3.amazonaws.com/quickstart-heptio/templates/kubernetes-cluster.template

to create a stack with 'rollbackOnFail=false' so I could do a post-mortem, then SSHed into the broken master.

I think 'kubeadm init' is failing because of the infamous log message: forbidden: node "ip-10-126-121-125" cannot modify node "ip-10-126-121-125.ec2.internal"

From what I dug up, this seems like an old issue. I'm not sure why it's still showing up with these CloudFormation templates. In fact, I would like to know if there is any version of this template that is known to work, so I can roll back and try it.
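For anyone comparing logs: the message appears to say that the kubelet authenticated under the short name but tried to register the fully qualified one. A minimal sketch for pulling the two names out of such a line follows; `parse_forbidden` is a hypothetical helper name, and the pattern is only assumed to match the message format quoted in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubeadm or the quickstart).
# Reads log lines on stdin and prints "<credential-name> <requested-name>"
# for every line containing: node "X" cannot modify node "Y"
parse_forbidden() {
  sed -n 's/.*node "\([^"]*\)" cannot modify node "\([^"]*\)".*/\1 \2/p'
}
```

Piping the kubelet output through it, for example `journalctl -u kubelet | parse_forbidden`, prints one pair per matching line; if the two names differ only by the domain suffix, the hostname setup is the first thing to check.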

# ubuntu@ip-10-126-121-125:/var/log$ less cfn-init.log
# SKIP to END

[kubelet] Creating a ConfigMap "kubelet-config-1.11" in namespace kube-system with the configuration for the kubelets in the cluster
[markmaster] Marking the node ip-10-126-121-125 as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node ip-10-126-121-125 as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
error marking master: timed out waiting for the condition

2019-01-14 04:02:50,963 [ERROR] Error encountered during build of master-setup: Command 04-master-setup failed
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 542, in run_config
    CloudFormationCarpenter(config, self._auth_config).build(worklog)
  File "/usr/local/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 260, in build
    changes['commands'] = CommandTool().apply(self._config.commands)
  File "/usr/local/lib/python2.7/dist-packages/cfnbootstrap/command_tool.py", line 117, in apply
    raise ToolError(u"Command %s failed" % name)
ToolError: Command 04-master-setup failed
2019-01-14 04:02:50,967 [ERROR] -----------------------BUILD FAILED!------------------------
2019-01-14 04:02:50,967 [ERROR] Unhandled exception during build: Command 04-master-setup failed
Traceback (most recent call last):
  File "/usr/local/bin/cfn-init", line 171, in <module>
    worklog.build(metadata, configSets)
  File "/usr/local/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 129, in build
    Contractor(metadata).build(configSets, self)
  File "/usr/local/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 530, in build
    self.run_config(config, worklog)
  File "/usr/local/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 542, in run_config
    CloudFormationCarpenter(config, self._auth_config).build(worklog)
  File "/usr/local/lib/python2.7/dist-packages/cfnbootstrap/construction.py", line 260, in build
    changes['commands'] = CommandTool().apply(self._config.commands)
  File "/usr/local/lib/python2.7/dist-packages/cfnbootstrap/command_tool.py", line 117, in apply
    raise ToolError(u"Command %s failed" % name)
ToolError: Command 04-master-setup failed
2019-01-14 04:02:51,135 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.us-east-1.amazonaws.com
2019-01-14 04:02:51,136 [DEBUG] Signaling resource K8sMasterInstance in stack wiquan-k8-dev10 with unique ID i-0aef1f5a98427db21 and status FAILURE

$ sudo systemctl status kubelet
sudo: unable to resolve host ip-10-126-121-125
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-hostname.conf, 10-kubeadm.conf
   Active: active (running) since Mon 2019-01-14 03:59:44 UTC; 3min 46s ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 2770 (kubelet)
    Tasks: 15
   Memory: 46.7M
      CPU: 4.744s
   CGroup: /system.slice/kubelet.service
           └─2770 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cloud-provider=aws --cgroup-driver=systemd --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni

Jan 14 04:03:24 ip-10-126-121-125 kubelet[2770]: W0114 04:03:24.747523    2770 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 14 04:03:24 ip-10-126-121-125 kubelet[2770]: E0114 04:03:24.748005    2770 kubelet.go:2110] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Jan 14 04:03:26 ip-10-126-121-125 kubelet[2770]: I0114 04:03:26.496865    2770 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Jan 14 04:03:26 ip-10-126-121-125 kubelet[2770]: I0114 04:03:26.497445    2770 kubelet_node_status.go:317] Adding node label from cloud provider: beta.kubernetes.io/instance-type=m4.large
Jan 14 04:03:26 ip-10-126-121-125 kubelet[2770]: I0114 04:03:26.497774    2770 kubelet_node_status.go:328] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=us-east-1b
Jan 14 04:03:26 ip-10-126-121-125 kubelet[2770]: I0114 04:03:26.498107    2770 kubelet_node_status.go:332] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=us-east-1
Jan 14 04:03:26 ip-10-126-121-125 kubelet[2770]: I0114 04:03:26.503904    2770 kubelet_node_status.go:79] Attempting to register node ip-10-126-121-125.ec2.internal

Jan 14 04:03:26 ip-10-126-121-125 kubelet[2770]: E0114 04:03:26.507458    2770 kubelet_node_status.go:103] Unable to register node "ip-10-126-121-125.ec2.internal" with API server: nodes "ip-10-126-121-125.ec2.internal" is forbidden: node "ip-10-126-121-125" cannot modify node "ip-10-126-121-125.ec2.internal"

Jan 14 04:03:29 ip-10-126-121-125 kubelet[2770]: W0114 04:03:29.749511    2770 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d
Jan 14 04:03:29 ip-10-126-121-125 kubelet[2770]: E0114 04:03:29.750280    2770 kubelet.go:2110] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

What did you expect to happen: I was hoping this quickstart would build a working cluster.

Anything else you would like to add: /etc/hosts is missing the line mapping the internal IP (10.x.x.x) to the hostname (ip-10-x-x-x), so 'hostname --fqdn' returns only the short hostname.
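A rough sketch of how the missing /etc/hosts entry could be checked for and generated. The function names `has_host_entry` and `hosts_line` are made up for illustration, and the "IP FQDN shortname" ordering is the usual hosts-file convention rather than anything the quickstart template is confirmed to write.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the hosts file is passed as a parameter so the
# check can be tried on a copy instead of the real /etc/hosts.

# Returns 0 if any non-comment line of the hosts file lists the given name.
has_host_entry() {
  local hosts_file="$1" name="$2"
  awk -v n="$name" '
    /^[[:space:]]*#/ { next }                 # skip full-line comments
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break                  # stop at a trailing comment
        if ($i == n) found = 1
      }
    }
    END { exit found ? 0 : 1 }
  ' "$hosts_file"
}

# Prints a hosts line in "IP FQDN shortname" order.
hosts_line() {
  local ip="$1" fqdn="$2"
  printf '%s %s %s\n' "$ip" "$fqdn" "${fqdn%%.*}"
}
```

For example, `has_host_entry /etc/hosts "$(hostname -s)" || hosts_line 10.126.121.125 ip-10-126-121-125.ec2.internal` prints the line that would need to be appended (as root) when the entry is absent.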

Environment:

wiquan commented 5 years ago

I don't know if this is correct or not, but it works for me.

jmbeach commented 3 years ago

My problem was that I was updating the hostname after the cluster was created. By doing that, it's like the master didn't know it was the master.

I am still running:

sudo hostname $(curl 169.254.169.254/latest/meta-data/hostname)

but now I run it before the cluster initialization.
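The ordering described here could be enforced with a small pre-flight guard, sketched below. `hostname_ready` is a hypothetical name, and the guard only assumes that the expected value comes from the same metadata endpoint as the command above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard to call before `kubeadm init`.
# The expected name would come from the EC2 metadata endpoint, e.g.
#   expected=$(curl -s 169.254.169.254/latest/meta-data/hostname)

# Succeeds only when the current hostname already equals the expected one.
hostname_ready() {
  local current="$1" expected="$2"
  if [ "$current" = "$expected" ]; then
    return 0
  fi
  echo "hostname is '$current' but expected '$expected'; set it before kubeadm init" >&2
  return 1
}
```

Something like `hostname_ready "$(hostname)" "$expected" || exit 1` placed ahead of the init step would stop the run early instead of letting the node register under a name that changes later.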