DH-Rancher closed this issue 4 years ago.
Referencing for visibility @galal-hussein @superseb
Same issue with:
```yaml
cloud_provider:
  name: external
```
which shouldn't even use `--cloud-config`, if I understand it correctly.
Worked around it in my RKE template like this:
```yaml
kube-api:
  extra_args:
    cloud-config: ''
kubelet:
  extra_args:
    cloud-config: ''
```
Not sure if both are required or just kubelet.
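For context, a sketch of how that workaround fits into a full cluster.yml; the `services:` wrapper and the node/provider values below are assumptions (not part of the snippet above), since in the RKE config schema `kube-api` and `kubelet` sit under `services:`:
```yaml
# Sketch only: node values are placeholders and the services: wrapper is assumed.
nodes:
  - address: x.x.x.x                  # placeholder
    user: ubuntu                      # placeholder
    role: [controlplane, worker, etcd]
cloud_provider:
  name: external
services:
  kube-api:
    extra_args:
      cloud-config: ''                # per the workaround above
  kubelet:
    extra_args:
      cloud-config: ''                # per the workaround above
```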
Just hit this issue also with RKE v1.0. Can provide more info if it will help.
kube-apiserver logs:
```
+ exec kube-apiserver --kubelet-client-certificate=/etc/kubernetes/ssl/kube-apiserver.pem --service-account-key-file=/etc/kubernetes/ssl/kube-service-account-token-key.pem --cloud-config=/etc/kubernetes/cloud-config --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --requestheader-allowed-names=kube-apiserver-proxy-client --tls-private-key-file=/etc/kubernetes/ssl/kube-apiserver-key.pem --profiling=false --tls-cert-file=/etc/kubernetes/ssl/kube-apiserver.pem --bind-address=0.0.0.0 --advertise-address=172.31.4.11 --storage-backend=etcd3 --etcd-cafile=/etc/kubernetes/ssl/kube-ca.pem --kubelet-client-key=/etc/kubernetes/ssl/kube-apiserver-key.pem --proxy-client-cert-file=/etc/kubernetes/ssl/kube-apiserver-proxy-client.pem --requestheader-client-ca-file=/etc/kubernetes/ssl/kube-apiserver-requestheader-ca.pem --service-node-port-range=30000-32767 --requestheader-username-headers=X-Remote-User --cloud-provider=aws --etcd-keyfile=/etc/kubernetes/ssl/kube-node-key.pem --etcd-servers=https://172.31.4.11:2379,https://172.31.4.14:2379 --allow-privileged=true --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction,Priority,TaintNodesByCondition,PersistentVolumeClaimResize --etcd-certfile=/etc/kubernetes/ssl/kube-node.pem --etcd-prefix=/registry --secure-port=6443 --anonymous-auth=false --proxy-client-key-file=/etc/kubernetes/ssl/kube-apiserver-proxy-client-key.pem --insecure-port=0 --requestheader-group-headers=X-Remote-Group --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/ssl/kube-ca.pem --service-cluster-ip-range=10.43.0.0/16 --requestheader-extra-headers-prefix=X-Remote-Extra- --service-account-lookup=true --runtime-config=authorization.k8s.io/v1beta1=true --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I1202 03:24:11.270667 1 server.go:623] external host was not specified, using 172.31.4.11
I1202 03:24:11.271001 1 server.go:149] Version: v1.16.3
F1202 03:24:11.666224 1 config.go:56] Error reading from cloud configuration file /etc/kubernetes/cloud-config: &os.PathError{Op:"open", Path:"/etc/kubernetes/cloud-config", Err:0x2}
```
rke debug output:
```
DEBU[0085] [healthcheck] Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [13.211.57.225]: Get https://localhost:6443/healthz: Unable to access the service on localhost:6443. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused)
DEBU[0090] using "/private/tmp/com.apple.launchd.whYo3UnOCi/Listeners" SSH_AUTH_SOCK
DEBU[0090] using "/private/tmp/com.apple.launchd.whYo3UnOCi/Listeners" SSH_AUTH_SOCK
DEBU[0091] [healthcheck] Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [13.211.132.166]: Get https://localhost:6443/healthz: Unable to access the service on localhost:6443. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused)
DEBU[0091] [healthcheck] Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [13.211.57.225]: Get https://localhost:6443/healthz: Unable to access the service on localhost:6443. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused)
DEBU[0096] Checking container logs
DEBU[0096] using "/private/tmp/com.apple.launchd.whYo3UnOCi/Listeners" SSH_AUTH_SOCK
DEBU[0096] Checking container logs
DEBU[0096] using "/private/tmp/com.apple.launchd.whYo3UnOCi/Listeners" SSH_AUTH_SOCK
FATA[0097] [controlPlane] Failed to bring up Control Plane: [Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [13.211.132.166]: Get https://localhost:6443/healthz: Unable to access the service on localhost:6443. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), log: F1202 03:24:11.456913 1 config.go:56] Error reading from cloud configuration file /etc/kubernetes/cloud-config: &os.PathError{Op:"open", Path:"/etc/kubernetes/cloud-config", Err:0x2}]
```
I was able to reproduce with RKE v1.0.0:
* `cluster.yml` containing:
```yaml
nodes:
cloud_provider:
  name: aws
```
* `rke up` fails with:
```
FATA[0120] [controlPlane] Failed to bring up Control Plane: [Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [
```
* The same `cluster.yml` succeeds with RKE v0.3.2.
This was changed in https://github.com/rancher/rke/commit/372393ac1bbf0eaf70048b4061c6816df4018a01#diff-822461a71c0db81849eb077c6cf33d47R39. The file content is now checked for being non-empty, and an empty file is removed. Because we deploy the cloud-config without any content for AWS (I guess this was a design choice to avoid having a different code path for each cloud provider), it won't be deployed, and the container is even run with `rm -f` to delete it. This also breaks upgrades (when moving from v0.3.2 to v1.0.0). The workaround is to specify a "default" cloud config for AWS; this is also why upgrading Rancher (v2.3.2 -> v2.3.3) doesn't break, as we configure the following default when creating the cluster with the AWS cloud provider:
```yaml
cloud_provider:
  name: aws
  awsCloudProvider:
    global:
```
The other workaround (mentioned in https://github.com/rancher/rke/issues/1805#issuecomment-559865752) is basically to not set the parameter at all, so the file is never looked for.
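Applied to the reproduction config above, the first of those workarounds would look roughly like this; this is only a sketch, with placeholder node values that are not taken from the thread:
```yaml
# Sketch only: placeholder nodes; the awsCloudProvider block is the "default"
# cloud config that makes the rendered cloud-config non-empty so it gets deployed.
nodes:
  - address: x.x.x.x                  # placeholder
    user: ubuntu                      # placeholder
    role: [controlplane, worker, etcd]
cloud_provider:
  name: aws
  awsCloudProvider:
    global:
```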
Available in RKE v1.0.1-rc1 and RKE v1.1.0-rc1
Following my steps from this comment: https://github.com/rancher/rke/issues/1805#issuecomment-561264252

The cluster is created successfully with RKE v1.0.1-rc1, and the empty cloud-config is deployed.

The cluster still fails to create with RKE v1.1.0-rc1, with the same error:
```
FATA[0078] [controlPlane] Failed to bring up Control Plane: [Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [<host>]: Get https://localhost:6443/healthz: Unable to access the service on localhost:6443. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), log: F1206 18:17:46.603009 1 config.go:56] Error reading from cloud configuration file /etc/kubernetes/cloud-config: &os.PathError{Op:"open", Path:"/etc/kubernetes/cloud-config", Err:0x2}]
```
and no cloud-config is deployed to the node.
Available in v1.1.0-rc2
I got past the error reported in v1.1.0-rc1 by Brandon here: https://github.com/rancher/rke/issues/1805#issuecomment-562682890

But now I'm getting a different error in v1.1.0-rc3:
```
FATA[0160] [workerPlane] Failed to bring up Worker Plane: [Failed to verify healthcheck: Failed to check http://localhost:10248/healthz for service [kubelet] on host [x.x.x.x]: Get http://localhost:10248/healthz: Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), log: F1224 19:38:29.074157 11538 server.go:271] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-xxxxxxxxxxxxx: "error listing AWS instances: \"NoCredentialProviders: no valid providers in chain. Deprecated.\\n\\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors\""]
```
```yaml
nodes:
  - address: OMITTED
    internal_address: OMITTED
    user: ubuntu
    role: [controlplane,worker,etcd]
    ssh_key_path: OMITTED
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
cloud_provider:
  name: aws
```
@izaac Test it with AWS nodes having the right IAMProfile.
This is working with the proper IAM profile and after following the docs to configure the tag requirements: https://rancher.com/docs/rke/latest/en/config-options/cloud-providers/aws/
Tested with v1.1.0-rc3.
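For anyone hitting the same `NoCredentialProviders` error, a rough sketch of what the linked docs ask for on the AWS side; the tag key and the permission summary below are paraphrased from those docs rather than from this thread, so treat them as assumptions to verify:
```yaml
# Sketch only: cluster.yml itself can stay minimal; the prerequisites live in AWS:
#   - EC2 instances tagged with kubernetes.io/cluster/<cluster-id> (owned or shared)
#   - an instance IAM profile on the nodes allowing the EC2/ELB calls the in-tree
#     AWS cloud provider makes
cloud_provider:
  name: aws
```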
Same issue with v1.1.11.
Tried setting
```yaml
cloud_provider:
  name: aws
  awsCloudProvider:
    global:
```
and
```yaml
kube-api:
  extra_args:
    cloud-config: ''
kubelet:
  extra_args:
    cloud-config: ''
```
Having issues with verifying the health check for both the worker and control plane, with `Failed to verify healthcheck:...`. IAM permissions and tags have been verified, and interestingly, the nodes come up as `Ready` with their internal hostnames, but with `<none>` for their roles, before `rke` fails.
RKE version:
Docker version: (`docker version`, `docker info` preferred)
Operating system and kernel: (`cat /etc/os-release`, `uname -r` preferred)
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) AWS
cluster.yml file:
Steps to Reproduce:
Run `rke up --config ./rancher-cluster.yml`, pointing to the aforementioned manifest.

Results:
Health check of the kube-apiserver service will fail:
The message implies that it cannot read /etc/kubernetes/cloud-config. On the host, this file does not exist.
Workaround:
Monitor the docker containers on the host with `watch docker ps`. When the kube-apiserver container is failing to start, run `sudo touch /etc/kubernetes/cloud-config && docker restart kube-apiserver`.

Note: Omitting
from the manifest yields a successfully created cluster.