openshift / installer

Install an OpenShift 4.x cluster

kubelet service not started on scaleup node boot #1899

Closed DanyC97 closed 5 years ago

DanyC97 commented 5 years ago

Version

$ openshift-install version
openshift-install v4.1.0-201905212232-dirty
built from commit 71d8978039726046929729ad15302973e3da18ce
release image quay.io/openshift-release-dev/ocp-release@sha256:b8307ac0f3ec4ac86c3f3b52846425205022da52c16f56ec31cbe428501001d6

RHCOS

rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:53389c9b4a00d7afebb98f7bd9d20348deb1d77ca4baf194f0ae1b582b7e965b
              CustomOrigin: Provisioned from oscontainer
                   Version: 410.8.20190520.0 (2019-05-20T22:55:04Z)

Platform (aws|libvirt|openstack):

vmware

What happened?

Deployed a UPI VMware cluster and then started to add a new node. The VM booted, Ignition kicked in and laid down RHCOS, including setting the static IP and the hostname; however, the node never joined the K8s cluster and `oc get nodes` didn't show the new node.

What you expected to happen?

I expected the new node to show up in the `oc get nodes` output.

How to reproduce it (as minimally and precisely as possible)?

1) Create a cluster.
2) Scale up a node using a custom dani-k8s-node-2.ign file where the files below were added so that we can set a static IP and hostname:

cat > ${NGINX_DIRECTORY}/${HOST}-ens192 << EOF
DEVICE=ens192
BOOTPROTO=none
ONBOOT=yes
NETMASK=${NETMASK}
IPADDR=${IP}
GATEWAY=${GATEWAY}
PEERDNS=no
DNS1=${DNS1}
DNS2=${DNS2}
DOMAIN=${DOMAIN_NAME}
IPV6INIT=no
EOF

  ENS192=$(cat ${NGINX_DIRECTORY}/${HOST}-ens192 | base64 -w0)
  rm ${NGINX_DIRECTORY}/${HOST}-ens192

  cat > ${NGINX_DIRECTORY}/${HOST}-ifcfg-ens192.json << EOF
{
  "append" : false,
  "mode" : 420,
  "filesystem" : "root",
  "path" : "/etc/sysconfig/network-scripts/ifcfg-ens192",
  "contents" : {
    "source" : "data:text/plain;charset=utf-8;base64,${ENS192}",
    "verification" : {}
  },
  "user" : {
    "name" : "root"
  },
  "group": {
    "name": "root"
  }
}
EOF

and

cat > ${NGINX_DIRECTORY}/${HOST}-hostname << EOF
${HOST}.${DOMAIN_NAME}
EOF
  HN=$(cat ${NGINX_DIRECTORY}/${HOST}-hostname | base64 -w0)
  rm ${NGINX_DIRECTORY}/${HOST}-hostname
  cat > ${NGINX_DIRECTORY}/${HOST}-hostname.json << EOF
{
  "append" : false,
  "mode" : 420,
  "filesystem" : "root",
  "path" : "/etc/hostname",
  "contents" : {
    "source" : "data:text/plain;charset=utf-8;base64,${HN}",
    "verification" : {}
  },
  "user" : {
    "name" : "root"
  },
  "group": {
    "name": "root"
  }
}
EOF

Note that a dummy ignition file

dani-k8s-node-2-ignition-starter.ign
{
  "ignition": {
    "config": {
      "append": [
        {
          "source": "http://10.85.174.63/repo/dani-k8s-node-2.ign",
          "verification": {}
        }
      ]
    },
    "timeouts": {},
    "version": "2.2.0"
  },
  "networkd": {},
  "passwd": {},
  "storage": {},
  "systemd": {}
}

was passed, which then redirected to the custom dani-k8s-node-2.ign file that had the above snippets injected. That triggered a reboot between the Ignition apply steps.

3) Observe whether the node joins the cluster; if not, check whether kubelet.service is up and running (see the sketch below).
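
For step 3, a minimal check sequence looks roughly like this (the node FQDN and the core SSH user are assumptions from my environment):

# does the new node appear at all?
oc get nodes

# are there CSRs stuck in Pending for the new machine?
oc get csr

# is kubelet actually running on the node?
ssh core@dani-k8s-node-2.dani.local 'sudo systemctl status kubelet.service'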

Note - I guess your Terraform UPI vSphere example should allow you to reproduce the issue; I haven't tried using your code.

Anything else we need to know?

On closer inspection, while ssh'ed onto the node, I found the following:


[root@dani-k8s-node-2 ~]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; disabled; vendor preset: enabled)
   Active: inactive (dead)


* although the systemd unit file looks okay, there is no symlink created, which explains why the service is not running and not enabled

[root@dani-k8s-node-2 ~]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --allow-privileged \
      --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_version=${VERSION_ID},node.openshift.io/os_id=${ID} \
      --minimum-container-ttl-duration=6m0s \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      --client-ca-file=/etc/kubernetes/ca.crt \
      --cloud-provider=vsphere \
      \
      --anonymous-auth=false \
      --v=3 \

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target


but no symlink present

[root@dani-k8s-node-2 ~]# ll /etc/systemd/system/multi-user.target.wants/
total 0
lrwxrwxrwx. 1 root root 39 May 20 23:06 chronyd.service -> /usr/lib/systemd/system/chronyd.service
lrwxrwxrwx. 1 root root 70 May 20 23:06 console-login-helper-messages-issuegen.service -> /usr/lib/systemd/system/console-login-helper-messages-issuegen.service
lrwxrwxrwx. 1 root root 47 May 20 23:06 coreos-growpart.service -> /usr/lib/systemd/system/coreos-growpart.service
lrwxrwxrwx. 1 root root 69 May 20 23:06 coreos-regenerate-iscsi-initiatorname.service -> /usr/lib/systemd/system/coreos-regenerate-iscsi-initiatorname.service
lrwxrwxrwx. 1 root root 67 May 20 23:06 coreos-root-bash-profile-workaround.service -> /usr/lib/systemd/system/coreos-root-bash-profile-workaround.service
lrwxrwxrwx. 1 root root 51 May 20 23:06 coreos-useradd-core.service -> /usr/lib/systemd/system/coreos-useradd-core.service
lrwxrwxrwx. 1 root root 36 May 20 23:06 crio.service -> /usr/lib/systemd/system/crio.service
lrwxrwxrwx. 1 root root 59 May 20 23:06 ignition-firstboot-complete.service -> /usr/lib/systemd/system/ignition-firstboot-complete.service
lrwxrwxrwx. 1 root root 42 May 20 23:06 irqbalance.service -> /usr/lib/systemd/system/irqbalance.service
lrwxrwxrwx. 1 root root 41 May 20 23:06 mdmonitor.service -> /usr/lib/systemd/system/mdmonitor.service
lrwxrwxrwx. 1 root root 46 May 20 23:06 NetworkManager.service -> /usr/lib/systemd/system/NetworkManager.service
lrwxrwxrwx. 1 root root 51 Jun 17 17:24 ostree-finalize-staged.path -> /usr/lib/systemd/system/ostree-finalize-staged.path
lrwxrwxrwx. 1 root root 37 May 20 23:06 pivot.service -> /usr/lib/systemd/system/pivot.service
lrwxrwxrwx. 1 root root 48 May 20 23:06 remote-cryptsetup.target -> /usr/lib/systemd/system/remote-cryptsetup.target
lrwxrwxrwx. 1 root root 40 May 20 23:06 remote-fs.target -> /usr/lib/systemd/system/remote-fs.target
lrwxrwxrwx. 1 root root 53 May 20 23:06 rpm-ostree-bootstatus.service -> /usr/lib/systemd/system/rpm-ostree-bootstatus.service
lrwxrwxrwx. 1 root root 36 May 20 23:06 sshd.service -> /usr/lib/systemd/system/sshd.service
lrwxrwxrwx. 1 root root 40 May 20 23:06 vmtoolsd.service -> /usr/lib/systemd/system/vmtoolsd.service



Running `systemctl start kubelet.service` manually does kick off the whole process, and at the end the node joins the cluster.
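
As a manual workaround (just a sketch, not the root-cause fix), enabling the unit creates the missing multi-user.target.wants symlink, and `--now` starts it in the same step:

# create the enablement symlink and start the service in one go
systemctl enable --now kubelet.service

# verify the symlink now exists
ls -l /etc/systemd/system/multi-user.target.wants/kubelet.service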

DanyC97 commented 5 years ago

/cc @cgwalters in case you have some thoughts from the RHCOS/MCO side.

abhinavdahiya commented 5 years ago

Are you sure you approved the CSR for your machine? https://docs.openshift.com/container-platform/4.1/installing/installing_vsphere/installing-vsphere.html#installation-approve-csrs_installing-vsphere
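
i.e. roughly (a sketch of the documented flow):

# list CSRs; requests from new nodes show up as Pending
oc get csr

# approve each pending CSR (the docs mention two per added machine)
oc adm certificate approve <csr_name>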

abhinavdahiya commented 5 years ago

was passed which then redirected to the custom dani-k8s-node-2.ign file which had injected the above snippets. That triggered a reboot between the ignition apply steps

What do you mean by a reboot between the Ignition apply steps? Ignition doesn't require a reboot.

DanyC97 commented 5 years ago

@abhinavdahiya thank you for taking the time to respond, much appreciated !

are you sure you approved the CSR for your machine https://docs.openshift.com/container-platform/4.1/installing/installing_vsphere/installing-vsphere.html#installation-approve-csrs_installing-vsphere

I never needed to approve any CSR; what I saw on previous deployments (for control and compute nodes) was that the CSRs were auto-approved.

I'm not sure if there are two paths here in v4 w.r.t. approving the CSRs (if there are, please help me understand how it works); all I can say is that, looking at the cluster I have running (where the above node didn't join the cluster), I see

oc get -n openshift-cluster-machine-approver all
NAME                                    READY   STATUS    RESTARTS   AGE
pod/machine-approver-7cd7f97455-g9x5q   1/1     Running   0          11d

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/machine-approver   1/1     1            1           11d

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/machine-approver-7cd7f97455   1         1         1       11d

which comes from cluster-machine-approver, and in the pod's log I can see traces like

I0624 20:31:13.706332       1 main.go:107] CSR csr-5tpq5 added
I0624 20:31:13.708184       1 main.go:132] CSR csr-5tpq5 not authorized: Invalid request
I0624 20:31:13.708206       1 main.go:164] Error syncing csr csr-5tpq5: Invalid request
I0624 20:31:13.748375       1 main.go:107] CSR csr-5tpq5 added
I0624 20:31:13.750478       1 main.go:132] CSR csr-5tpq5 not authorized: Invalid request
I0624 20:31:13.750496       1 main.go:164] Error syncing csr csr-5tpq5: Invalid request
I0624 20:31:13.830649       1 main.go:107] CSR csr-5tpq5 added
I0624 20:31:13.833242       1 main.go:132] CSR csr-5tpq5 not authorized: Invalid request
E0624 20:31:13.833321       1 main.go:174] Invalid request
I0624 20:31:13.833366       1 main.go:175] Dropping CSR "csr-5tpq5" out of the queue: Invalid request

That said, I'm not sure if this is part of the machine-api-operator, as I haven't worked out how to trace backwards from a deployment to its operator (any hints much appreciated ;) ), but I guess it is...
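
(One possible way to trace a namespace back to its operator - just a sketch, assuming jq is available and that the owning clusteroperator lists the namespace under status.relatedObjects - would be:)

for co in $(oc get clusteroperators -o name); do
  # print the clusteroperator name if it claims the machine-approver namespace
  oc get "$co" -o json \
    | jq -r --arg ns openshift-cluster-machine-approver \
        'select(any(.status.relatedObjects[]?; .name == $ns)) | .metadata.name'
done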

Update

see attached the whole machine-approver-7cd7f97455-g9x5q pod log in case you might find something...

pod-machine-approver-7cd7f97455-g9x5q.log

DanyC97 commented 5 years ago

was passed which then redirected to the custom dani-k8s-node-2.ign file which had injected the above snippets. That triggered a reboot between the ignition apply steps

what do you mean by rebooted between ignition apply steps.. ignition doesn't require reboot.

please see the attached dani-k8s-node-2_journalctl.log. I've added some `### dani ###` bookmarks so you can see why I assumed the above sequence triggered the reboot.

DanyC97 commented 5 years ago

@abhinavdahiya any thoughts? Or maybe @staebler might be able to chime in? If any additional info is needed, please let me know; I'm keeping the lab env up in case more info is needed, hopefully that will help you out.

DanyC97 commented 5 years ago

@cgwalters @abhinavdahiya can any one of you please help me understand the design/behavior of

rphillips commented 5 years ago

@DanyC97 the controller-manager within the openshift-controller-manager namespace actually does the issuing of the certificate. The logs there might help.

staebler commented 5 years ago

@DanyC97 The cluster-machine-approver will only approve machines that are added via a machine resource. The vSphere platform does not use machine resources yet. So, as a user, you must manually approve CSR requests for new nodes. As a convenience, the bootstrap machine will auto-approve CSR requests for nodes while it is running. However, that should not be relied upon.

DanyC97 commented 5 years ago

@DanyC97 The cluster-machine-approver will only approve machines that are added via a machine resource. The vSphere platform does not use machine resources yet. So, as a user, you must manually approve CSR requests for new nodes.

Oh, so there are two paths. Many, many thanks for sharing this info @staebler!

As a convenience, the bootstrap machine will auto-approve CSRs requests for nodes while it is running. However, that should not be relied upon.

Right, so I'll try a test of switching off the bootstrap and seeing if the CSR requests are left in a pending state; that should confirm it.

DanyC97 commented 5 years ago

@DanyC97 the controller-manager within the openshift-controller-manager namespaces actually does the issuing of the certificate. The logs there might help.

I'll check, @rphillips, thanks for the info! I'm curious to understand the whole flow, as things don't add up (yet) in my head:

* if the _vSphere platform does not use machine resources yet_, then no _cluster-machine-approver_ - OK

* if no _cluster-machine-approver_, then let's say we fall back to the bootstrap node - TBC

* so where does the _controller-manager within the openshift-controller-manager_ namespace fit in the whole picture? Because it is not part of the bootstrap node, is it? -> will find out

Sadly, @rphillips, the only output I see in the pods running in the openshift-controller-manager namespace is

W0629 08:33:30.961010       1 reflector.go:256] k8s.io/client-go/informers/factory.go:132: watch of *v1.ConfigMap ended with: too old resource version: 4962577 (4963900)
W0629 08:33:49.357787       1 reflector.go:256] github.com/openshift/client-go/template/informers/externalversions/factory.go:101: watch of *v1.TemplateInstance ended with: The resourceVersion for the provided watch is too old.
W0629 08:35:23.082164       1 reflector.go:256] github.com/openshift/client-go/route/informers/externalversions/factory.go:101: watch of *v1.Route ended with: The resourceVersion for the provided watch is too old.
W0629 08:36:01.201731       1 reflector.go:256] github.com/openshift/client-go/build/informers/externalversions/factory.go:101: watch of *v1.Build ended with: The resourceVersion for the provided watch is too old.
W0629 08:36:21.794776       1 reflector.go:256] github.com/openshift/client-go/apps/informers/externalversions/factory.go:101: watch of *v1.DeploymentConfig ended with: The resourceVersion for the provided watch is too old.
W0629 08:37:07.652800       1 reflector.go:256] github.com/openshift/client-go/image/informers/externalversions/factory.go:101: watch of *v1.ImageStream ended with: The resourceVersion for the provided watch is too old.
W0629 08:38:49.543850       1 reflector.go:256] github.com/openshift/origin/pkg/unidling/controller/unidling_controller.go:199: watch of *v1.Event ended with: The resourceVersion for the provided watch is too old.
W0629 08:41:06.451980       1 reflector.go:256] github.com/openshift/client-go/template/informers/externalversions/factory.go:101: watch of *v1.TemplateInstance ended with: The resourceVersion for the provided watch is too old.
W0629 08:41:17.966140       1 reflector.go:256] k8s.io/client-go/informers/factory.go:132: watch of *v1.ConfigMap ended with: too old resource version: 4964042 (4965587)

so not much related to CSRs

DanyC97 commented 5 years ago

@DanyC97 The cluster-machine-approver will only approve machines that are added via a machine resource. The vSphere platform does not use machine resources yet. So, as a user, you must manually approve CSR requests for new nodes.

oh so there are 2 paths, many many thanks for sharing this info @staebler !

As a convenience, the bootstrap machine will auto-approve CSRs requests for nodes while it is running. However, that should not be relied upon.

right, so i'll try a test of switching off the bootstrap and see if the CSRs requests are left in pending state, that should confirm.

@staebler you were spot on, Sir!

I've turned off the bootstrap node and it behaves as per the docs:

[root@dani-dev ~]# oc get csr
NAME        AGE    REQUESTOR                                                                   CONDITION
csr-28p87   3m4s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending

which IMO is a docs bug. And for folks who need to know whether the bootstrap node approved anything, `journalctl | grep approve-csr | grep -v "No resources found"` will do the job.
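
And if the bootstrap node is already gone, something like the following (a sketch; the go-template filter should match the one in the docs, so double-check it against your version) approves everything still pending:

oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
  | xargs --no-run-if-empty oc adm certificate approve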

DanyC97 commented 5 years ago

was passed which then redirected to the custom dani-k8s-node-2.ign file which had injected the above snippets. That triggered a reboot between the ignition apply steps

what do you mean by rebooted between ignition apply steps.. ignition doesn't require reboot.

please see the dani-k8s-node-2_journalctl.log attached. I've added some bookmarks ### dani ### so you can see why i assumed the above sequence triggered the reboot.

@abhinavdahiya if you can point me in the right direction, I'd be happy to keep digging.

staebler commented 5 years ago

which imo this is a docs bug.

@DanyC97 Can you be more specific about how you feel that the docs are deficient?

The following is an excerpt from the docs. It implies to me that the CSRs may be approved automatically or may need to be approved manually. It is the responsibility of the user to verify that the CSRs are approved--whether automatically or manually.

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself.

DanyC97 commented 5 years ago

which imo this is a docs bug.

@DanyC97 Can you be more specific about how you feel that the docs are deficient?

The following is an excerpt from the docs. It implies to me that the CSRs may be approved automatically or may need to be approved manually. It is the responsibility of the user to verify that the CSRs are approved--whether automatically or manually.

When you add machines to a cluster, two pending certificate signing requests (CSRs) are generated for each machine that you added. You must confirm that these CSRs are approved or, if necessary, approve them yourself.

Sure, sorry I wasn't clear enough.

Indeed you are right, the text implies that. However, regarding "You must confirm that these CSRs are approved" -> if you try to double-check whether they were approved, you won't be able to do so without a bit more info, because:

IMO, having a section - a small note and/or paragraph - to mention what you taught me here would be miles better; hence me calling it a bug.

Also, the docs say in step 1:

Confirm that the cluster recognizes the machines:

but you can't confirm that, since there are no nodes in a NotReady state; they only appear once the kubelet service is running. However, if you are unlucky like me, the docs won't help much.

Maybe this belongs in a troubleshooting section; either way, I think a hint can be added to help folks.

Update

Sorry if I was too strong on the docs by claiming it's a bug; maybe it's an enhancement.

cgwalters commented 5 years ago

Ignition should have enabled kubelet.service; it's part of the Ignition config generated by the MCO.
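
(For context, an illustrative fragment of how a unit gets enabled in an Ignition v2.2 config - not the exact MCO output, and the unit contents are trimmed to "...":)

{
  "systemd": {
    "units": [
      {
        "name": "kubelet.service",
        "enabled": true,
        "contents": "[Unit]\nDescription=Kubernetes Kubelet\n..."
      }
    ]
  }
}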

DanyC97 commented 5 years ago

was passed which then redirected to the custom dani-k8s-node-2.ign file which had injected the above snippets. That triggered a reboot between the ignition apply steps

what do you mean by rebooted between ignition apply steps.. ignition doesn't require reboot.

please see the dani-k8s-node-2_journalctl.log attached. I've added some bookmarks ### dani ### so you can see why i assumed the above sequence triggered the reboot.

@abhinavdahiya if you can point me in the right direction, i'd be happy to keep digging.

Right, after continuing to spin up new nodes, I found out that the dependent service

 cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service

hasn't started ...hmmm

systemctl status rpc-statd
● rpc-statd.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled)
   Active: inactive (dead)
[root@localhost ~]# cat /usr/lib/systemd/system/rpc-statd.service
[Unit]
Description=NFS status monitor for NFSv2/3 locking.
DefaultDependencies=no
Conflicts=umount.target
Requires=nss-lookup.target rpcbind.socket
Wants=network-online.target
After=network-online.target nss-lookup.target rpcbind.socket

PartOf=nfs-utils.service

[Service]
Environment=RPC_STATD_NO_NOTIFY=1
Type=forking
PIDFile=/var/run/rpc.statd.pid
ExecStart=/usr/sbin/rpc.statd

DanyC97 commented 5 years ago

and the dependencies for rpc-statd are:

systemctl list-dependencies rpc-statd
rpc-statd.service
● ├─rpcbind.socket
● ├─system.slice
● ├─network-online.target
● │ └─NetworkManager-wait-online.service
● └─nss-lookup.target

where rpcbind.service is not running. And network-online.target is up and happy:

systemctl status network-online.target
● network-online.target - Network is Online
   Loaded: loaded (/usr/lib/systemd/system/network-online.target; static; vendor preset: disabled)
   Active: active since Tue 2019-07-02 12:29:55 UTC; 18min ago
     Docs: man:systemd.special(7)
           https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget

Jul 02 12:29:55 dani-k8s-node-2.dani.local systemd[1]: Reached target Network is Online.
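
(Side note: `Wants=` is only a weak dependency in systemd, so an inactive rpc-statd by itself shouldn't prevent kubelet from starting; the dependency strength can be inspected with, for example:)

# show which units kubelet merely "wants" vs hard-"requires"
systemctl show kubelet.service -p Wants -p Requires
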
DanyC97 commented 5 years ago

Updating in case someone else bumps into this issue.

A) The initial issue was that nodes were failing to join the cluster because kubelet.service was not started on boot.

After a lot of digging (a few dead ends and lots of bunny hops), it turned out the problem was caused by my DNS setup; in particular, my node's FQDN was not within the subdomain where api and api-int were.

E.g

NOK
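
(A purely hypothetical illustration of the naming problem - the domains below are made up, not my actual setup:)

# cluster base domain: cluster1.example.com
api.cluster1.example.com          OK
api-int.cluster1.example.com      OK
node-2.cluster1.example.com       OK   (node FQDN inside the same subdomain)
node-2.lab.example.com            NOK  (node FQDN outside the api/api-int subdomain)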

B) The second issue (which was more a question for my own knowledge) was about how the CSRs get approved. With @staebler's help I've understood there are two paths:

cgwalters commented 5 years ago

if the bootstrap node is still up, the CSRs are auto-approved

Probably what we should have done is have the bootstrap approve CSRs for masters only, or so... that's all we actually need for installs, and having it do workers too adds confusion.

DanyC97 commented 5 years ago

@cgwalters thinking out loud, I think it is okay to have both use cases; especially when you build a 100-node cluster, you don't want a human to have to accept the CSRs, nor to keep running/watching your favorite tool/script for pending CSRs.

A note in the docs IMO would be exactly what folks need:

staebler commented 5 years ago

I do not think that we want to advise leaving the bootstrap node running any longer than necessary.

fdammeke commented 3 years ago

@DanyC97 could you elaborate on your findings regarding DNS a bit more? I'm bumping into the same issue as you describe when adding a node to an existing cluster. The initially provisioned cluster doesn't have FQDN hostnames within the api / api-int endpoint URL either, so I'm curious how this is related?

What comes to my attention is that there are no symbolic links for kubelet or machine-config-daemon in /etc/systemd/system/multi-user.target.wants/ on the extra worker nodes, but they are present on the initial installation worker nodes. Any thoughts?
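
(If it helps, a quick way to compare - the hostnames below are placeholders - is to diff the enabled units between a worker from the initial install and the new node:)

# list enabled units on a known-good worker and on the new node, then diff
ssh core@good-worker 'ls /etc/systemd/system/multi-user.target.wants/' > good.txt
ssh core@new-worker  'ls /etc/systemd/system/multi-user.target.wants/' > new.txt
diff good.txt new.txt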