metal3-io / metal3-dev-env

Metal³ Development Environment
Apache License 2.0

CAPM3 does not exist (the worker is not created) #572

Closed Mawwlle closed 3 years ago

Mawwlle commented 3 years ago

System: 6 CPU cores, 20 GB RAM, CentOS 8 (the default Metal3 architecture).
We need to deploy a standard Metal3 environment.
[screenshot]
However, when creating a cluster, the worker node is not created. If you run this command, you get "capm3 do not exist in namespaces":
[screenshot]
What could be the reason?

fmuyassarov commented 3 years ago

The namespace of the controllers has changed since then. Sorry that the docs haven't been updated yet; we are working on updating them right now.

So, the right namespace would be capm3-system. See the example output:

$ kubectl get pods -A
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS   AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-6b6579d56d-7l4pl       2/2     Running   0          3d1h
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-6d878bb599-r77cc   2/2     Running   0          3d1h
capi-system                         capi-controller-manager-7ff4999d6c-jmjk9                         2/2     Running   0          3d1h
capi-webhook-system                 capi-controller-manager-6c48f8f9bb-n84dz                         2/2     Running   0          3d1h
capi-webhook-system                 capi-kubeadm-bootstrap-controller-manager-56f98bc7f9-6xjgf       2/2     Running   0          3d1h
capi-webhook-system                 capi-kubeadm-control-plane-controller-manager-85bcfd7fcd-k4j59   2/2     Running   0          3d1h
capi-webhook-system                 capm3-controller-manager-7695bc4f6d-lr667                        2/2     Running   0          3d1h
capi-webhook-system                 capm3-ipam-controller-manager-6b6f6d44d7-gxks4                   2/2     Running   0          3d1h
capm3-system                        capm3-baremetal-operator-controller-manager-844bc955dc-k6f8w     2/2     Running   0          3d1h
capm3-system                        capm3-controller-manager-75c7b8fcc8-ngc2f                        2/2     Running   0          3d1h
capm3-system                        capm3-ipam-controller-manager-77d89bfc98-bnsgg                   2/2     Running   0          3d1h
cert-manager                        cert-manager-cainjector-fc6c787db-bwt9w                          1/1     Running   0          3d1h
cert-manager                        cert-manager-d994d94d7-gfn2d                                     1/1     Running   0          3d1h
cert-manager                        cert-manager-webhook-845d9df8bf-q2dqw                            1/1     Running   0          3d1h
kube-system                         coredns-f9fd979d6-fm9ss                                          1/1     Running   0          3d1h
kube-system                         coredns-f9fd979d6-wnxj2                                          1/1     Running   0          3d1h
kube-system                         etcd-kind-control-plane                                          1/1     Running   0          3d1h
kube-system                         kindnet-qgwbc                                                    1/1     Running   0          3d1h
kube-system                         kube-apiserver-kind-control-plane                                1/1     Running   0          3d1h
kube-system                         kube-controller-manager-kind-control-plane                       1/1     Running   0          3d1h
kube-system                         kube-proxy-9c2nh                                                 1/1     Running   0          3d1h
kube-system                         kube-scheduler-kind-control-plane                                1/1     Running   0          3d1h
local-path-storage                  local-path-provisioner-78776bfc44-wk8vn                          1/1     Running   0          3d1h
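
If you just want to confirm that the CAPM3 controllers are up without listing everything, a narrower query along these lines should be enough:

$ kubectl get pods -n capm3-system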

Can you please paste the output of kubectl get machine -A?

Mawwlle commented 3 years ago

kubectl get machine -A

[screenshot]

However, after the "kubectl get baremetalhosts -n metal3" command, this is what I expected:
[screenshot]
But this is what we get:
[screenshot]
P.S. While I was writing this, the worker node started working:
[screenshot]
But now I have two questions:
1. Is it okay that when I try to ssh I get this message: "metal3@192.168.111.249: Permission denied (publickey, gssapi-keyex, gssapi-with-mic)"?
2. Could these problems be due to Docker's new policy? If so, do you know how this can be configured?

fmuyassarov commented 3 years ago

I think you can't ssh into the node until it is provisioned. Can you please check whether there are any issues in the console output? sudo virsh console node_0
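
A rough sketch of how to get to the console, assuming the default metal3-dev-env libvirt domain names (node_0, node_1):

# list the libvirt domains to confirm the names
$ sudo virsh list --all

# attach to the serial console of the first node; exit the console with Ctrl+]
$ sudo virsh console node_0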

Mawwlle commented 3 years ago

Console log output after sudo virsh console node_0: log.txt

A test1-7fcn2 login: prompt appeared while the command was running. What do you need to enter in this field? I entered a random name and everything froze.
[screenshot]
After that my nodes were suspended. How can I start them again?
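
(For reference, nodes that end up suspended or shut off at the libvirt level can usually be brought back with virsh; a sketch, assuming the default metal3-dev-env domain names:)

# check the current state of the domains
$ sudo virsh list --all

# resume a paused domain, or start one that is shut off
$ sudo virsh resume node_0
$ sudo virsh start node_0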

fmuyassarov commented 3 years ago

By default, we don't set a username & password for the target nodes in the Metal3-dev-env scripts. ssh should work, because your host's ssh key is injected into the target nodes. It seems there are some other issues.

What is the current PROVISIONING_STATUS of your BareMetalHosts? If they are not Provisioned:

  1. can you check the Baremetal Operator (pod) logs?
  2. can you check the Ironic node status? For that, you first need to export CONTAINER_RUNTIME=podman if you are running on CentOS, and then run baremetal node list. (See the sketch below.)
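
Something along these lines should work; the exact resource names may differ in your deployment (the deployment name below is taken from the pod listing above, and the -c manager container name is an assumption):

# Baremetal Operator logs
$ kubectl logs -n capm3-system deploy/capm3-baremetal-operator-controller-manager -c manager

# Ironic node status, run from the dev-env host
$ export CONTAINER_RUNTIME=podman   # when running on CentOS
$ baremetal node list
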
fmuyassarov commented 3 years ago

btw, thanks for the logs. But I didn't see anything in there that would indicate the reason for your issue.

Mawwlle commented 3 years ago

I'm attaching as many different logs as possible so that you have more information; thanks for your responsiveness:
- log(pod).txt
- bm_node_list.txt (the provisioning state "wait call-back" lasts a very long time)
- some_commands_inf.txt

Also, a common error is a lack of free space on the device. We have 200 gigabytes of storage; as I understand it, we are running out of inodes. How can this be fixed?

fmuyassarov commented 3 years ago

Thanks for the logs! Pasting some outputs here for visibility.

$ kubectl get pods -A
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS   AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-7ffb7c9d77-rc8sf       2/2     Running   0          31m
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-5b8cf46bb6-8dmpq   2/2     Running   0          31m
capi-system                         capi-controller-manager-559db48f6-zhhkn                          2/2     Running   0          31m
capi-webhook-system                 capi-controller-manager-76d9b5889c-c5kpm                         2/2     Running   0          31m
capi-webhook-system                 capi-kubeadm-bootstrap-controller-manager-787cb85f58-lqfxd       2/2     Running   0          31m
capi-webhook-system                 capi-kubeadm-control-plane-controller-manager-86c44777c5-blwnm   2/2     Running   0          31m
capi-webhook-system                 capm3-controller-manager-7695bc4f6d-s2mgp                        2/2     Running   0          31m
capi-webhook-system                 capm3-ipam-controller-manager-6b6f6d44d7-bstlr                   2/2     Running   0          31m
capm3-system                        capm3-baremetal-operator-controller-manager-844bc955dc-mt4h6     2/2     Running   0          31m
capm3-system                        capm3-controller-manager-75c7b8fcc8-s6zpg                        2/2     Running   0          31m
capm3-system                        capm3-ipam-controller-manager-77d89bfc98-p7fss                   2/2     Running   0          31m
cert-manager                        cert-manager-cainjector-fc6c787db-qh9rw                          1/1     Running   0          31m
cert-manager                        cert-manager-d994d94d7-b8p6k                                     1/1     Running   0          31m
cert-manager                        cert-manager-webhook-845d9df8bf-rrmgm                            1/1     Running   0          31m
kube-system                         coredns-f9fd979d6-qj6mn                                          1/1     Running   0          33m
kube-system                         etcd-minikube                                                    1/1     Running   1          30m
kube-system                         kube-apiserver-minikube                                          1/1     Running   1          30m
kube-system                         kube-controller-manager-minikube                                 1/1     Running   1          30m
kube-system                         kube-proxy-mhbhf                                                 1/1     Running   0          33m
kube-system                         kube-scheduler-minikube                                          1/1     Running   1          31m
kube-system                         storage-provisioner                                              1/1     Running   1          33m
metal3                              metal3-ironic-6fbb965956-sgtcx                                   9/9     Running   0          30m
$ baremetal node list
+--------------------------------------+--------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name   | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------+--------------------------------------+-------------+--------------------+-------------+
| 3c757597-07ae-48ea-af03-fdbf4a07a09d | node-0 | 20ff305c-d488-46ff-a835-8455d375da80 | power on    | active             | False       |
| 46e553ca-5cb0-44af-b3cd-1bb5aab8ad3c | node-1 | af3f80d1-74ec-4458-9059-1e3fed4ba7fe | power on    | wait call-back     | False       |
+--------------------------------------+--------+--------------------------------------+-------------+--------------------+-------------+
$ kubectl get bmh -n metal3
NAME     PROVISIONING_STATUS   CONSUMER                   ONLINE   ERROR
node-0   provisioned           test1-controlplane-hl2kz   true
node-1   provisioning          test1-workers-jgzl7        true
$ kubectl get machine -A
NAMESPACE   NAME                     PROVIDERID                                      PHASE          VERSION
metal3      test1-759cfc77c5-nbnpj                                                   Provisioning   v1.18.8
metal3      test1-d54lk              metal3://20ff305c-d488-46ff-a835-8455d375da80   Running        v1.18.8
$ sudo virsh net-dhcp-leases baremetal
 Expiry Time           MAC address         Protocol   IP address          Hostname   Client ID or DUID
-----------------------------------------------------------------------------------------------------------
 2020-12-09 13:15:11   00:be:62:08:82:10   ipv4       192.168.111.20/24   node-0     01:00:be:62:08:82:10
 2020-12-09 13:12:39   00:be:62:08:82:14   ipv4       192.168.111.21/24   node-1     01:00:be:62:08:82:14
 2020-12-09 13:13:05   52:54:00:1e:22:b4   ipv4       192.168.111.59/24   minikube   01:52:54:00:1e:22:b4

I see some error output in your logs related to disk space: go: creating work dir: mkdir /tmp/go-build164395946: no space left on device. I assume this is coming from the host where you are running Metal3-dev-env. Can you please check whether disk space usage is close to 100%? In our CI, we create VMs with 100 GB of disk space.
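
To check both block and inode usage on the host, something like this would do:

$ df -h /
$ df -i /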

Also, I see that Ironic node-1 is in wait call-back, which indicates that the Ironic conductor is waiting for the deploy ramdisk to boot and call back. See the Ironic state machine: https://docs.openstack.org/ironic/rocky/contributor/states.html
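
To dig into why node-1 is stuck there, inspecting it with the same baremetal client could help, for example:

$ baremetal node show node-1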

My assumption is that you are running out of space on your host, but that would be clearer if you could check the host disk space.

Mawwlle commented 3 years ago

Thanks, it helped!