metal3-io / baremetal-operator

Bare metal host provisioning integration for Kubernetes
Apache License 2.0

BMH CRs have no status #1826

Open fracappa opened 1 month ago

fracappa commented 1 month ago

Steps I followed

Hello everyone.

I'm using the baremetal-operator to manage my on-premises cluster of Dell servers equipped with iDRAC 9.

I deploy the Metal3 provider as follows:

clusterctl init --core cluster-api:v1.7.3 \
    --bootstrap kubeadm:v1.7.3 \
    --control-plane kubeadm:v1.7.3 -v5

and then:

clusterctl init --infrastructure metal3

I install the baremetal-operator in the following way:

git clone https://github.com/metal3-io/baremetal-operator.git
kubectl create namespace baremetal-operator-system
cd baremetal-operator
kustomize build config/default | kubectl apply -f -

as specified in the Metal3 provider GitHub repository.

Then I deploy the Ironic pods, after customizing the env var files, like this:

./baremetal-operator/tools/deploy.sh -i

After the setup, I try to create my BMH resources this way:

apiVersion: v1
kind: Secret
metadata:
  name: bmc-credentials-spring
type: Opaque
data:
  username: <username>
  password: <password>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: spring-bmh
spec:
  online: true
  bootMACAddress: <Boot-MAC-address>
  bootMode: UEFI
  bmc:
    address: idrac-virtualmedia://<iDRAC-IP>:443/redfish/v1/Systems/System.Embedded.1
    credentialsName: bmc-credentials-spring
    disableCertificateVerification: true
  image:
    checksum: http://<image-server>/images/SHA256SUMS
    checksumType: sha256
    format: qcow2
    url: http://<image-server>/images/noble-server-cloudimg-amd64.img
  userData:
    name: cloud-init-spring

What happened

I expected the BMH resource to enter the registering state and then move through inspecting and provisioning to provisioned.

However, the resource has no STATUS and it looks like this:

NAME         STATUS   STATE   CONSUMER   BMC                                                                            ONLINE   ERROR   AGE
spring-bmh                               idrac-virtualmedia://<IDRAC-IP>:443/redfish/v1/Systems/System.Embedded.1   true             19h

More details

When I inspect the logs of the Ironic pods, I see the following:

2024-07-30 07:36:55.039 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking if async firmware update failed. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:36:55.041 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking async firmware update tasks. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:00.855 1 INFO eventlet.wsgi.server [None req-29d822e7-e71d-475d-a9de-0b7d233838a0 - - - - - -] ::ffff:10.0.0.110 "GET /v1/ HTTP/1.1" status: 200  len: 2909 time: 0.0024757
2024-07-30 07:37:00.856 1 INFO ironic_lib.auth_basic [None req-29d822e7-e71d-475d-a9de-0b7d233838a0 - - - - - -] No authorization token received
2024-07-30 07:37:00.857 1 INFO eventlet.wsgi.server [None req-29d822e7-e71d-475d-a9de-0b7d233838a0 - - - - - -] ::ffff:10.0.0.110 "GET /v1/drivers HTTP/1.1" status: 401  len: 222 time: 0.0007792
2024-07-30 07:37:11.237 1 INFO eventlet.wsgi.server [None req-6276dbaf-36a9-4d6c-9aa7-6866e1f9cd4c - - - - - -] ::ffff:<IP> "GET / HTTP/1.1" status: 200  len: 645 time: 0.0010841
2024-07-30 07:37:11.338 1 INFO eventlet.wsgi.server [None req-49321d2f-5295-4293-abbc-94d1d7f3e2e3 - - - - - -] ::ffff:<IP> "GET / HTTP/1.1" status: 200  len: 645 time: 0.0013549
2024-07-30 07:37:18.286 1 INFO eventlet.wsgi.server [None req-289eb088-7158-4951-991c-4cc5ffc422bc - - - - - -] ::ffff:10.0.0.110 "GET /v1/ HTTP/1.1" status: 200  len: 2909 time: 0.0020258
2024-07-30 07:37:18.287 1 INFO ironic_lib.auth_basic [None req-289eb088-7158-4951-991c-4cc5ffc422bc - - - - - -] No authorization token received
2024-07-30 07:37:18.288 1 INFO eventlet.wsgi.server [None req-289eb088-7158-4951-991c-4cc5ffc422bc - - - - - -] ::ffff:10.0.0.110 "GET /v1/drivers HTTP/1.1" status: 401  len: 222 time: 0.0006793
2024-07-30 07:37:24.954 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic.drivers.modules.pxe_base.PXEBaseMixin._check_boot_timeouts' _process_scheduled /usr/lib/python3.9/site-packages/futurist/periodics.py:638
2024-07-30 07:37:24.957 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic.drivers.modules.pxe_base.PXEBaseMixin._check_boot_timeouts' _process_scheduled /usr/lib/python3.9/site-packages/futurist/periodics.py:638
2024-07-30 07:37:24.958 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking PXE boot status. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:24.960 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic.drivers.modules.pxe_base.PXEBaseMixin._check_boot_timeouts' _process_scheduled /usr/lib/python3.9/site-packages/futurist/periodics.py:638
2024-07-30 07:37:24.961 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking PXE boot status. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:24.964 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking PXE boot status. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:24.966 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic.drivers.modules.pxe_base.PXEBaseMixin._check_boot_timeouts' _process_scheduled /usr/lib/python3.9/site-packages/futurist/periodics.py:638
2024-07-30 07:37:24.969 1 DEBUG ironic.conductor.periodics [-] Completed periodic task for purpose checking PXE boot status. wrapper /usr/lib/python3.9/site-packages/ironic/conductor/periodics.py:174
2024-07-30 07:37:30.861 1 INFO eventlet.wsgi.server [None req-6f9147eb-8f68-4c1c-9bee-87739b139f05 - - - - - -] ::ffff:10.0.0.110 "GET /v1/ HTTP/1.1" status: 200  len: 2909 time: 0.0021667
2024-07-30 07:37:30.861 1 INFO ironic_lib.auth_basic [None req-6f9147eb-8f68-4c1c-9bee-87739b139f05 - - - - - -] No authorization token received
2024-07-30 07:37:30.862 1 INFO eventlet.wsgi.server [None req-6f9147eb-8f68-4c1c-9bee-87739b139f05 - - - - - -] ::ffff:10.0.0.110 "GET /v1/drivers HTTP/1.1" status: 401  len: 222 time: 0.0007560
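
The request lines in a log like this can be condensed to make the failing endpoint easier to spot. The following is a minimal sketch, assuming the `eventlet.wsgi.server` line format shown above; `summarize_requests` is a hypothetical helper name, not part of Ironic or the baremetal-operator.

```shell
# Read Ironic API log lines on stdin and print "<count> <path> <status>"
# for every request logged by eventlet.wsgi.server.
summarize_requests() {
  awk -F'"' '/eventlet\.wsgi\.server/ && NF >= 3 {
    split($2, req, " ")    # req[2] is the request path, e.g. /v1/drivers
    split($3, rest, " ")   # rest[2] is the HTTP status code, e.g. 401
    count[req[2] " " rest[2]]++
  }
  END { for (k in count) print count[k], k }' | LC_ALL=C sort -k2,2 -k3,3n
}
```

It could be fed from something like `kubectl logs <ironic-pod> | summarize_requests`, where the pod name is whatever the deployment uses. On the excerpt above, the unauthenticated `GET /v1/` requests succeed with 200 while every `GET /v1/drivers` request is rejected with 401.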

Environment:

/kind bug

metal3-io-bot commented 1 month ago

This issue is currently awaiting triage. If Metal3.io contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. The triage/accepted label can be added by org members by writing /triage accepted in a comment.

tuminoid commented 1 month ago

@kashifest @Rozzii PTAL?

Rozzii commented 1 month ago

@fracappa Based on the error message and the fact that you can't see the state of the BMH, I would say that this is a basic authentication issue. The deploy script makes sure that BMO and Ironic use the same credentials for the Ironic API, but you are deploying BMO without the script and Ironic with the script.

I would recommend deploying BMO and Ironic together like this: ./baremetal-operator/tools/deploy.sh -b -i

AFAIK this is not an iDRAC-specific issue, simply a misconfigured basic auth setup. I will remove the bug label for now.
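
One way to confirm a credential mismatch like this is to query the Ironic API directly, once without credentials and once with the credentials BMO is configured to use. This is a rough sketch under assumptions: `ironic_drivers_status` is a hypothetical helper, the base URL assumes Ironic's default API port 6385 with TLS, and the username and password have to be taken from the actual deployment.

```shell
# Print the HTTP status code returned by GET <base-url>/v1/drivers.
# $1: Ironic base URL, e.g. https://<ironic-ip>:6385 (assumed, adjust as needed)
# $2: optional "user:password" for HTTP basic auth
ironic_drivers_status() {
  url="$1/v1/drivers"
  if [ -n "${2:-}" ]; then
    # -k skips certificate verification; -w prints only the status code.
    curl -sk -o /dev/null -w '%{http_code}' -u "$2" "$url"
  else
    curl -sk -o /dev/null -w '%{http_code}' "$url"
  fi
}
```

For example, `ironic_drivers_status "https://<ironic-ip>:6385"` is expected to print 401 when basic auth is enabled, and `ironic_drivers_status "https://<ironic-ip>:6385" "<user>:<password>"` should print 200 only if those credentials match what Ironic was deployed with.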

Rozzii commented 1 month ago

/triage needs-information