metal3-io / baremetal-operator

Bare metal host provisioning integration for Kubernetes
Apache License 2.0

Baremetal operator fails to register hosts that have been discovered by ironic #323

Closed Deepthidharwar closed 4 years ago

Deepthidharwar commented 4 years ago

I have been trying to add a BareMetalHost worker node to a cluster running 3 masters and 1 bare metal worker node. The host was powered on when I tried to add it to the cluster. Ironic Inspector is failing and the node is stuck in the Inspecting state.
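For context, registering such a host amounts to applying a BMC credentials Secret together with the BareMetalHost manifest (the full manifest from this host appears further down). A minimal sketch of that step, with placeholder credential values that are not part of this report:

oc apply -n openshift-machine-api -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: openshift-e24-h11-bmc-secret
type: Opaque
stringData:
  username: <bmc-username>   # placeholder
  password: <bmc-password>   # placeholder
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-e24-h11-new
spec:
  online: true
  bootMACAddress: 3c:fd:fe:c1:6b:61
  bmc:
    address: ipmi://10.19.96.69
    credentialsName: openshift-e24-h11-bmc-secret
EOF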

Logs

2019-10-17 14:09:37.946 1 ERROR ironic_inspector.utils [-] [node: MAC 3c:fd:fe:c1:6b:61] Port 3c:fd:fe:c1:6b:61 already exists, uuid: 2998948b-2ab4-4a0f-94be-aa8c1d7fc950: ironic_inspector.utils.NotFoundInCacheError: Could not find a node for attributes {'bmc_address': [], 'mac': ['3c:fd:fe:c1:6b:61', '24:6e:96:c3:ab:36', '24:6e:96:c3:ab:34', '3c:fd:fe:c1:6b:60', '24:6e:96:c3:ab:37', '24:6e:96:c3:ab:35']}
2019-10-17 14:09:37.951 1 INFO ironic_inspector.process [-] [node: MAC 3c:fd:fe:c1:6b:61] Ramdisk logs were stored in file unknown_20191017-140937.947461.tar.gz
2019-10-17 14:09:37.951 1 ERROR ironic_inspector.utils [-] [node: MAC 3c:fd:fe:c1:6b:61] The following failures happened during running pre-processing hooks:
Node not found hook failed: Port 3c:fd:fe:c1:6b:61 already exists, uuid: 2998948b-2ab4-4a0f-94be-aa8c1d7fc950
2019-10-17 14:09:37.952 1 DEBUG oslo_messaging.rpc.server [-] Expected exception during message handling () _process_incoming /usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py:168
2019-10-17 14:09:37.953 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic_inspector.pxe_filter.base.BaseFilter.get_periodic_sync_task.<locals>.periodic_sync_task' _process_scheduled /usr/lib/python3.6/site-packages/futurist/periodics.py:639
2019-10-17 14:09:37.955 1 DEBUG ironic_inspector.main [req-2c7d5da9-fdd9-4e36-a8cb-4830726aa7e7 - - - - -] Returning error to client: The following failures happened during running pre-processing hooks:
Node not found hook failed: Port 3c:fd:fe:c1:6b:61 already exists, uuid: 2998948b-2ab4-4a0f-94be-aa8c1d7fc950 error_response /usr/lib/python3.6/site-packages/ironic_inspector/main.py:122
2019-10-17 14:09:37.957 1 INFO eventlet.wsgi.server [req-2c7d5da9-fdd9-4e36-a8cb-4830726aa7e7 - - - - -] 172.22.0.73 "POST /v1/continue HTTP/1.1" status: 400  len: 457 time: 0.1560910
2019-10-17 14:09:52.958 1 DEBUG futurist.periodics [-] Submitting periodic callback 'ironic_inspector.pxe_filter.base.BaseFilter.get_periodic_sync_task.<locals>.periodic_sync_task' _process_scheduled /usr/lib/python3.6/site-packages/futurist/periodics.py:639

oc get -o yaml baremetalhosts openshift-e24-h11-new -n openshift-machine-api
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"metal3.io/v1alpha1","kind":"BareMetalHost","metadata":{"annotations":{},"name":"openshift-e24-h11-new","namespace":"openshift-machine-api"},"spec":{"bmc":{"address":"ipmi://10.19.96.69","credentialsName":"openshift-e24-h11-bmc-secret"},"bootMACAddress":"3c:fd:fe:c1:6b:61","online":true}}
  creationTimestamp: "2019-10-17T12:55:55Z"
  finalizers:
  - baremetalhost.metal3.io
  generation: 1
  name: openshift-e24-h11-new
  namespace: openshift-machine-api
  resourceVersion: "504805"
  selfLink: /apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts/openshift-e24-h11-new
  uid: b6e5a3ef-d68d-4391-9022-40c22c64d92f
spec:
  bmc:
    address: ipmi://10.19.96.69
    credentialsName: openshift-e24-h11-bmc-secret
  bootMACAddress: 3c:fd:fe:c1:6b:61
  online: true
status:
  errorMessage: ""
  goodCredentials:
    credentials:
      name: openshift-e24-h11-bmc-secret
      namespace: openshift-machine-api
    credentialsVersion: "487868"
  hardwareProfile: ""
  lastUpdated: "2019-10-17T13:56:00Z"
  operationalStatus: OK
  poweredOn: false
  provisioning:
    ID: 219d0c20-e32b-419b-aacd-e1a07c244659
    image:
      checksum: ""
      url: ""
    state: inspecting
  triedCredentials:
    credentials:
      name: openshift-e24-h11-bmc-secret
      namespace: openshift-machine-api
    credentialsVersion: "487868"

openstack baremetal node list
+--------------------------------------+-----------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name                  | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------------------+--------------------------------------+-------------+--------------------+-------------+
| 7a93b001-fd54-4344-8e66-cbd29e6f682c | openshift-e24-h21     | 7a93b001-fd54-4344-8e66-cbd29e6f682c | power on    | active             | False       |
| 7253396b-eb8f-4f45-aa28-38571782af8e | None                  | None                                 | None        | enroll             | False       |
| 219d0c20-e32b-419b-aacd-e1a07c244659 | openshift-e24-h11-new | None                                 | power on    | inspect failed     | False       |
+--------------------------------------+-----------------------+--------------------------------------+-------------+--------------------+-------------+
[root@e24]# openstack baremetal port list
+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| 0e646f14-304a-495c-a201-1119248185f2 | 3c:fd:fe:c1:6c:c1 |
| 2998948b-2ab4-4a0f-94be-aa8c1d7fc950 | 3c:fd:fe:c1:6b:61 |
+--------------------------------------+-------------------+
[root@e24]#

openstack baremetal port show 2998948b-2ab4-4a0f-94be-aa8c1d7fc950
+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| address               | 3c:fd:fe:c1:6b:61                    |
| created_at            | 2019-10-17T10:26:27+00:00            |
| extra                 | {}                                   |
| internal_info         | {}                                   |
| is_smartnic           | False                                |
| local_link_connection | {}                                   |
| node_uuid             | 7253396b-eb8f-4f45-aa28-38571782af8e |
| physical_network      | None                                 |
| portgroup_uuid        | None                                 |
| pxe_enabled           | True                                 |
| updated_at            | None                                 |
| uuid                  | 2998948b-2ab4-4a0f-94be-aa8c1d7fc950 |
+-----------------------+--------------------------------------+
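The conflicting port is attached to the unnamed Ironic node that is still in the enroll state (7253396b-eb8f-4f45-aa28-38571782af8e), not to openshift-e24-h11-new. One possible manual cleanup, sketched from the output above and not verified as the resolution of this issue, is to delete that stale node (which also removes its ports) and then re-create the BareMetalHost so registration and inspection start over (the manifest file name below is hypothetical):

openstack baremetal node delete 7253396b-eb8f-4f45-aa28-38571782af8e
oc delete baremetalhost openshift-e24-h11-new -n openshift-machine-api
oc apply -f openshift-e24-h11-new.yaml -n openshift-machine-api   # re-apply the BareMetalHost manifest shown above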
dhellmann commented 4 years ago

The problem here is that Ironic registers the host as part of discovery, but then the operator also tries to register it. We should look for an existing node that already has a port with the same MAC address and reuse that node, updating it with the name, credentials, and other settings that we don't get through discovery.
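The lookup that implies is essentially the following, shown here with the Ironic CLI rather than the operator's Go code (a sketch of the idea, not the actual implementation): find the port that owns the boot MAC, follow it to its node, and adopt that node instead of creating a new one.

port_uuid=$(openstack baremetal port list --address 3c:fd:fe:c1:6b:61 -f value -c UUID)
node_uuid=$(openstack baremetal port show "$port_uuid" -f value -c node_uuid)
openstack baremetal node show "$node_uuid"
# The operator would then update this existing node's name, driver_info (BMC
# credentials), and other settings, instead of creating a fresh node and
# hitting the "Port ... already exists" error during inspection.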

metal3-io-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

metal3-io-bot commented 4 years ago

Stale issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle stale.

/close

metal3-io-bot commented 4 years ago

@metal3-io-bot: Closing this issue.

In response to [this](https://github.com/metal3-io/baremetal-operator/issues/323#issuecomment-628690049):

> Stale issues close after 30d of inactivity. Reopen the issue with `/reopen`. Mark the issue as fresh with `/remove-lifecycle stale`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.