metal3-io / baremetal-operator

Bare metal host provisioning integration for Kubernetes
Apache License 2.0

BareMetalHost stuck at inspecting phase using both ipmi and idrac #375

Closed sachinphogat closed 4 years ago

sachinphogat commented 4 years ago

I have one Dell PowerEdge M610 server with iDRAC6 and IPMI support. I tried applying the YAML manifests below, once with ipmi and once with idrac, but in both cases the host gets stuck in the inspecting phase and eventually times out.


For ipmi:

---
apiVersion: v1
kind: Secret
metadata:
  name: baremetalhost-secret
type: Opaque
data:
  password: cGFzc3dvcmQ=
  username: cm9vdA==

---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: baremetalhost
spec:
  bmc:
    address: ipmi://10.80.146.36:623
    credentialsName: baremetalhost-secret
  online: true

Logs from ironic container:

2019-12-24 11:21:14.242 62 DEBUG ironic.common.json_rpc.client [req-df0e058f-b44b-4ef5-b0e7-324466737708 - - - - -] RPC validate_driver_interfaces returned {"jsonrpc":"2.0","result":{"management":{"result":true},"console":{"reason":"Driver ipmi does not support console (disabled or not implemented).","result":false},"network":{"result":true},"power":{"result":true},"deploy":{"reason":"Node a2287ad2-acfd-4ed7-a661-ea8e1bc79e36 does not have any port associated with it.","result":false},"boot":{"reason":"Node a2287ad2-acfd-4ed7-a661-ea8e1bc79e36 does not have any port associated with it.","result":false},"inspect":{"result":true},"storage":{"result":true},"bios":{"reason":"Driver ipmi does not support bios (disabled or not implemented).","result":false},"raid":{"reason":"Driver ipmi does not support raid (disabled or not implemented).","result":false},"rescue":{"reason":"Driver ipmi does not support rescue (disabled or not implemented).","result":false}},"id":"req-df0e058f-b44b-4ef5-b0e7-324466737708"} _request /usr/lib/python2.7/site-packages/ironic/common/json_rpc/client.py:159
2019-12-24 11:21:14.244 62 INFO eventlet.wsgi.server [req-df0e058f-b44b-4ef5-b0e7-324466737708 - - - - -] ::ffff:10.71.3.13 "GET /v1/nodes/a2287ad2-acfd-4ed7-a661-ea8e1bc79e36/validate HTTP/1.1" status: 200  len: 1154 time: 0.0767150

For idrac:

---
apiVersion: v1
kind: Secret
metadata:
  name: baremetalhost-secret
type: Opaque
data:
  password: cGFzc3dvcmQ=
  username: cm9vdA==

---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: baremetalhost
spec:
  bmc:
    address: idrac://10.80.146.36:443
    credentialsName: baremetalhost-secret
  online: true

Logs from ironic container:

2019-12-24 11:34:37.473 34 DEBUG dracclient.wsman [req-9d8989b5-a5a8-4427-a850-3713ad5bb625 - - - - -] Received response from https://10.80.146.36:443/wsman: <s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wxf="http://schemas.xmlsoap.org/ws/2004/09/transfer"><s:Header><wsa:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To><wsa:Action>http://schemas.dell.com/wbem/wscim/1/cim-schema/2/DCIM_LCService/GetRemoteServicesAPIStatusResponse</wsa:Action><wsa:RelatesTo>uuid:0bcf8066-c28e-4300-8010-def49de9ff80</wsa:RelatesTo><wsa:From><wsa:Address>https://10.80.146.36:443/wsman</wsa:Address></wsa:From><wsa:MessageID>uuid:0bcf8066-c28e-4300-8010-def49de9ff80</wsa:MessageID></s:Header><s:Body><wsinst:GetRemoteServicesAPIStatus_OUTPUT xmlns:wsinst="http://schemas.dell.com/wbem/wscim/1/cim-schema/2/DCIM_LCService"><wsinst:ReturnValue>0</wsinst:ReturnValue><wsinst:LCStatus>0</wsinst:LCStatus><wsinst:ServerStatus>2</wsinst:ServerStatus><wsinst:Status>0</wsinst:Status><wsinst:MessageID>LC061</wsinst:MessageID><wsinst:Message>Lifecycle Controller Remote Services is ready.</wsinst:Message></wsinst:GetRemoteServicesAPIStatus_OUTPUT></s:Body></s:Envelope> _do_request /usr/lib/python2.7/site-packages/dracclient/wsman.py:130
2019-12-24 11:34:37.474 34 DEBUG dracclient.client [req-9d8989b5-a5a8-4427-a850-3713ad5bb625 - - - - -] The iDRAC is ready wait_until_idrac_is_ready /usr/lib/python2.7/site-packages/dracclient/client.py:1353
2019-12-24 11:34:37.475 34 DEBUG dracclient.wsman [req-9d8989b5-a5a8-4427-a850-3713ad5bb625 - - - - -] Sending request to https://10.80.146.36:443/wsman: <s:Envelope xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:wsman="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd"><s:Header><wsa:To s:mustUnderstand="true">https://10.80.146.36:443/wsman</wsa:To><wsman:ResourceURI s:mustUnderstand="true">http://schemas.dell.com/wbem/wscim/1/cim-schema/2/DCIM_ComputerSystem</wsman:ResourceURI><wsa:MessageID s:mustUnderstand="true">uuid:c519883a-026d-468c-9a2d-46fa1d9ce13a</wsa:MessageID><wsa:ReplyTo><wsa:Address>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:Address></wsa:ReplyTo><wsa:Action s:mustUnderstand="true">http://schemas.xmlsoap.org/ws/2004/09/enumeration/Enumerate</wsa:Action></s:Header><s:Body><wsen:Enumerate xmlns:wsen="http://schemas.xmlsoap.org/ws/2004/09/enumeration"><wsman:Filter Dialect="http://schemas.dmtf.org/wbem/cql/1/dsp0202.pdf">select EnabledState from DCIM_ComputerSystem</wsman:Filter><wsman:OptimizeEnumeration/><wsman:MaxElements>100</wsman:MaxElements></wsen:Enumerate></s:Body></s:Envelope> _do_request /usr/lib/python2.7/site-packages/dracclient/wsman.py:81

2019-12-24 11:34:38.060 34 DEBUG dracclient.wsman [req-9d8989b5-a5a8-4427-a850-3713ad5bb625 - - - - -] Received response from https://10.80.146.36:443/wsman: <s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:wsen="http://schemas.xmlsoap.org/ws/2004/09/enumeration" xmlns:wsman="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" xmlns:wse="http://schemas.xmlsoap.org/ws/2004/08/eventing"><s:Header><wsa:Action>http://schemas.dmtf.org/wbem/wsman/1/wsman/fault</wsa:Action><wsa:To>http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</wsa:To><wsa:MessageID>uuid:c519883a-026d-468c-9a2d-46fa1d9ce13a</wsa:MessageID><wsa:RelatesTo>uuid:c519883a-026d-468c-9a2d-46fa1d9ce13a</wsa:RelatesTo></s:Header><s:Body><s:Fault><s:Code><s:Value>s:Receiver</s:Value><s:Subcode><s:Value>wsman:InternalError</s:Value></s:Subcode></s:Code><s:Reason><s:Text xml:lang="en">The service cannot comply with the request due to internal processing errors</s:Text></s:Reason><s:Detail><s:Text>Class not found</s:Text></s:Detail></s:Fault></s:Body></s:Envelope> _do_request /usr/lib/python2.7/site-packages/dracclient/wsman.py:130
2019-12-24 11:34:38.063 34 ERROR ironic.conductor.manager [req-9d8989b5-a5a8-4427-a850-3713ad5bb625 - - - - -] Failed to get power state for node 3898e5b7-588a-49d9-b68a-815ae7774d31. Error: 'NoneType' object has no attribute 'text'

Please help me resolve this issue, and let me know if I am doing anything wrong. Thanks in advance.

stbenjam commented 4 years ago

@dtantsur @juliakreger Do you know, based on the logs here, why the hosts are stuck in inspecting?

@sachinphogat Can you look at the console of the hosts? Do you happen to know what they were doing at the time? Did they PXE boot successfully?

dtantsur commented 4 years ago

The iDRAC failure looks like an issue in the dracclient library :-/

dtantsur commented 4 years ago

Update: it seems that iDRAC 6 won't be supported by the idrac driver. It is unclear what causes the IPMI failure, since the log snippet does not contain anything relevant.

rpioso commented 4 years ago

The Dell EMC Ironic third-party continuous integration (CI) test infrastructure presently tests every proposed change to Ironic against 13G and 14G Dell EMC hardware, which use iDRAC 8 and iDRAC 9, respectively. The testing exercises the IPMI, Redfish, and WS-Man iDRAC APIs.

In the past, that infrastructure tested 12G hardware. The Dell EMC M610 is an 11G system, which as far as I know has never been tested by that CI system. Therefore, it is likely that 11G hardware has not been tested by the ironic community.

Looking at the logs above, it is clear the iDRAC's WS-Man API is being used, because its response states it does not recognize the DCIM_ComputerSystem class -- "Class not found." That class is fundamental to the workings of the python-dracclient Python package, which provides WS-Man client functionality to the ironic idrac driver.

May I suggest you try the ironic ipmi driver, if you have not already? Please feel free to share logs from its use.

stbenjam commented 4 years ago

@sachinphogat Were you able to retry with the ipmi driver?

stbenjam commented 4 years ago

One other thing to try: power off the host, delete all of the host data from the Ironic database (by restarting the metal3 pod), and then try registering the host with a boot MAC address while it is powered off.
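A minimal sketch of what registering the host with a boot MAC address could look like, assuming the ipmi driver and the same `baremetalhost-secret` Secret as in the original report. `bootMACAddress` is a field of the `BareMetalHost` spec; the value shown is a hypothetical placeholder and must be replaced with the MAC address of the NIC the host PXE boots from. Supplying it is presumably what gives Ironic a port for the node, which the ipmi validation log above reports as missing ("does not have any port associated with it") for the deploy and boot interfaces.

```yaml
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: baremetalhost
spec:
  bmc:
    address: ipmi://10.80.146.36:623
    credentialsName: baremetalhost-secret
  # Hypothetical placeholder: replace with the MAC of the host's PXE-boot NIC.
  # Quoted so the value is always read as a string.
  bootMACAddress: "aa:bb:cc:dd:ee:ff"
  online: true
```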

stbenjam commented 4 years ago

/close

@sachinphogat Going to close this: please reopen if it's still an issue after trying the suggestions above, thanks!

metal3-io-bot commented 4 years ago

@stbenjam: Closing this issue.

In response to [this](https://github.com/metal3-io/baremetal-operator/issues/375#issuecomment-585228123):

> /close
>
> @sachinphogat Going to close this: please reopen if it's still an issue after trying the suggestions above, thanks!

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.