openbmc / btbridge

Apache License 2.0
6 stars 9 forks source link

Host shutting down ever other IPL #8

Closed causten closed 8 years ago

causten commented 8 years ago

BTBRIDGED is sending a CC of 0xCE (sequence 7D) which triggers Hostboot to shutdown and the boot to stop.

[BTBRIDGED] Completed request with seq 0x7b, netfn 0x33, lun 0x00, cmd 0xf0, cc 0x00 [BTBRIDGED] Sending dbus signal with seq 0x7c, netfn 0x32, lun 0x00, cmd 0xf0 [BTBRIDGED] Received a dbus response for msg with seq 0x7c [BTBRIDGED] Processed 1 dbus events [BTBRIDGED] Disabling timer, no elements remain in queue [BTBRIDGED] Successfully wrote 7 of 7 bytes to /dev/bt-host [BTBRIDGED] Turning off POLLOUT for the BT in poll() [BTBRIDGED] Completed request with seq 0x7c, netfn 0x33, lun 0x00, cmd 0xf0, cc 0x00 [BTBRIDGED] Sending dbus signal with seq 0x7d, netfn 0x32, lun 0x00, cmd 0xf0 [BTBRIDGED] Timeout on msg with seq: 0x7d [BTBRIDGED] Disabling timer, no elements remain in queue [BTBRIDGED] Successfully wrote 5 of 5 bytes to /dev/bt-host [BTBRIDGED] Turning off POLLOUT for the BT in poll() [BTBRIDGED] Completed request with seq 0x7d, netfn 0x33, lun 0x00, cmd 0xf0, cc 0xce [BTBRIDGED] No message found to send timeout [BTBRIDGED] Failed to find matching request for dbus method with seq: 0x7d [BTBRIDGED] Processed 1 dbus events [BTBRIDGED] Sending dbus signal with seq 0x7e, netfn 0x0a, lun 0x00, cmd 0x44 [BTBRIDGED] Received a dbus response for msg with seq 0x7e [BTBRIDGED] Processed 1 dbus events [BTBRIDGED] Disabling timer, no elements remain in queue [BTBRIDGED] Successfully wrote 7 of 7 bytes to /dev/bt-host [BTBRIDGED] Turning off POLLOUT for the BT in poll() [BTBRIDGED] Completed request with seq 0x7e, netfn 0x0b, lun 0x00, cmd 0x44, cc 0x00

Notice sequence 7D shows a cc of 'ce'.

24.79336|IPMI|dd: I>write ok c8:f0 seq 7d len 3b 25.99297|IPMI|dd: I>read b2h ok cc:f0 seq 7d len 0 cc ce 25.99298|IPMI|sel: E>failed partial_add_esel, o_cc ce, eSELindex 1a0 25.99299|IPMI|I>addsel: ~[0x0000] 00000200 00000020 0004ff35 6fa20007 ....... ...5o..._ 25.99303|IPMI|rp: queuing sync 28:44 25.99316|IPMI|dd: I>write ok 28:44 seq 7e len 10 26.09299|IPMI|dd: I>read b2h ok 2c:44 seq 7e len 2 cc 0 26.09300|IPMI|sel: <<send_esel o_err=00000000, o_cc=xce, sel recID=x70, esel recID=x07 26.09300|IPMI|sel: <<process_esel 26.09301|IPMI|sel: I>ipmisel shutdown event 26.09305|IPMI|rp: I>MSG_STATE_SHUTDOWN: ipmi begin shutdown

Trace from the BMC showing that the response had a completion code of 00 not CE

IPMI Incoming: Seq 0x7d, NetFn 0x32, CMD: 0xf0 0x000000: 01 00 07 00 7c 01 00 62 69 6e 2f 61 6e 6f 6f 00 ....|..bin/anoo. 0x000010: 00 00 00 55 44 03 d4 01 15 31 00 01 28 00 42 49 ...UD....1..(.BI 0x000020: 50 4d 49 00 00 00 00 00 00 00 00 00 00 00 00 00 PMI............. 0x000030: 00 03 cc 00 00 00 00 00 00 03 cc ...........
IPMI PARTIAL ESEL for /tmp/esel0007 Offset = 380 Length = 52 IPMI Response: 0x000000: 00 07 00

causten commented 8 years ago

Added trace... This IPL it was sequence 0x5A that seems to be the problem. The time(NULL) function is only accurate to 1 second. I added an addition trace showing that 1449038747 was doing a lot of work (since sequence 0x56) before hitting 1449038748. If there is suppose to be a 1 second timeout on waiting for ipmid code to complete I don't think the timeout is working correctly.

[BTBRIDGED] 1449038747: Sending dbus signal with seq 0x56, netfn 0x32, lun 0x00, cmd 0xf0 ... [BTBRIDGED] 1449038747: Completed request with seq 0x59, netfn 0x33, lun 0x00, cmd 0xf0, cc 0x00 [BTBRIDGED] 1449038747: Sending dbus signal with seq 0x5a, netfn 0x32, lun 0x00, cmd 0xf0 [BTBRIDGED] 1449038748: Timeout on msg with seq: 0x5a [BTBRIDGED] 1449038748: Disabling timer, no elements remain in queue [BTBRIDGED] 1449038748: Successfully wrote 5 of 5 bytes to /dev/bt-host [BTBRIDGED] 1449038748: Turning off POLLOUT for the BT in poll() [BTBRIDGED] 1449038748: Completed request with seq 0x5a, netfn 0x33, lun 0x00, cmd 0xf0, cc 0xce [BTBRIDGED] 1449038748: No message found to send timeout [BTBRIDGED] 1449038748: Failed to find matching request for dbus method with seq: 0x5a [BTBRIDGED] 1449038748: Processed 1 dbus events [BTBRIDGED] 1449038748: Sending dbus signal with seq 0x5b, netfn 0x0a, lun 0x00, cmd 0x44 [BTBRIDGED] 1449038748: Received a dbus response for msg with seq 0x5b [BTBRIDGED] 1449038748: Processed 1 dbus events [BTBRIDGED] 1449038748: Disabling timer, no elements remain in queue [BTBRIDGED] 1449038748: Successfully wrote 7 of 7 bytes to /dev/bt-host [BTBRIDGED] 1449038748: Turning off POLLOUT for the BT in poll() [BTBRIDGED] 1449038748: Completed request with seq 0x5b, netfn 0x0b, lun 0x00, cmd 0x44, cc 0x00

causten commented 8 years ago

Here is the trace from IPMID code. It does show failures while trying to write back on the dbus. Not sure if the errors are because of the events on btbridge

IPMI Incoming: Seq 0x5a, NetFn 0x32, CMD: 0xf0 0x000000: 01 00 0b 00 ec 03 00 00 00 00 00 00 00 00 4c 00 ..............L. 0x000010: 00 00 11 27 fe df 3c 00 00 00 44 00 24 43 4f 52 ...'..<...D.$COR 0x000020: bf 9a e7 00 00 01 77 6f 6b 00 00 00 00 00 00 00 ......wok....... 0x000030: 00 00 10 00 00 00 00 00 00 00 2d ..........-
IPMI PARTIAL ESEL for /tmp/esel000b Offset = 1004 Length = 52 IPMI Response: 0x000000: 00 0b 00 ...
Failed to call the method: Input/output errorFailed to send the response message IPMI Incoming: Seq 0x5b, NetFn 0x0a, CMD: 0x44 0x000000: 00 00 02 00 00 00 00 20 00 04 ff 35 6f a2 00 0b ....... ...5o... IPMI Handling ADD-SEL for record 0x000b Sensor not found Failed to call the method: No message of desired typeIPMI Response: 0x000000: 00 00 0b ...

causten commented 8 years ago

Another time. It is always failed in the exact sample place.

[BTBRIDGED] 1449039885: Disabling timer, no elements remain in queue [BTBRIDGED] 1449039885: Successfully wrote 7 of 7 bytes to /dev/bt-host [BTBRIDGED] 1449039885: Turning off POLLOUT for the BT in poll() [BTBRIDGED] 1449039885: Completed request with seq 0x5a, netfn 0x33, lun 0x00, cmd 0xf0, cc 0x00 [BTBRIDGED] 1449039885: Sending dbus signal with seq 0x5b, netfn 0x32, lun 0x00, cmd 0xf0 [BTBRIDGED] 1449039886: Timeout on msg with seq: 0x5b [BTBRIDGED] 1449039886: Disabling timer, no elements remain in queue [BTBRIDGED] 1449039886: Successfully wrote 5 of 5 bytes to /dev/bt-host [BTBRIDGED] 1449039886: Turning off POLLOUT for the BT in poll() [BTBRIDGED] 1449039886: Completed request with seq 0x5b, netfn 0x33, lun 0x00, cmd 0xf0, cc 0xce [BTBRIDGED] 1449039886: No message found to send timeout [BTBRIDGED] 1449039886: Failed to find matching request for dbus method with seq: 0x5b [BTBRIDGED] 1449039886: Processed 1 dbus events [BTBRIDGED] 1449039886: Sending dbus signal with seq 0x5c, netfn 0x0a, lun 0x00, cmd 0x44 [BTBRIDGED] 1449039886: Received a dbus response for msg with seq 0x5c [BTBRIDGED] 1449039886: Processed 1 dbus events [BTBRIDGED] 1449039886: Disabling timer, no elements remain in queue [BTBRIDGED] 1449039886: Successfully wrote 7 of 7 bytes to /dev/bt-host [BTBRIDGED] 1449039886: Turning off POLLOUT for the BT in poll() [BTBRIDGED] 1449039886: Completed request with seq 0x5c, netfn 0x0b, lun 0x00, cmd 0x44, cc 0x00

cyrilbur-ibm commented 8 years ago

Have you tried increasing the timeout?

cyrilbur-ibm commented 8 years ago

No response on this for quite some time. Closing