nkskjames closed this issue 8 years ago.
After further investigation, it seems petitboot is sending the same sequence number twice in a row. Since btbridged uses the sequence number as an identifier in its queue, could it be looking up the wrong entry? This trace uses a custom printf: printf("bt_host_write: 0x%08X,%d,%d, 0x%08X\n",bt_msg,bt_msg->req.seq,bt_msg->rsp.seq,bt_msg->call);
[BTBRIDGED] Successfully wrote 12 of 12 bytes to /dev/bt-host
bt_host_write: 0x000256F0,15,15, 0x00027BB8
[BTBRIDGED] Successfully wrote 12 of 12 bytes to /dev/bt-host
bt_host_write: 0x000257D8,15,15, 0x00027BB8
[BTBRIDGED] Successfully wrote 12 of 12 bytes to /dev/bt-host
bt_host_write: 0x000257D8,16,16, 0x00027BB8
[BTBRIDGED] Successfully wrote 12 of 12 bytes to /dev/bt-host
bt_host_write: 0x00026870,16,16, 0x00027BB8
[BTBRIDGED] Successfully wrote 12 of 12 bytes to /dev/bt-host
bt_host_write: 0x00026870,17,17, 0x000255B8
[BTBRIDGED] Successfully wrote 12 of 12 bytes to /dev/bt-host
bt_host_write: 0x00027CF8,17,17, 0x000255B8
[BTBRIDGED] Successfully wrote 12 of 12 bytes to /dev/bt-host
@cyrilbur-ibm, could you check this out?
@nkskjames, that shouldn't cause a double free. When btbridge receives a request, it enqueues it at the end of its list, so the list might look like: [x]->[y]->[z]. If it then receives another message with sequence number z (which I'll call z'), the list becomes: [x]->[y]->[z]->[z']. Anyone who looks up z gets the reference for z, deals with it, and dequeues it. The next lookup for z then gets z', and the same thing happens.
I HAVE found a bug in bt_q_drop()!
Could you test with this branch? https://github.com/cyrilbur-ibm/btbridge/tree/issues_10 Thanks!
I have rebooted about 200 times with no crashes. I then put the old code back on to make sure it still crashes, and it does. So the problem is solved. Please open a pull request. Thanks!!
btbridged crashes after 5 to 100 host power cycles. After a crash, if I restart it manually, everything is fine again.
Code path:
} ...
btbridged --verbose
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Sending dbus signal with seq 0x0e, netfn 0x00, lun 0x00, cmd 0x09
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Received a dbus response for msg with seq 0x0e
[BTBRIDGED] 1446346408: Processed 1 dbus events
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0d
[BTBRIDGED] 1446346408: Sending dbus signal with seq 0x0e, netfn 0x00, lun 0x00, cmd 0x09
[BTBRIDGED] 1446346408: Adjusting timer for next element
[BTBRIDGED] 1446346408: Successfully wrote 5 of 5 bytes to /dev/bt-host
[BTBRIDGED] 1446346408: Completed request with seq 0x0d, netfn 0x07, lun 0x00, cmd 0x08, cc 0xce
[BTBRIDGED] 1446346408: Message with seq 0x0e is being timed out despite appearing to have been responded to. Slow BT?
[BTBRIDGED] 1446346408: Timeout on msg with seq: 0x0e
[BTBRIDGED] 1446346408: Received a dbus response for msg with seq 0x0e
[BTBRIDGED] 1446346408: Processed 1 dbus events
[BTBRIDGED] 1446346408: Sending dbus signal with seq 0x0e, netfn 0x00, lun 0x00, cmd 0x09
[BTBRIDGED] 1446346408: Adjusting timer for next element
[BTBRIDGED] 1446346408: Successfully wrote 12 of 12 bytes to /dev/bt-host
[BTBRIDGED] 1446346408: Turning off POLLOUT for the BT in poll()
[BTBRIDGED] 1446346408: Completed request with seq 0x0e, netfn 0x01, lun 0x00, cmd 0x09, cc 0x00
[BTBRIDGED] 1446346408: Received a dbus response for msg with seq 0x0e
[BTBRIDGED] 1446346408: Processed 1 dbus events
[BTBRIDGED] 1446346408: Adjusting timer for next element
[BTBRIDGED] 1446346408: Successfully wrote 12 of 12 bytes to /dev/bt-host
[BTBRIDGED] 1446346408: Turning off POLLOUT for the BT in poll()
[BTBRIDGED] 1446346408: Completed request with seq 0x0e, netfn 0x01, lun 0x00, cmd 0x09, cc 0x00
[BTBRIDGED] 1446346408: Sending dbus signal with seq 0x0f, netfn 0x00, lun 0x00, cmd 0x09
[BTBRIDGED] 1446346408: Received a dbus response for msg with seq 0x0f
[BTBRIDGED] 1446346408: Processed 1 dbus events
[BTBRIDGED] 1446346408: Sending dbus signal with seq 0x0f, netfn 0x00, lun 0x00, cmd 0x09
[BTBRIDGED] 1446346408: Successfully wrote 12 of 12 bytes to /dev/bt-host
[BTBRIDGED] 1446346408: Completed request with seq 0x0f, netfn 0x01, lun 0x00, cmd 0x09, cc 0x00
[BTBRIDGED] 1446346408: Adjusting timer for next element
[BTBRIDGED] 1446346408: Successfully wrote 12 of 12 bytes to /dev/bt-host
[BTBRIDGED] 1446346408: Couldn't create response message
[BTBRIDGED] 1446346408: Couldn't send response message
*** Error in `./btbridged_msg': double free or corruption (fasttop): 0x00027878 ***