Closed Olivier-ProGlove closed 5 years ago
@pdunaj @Qbicz @jhn-nordic have you seen MIC failure when using HID as well?
Hi @Olivier-ProGlove, we recently had some interoperability testing and there was indeed a problem with Surface Pro. Updating the Bluetooth drivers on Surface Pro helped a lot, and instead of failing most of the time it failed very rarely. I can later post an update what version of Surface it was.
@Olivier-ProGlove I had similar and other problems with Surface Pro (2017 version). Drivers causing issues:
Marvell AVASTAR Bluetooth Radio Adapter Driver Version: v15.68.9120.47
Updating Bluetooth drivers solved them. Working drivers:
Driver Date: 18/05/2018 Driver Version: 15.68.9125.57
Also updating Windows 10 from v10.0.16299 to v10.0.17134 helped with Bluetooth stability on this platform.
Thanks a lot @Qbicz and @wiminordic for your valuable comments.
I found a way to force update on Windows 10 Surface Pro 4 tablet. Until now, I was relying on Microsoft update mechanism. I noticed I could update Surface Pro drivers manually using this page: https://www.microsoft.com/en-us/download/details.aspx?id=49498
The tablet now uses OS Build 17763.195 (19/12/2018) and the Marvell AVASTAR driver 15.68.9125.57.
I am still testing Zephyr commit 41dfc4d478992fc83c998
. And it still does not work with the same MIC issue Disconnected from bc:83:85:0c:f3:ba (public) (reason 61)
.
Which version of Zephyr have you tested?
The fw-nrfconnect-zephyr v1.13.99-ncs2 release: https://github.com/NordicPlayground/fw-nrfconnect-zephyr/releases/tag/v1.13.99-ncs2
I made a quick retest with latest fw-nrfconnect-zephyr. Resetting the HID device while connected to Surface Pro does indeed lead to Surface being unable to connect to the zephyr HID device even with updated Windows 10 and drivers.
@wiminordic @Qbicz so is this a Surface Pro issue?
@carlescufi It is hard to say without detailed debugging. I have witnessed a lot of issues with Windows 10 platforms. Many of them were solved by updating drivers but not all of them (like this one). Windows 8 Surface 2 was getting BSOD on Bluetooth reconnection to Zephyr based HID device. So in general there is some instability there, but it could be caused by incorrect operation on Zephyr side.
I made another retest with the zephyr from the fw-nrfconnect-zephyr v1.13.99-ncs2 release. It seems to work better than latest fw-nrfconnect-zephyr. Issue was reproduced few times but at least several runs were successful (I was able to reconnect to Zephyr HID after it was reset).
Edit: Zephyr version seems to have no impact after all, latest is also sometimes working.
Could it be the default 2 Mbit phy that causes issues? Just a wild guess, but that is a "new" feature
Få Outlook for Androidhttps://aka.ms/ghei36
From: wiminordic notifications@github.com Sent: Wednesday, January 9, 2019 4:15:47 PM To: zephyrproject-rtos/zephyr Cc: Nistad, Jon Helge; Mention Subject: Re: [zephyrproject-rtos/zephyr] BLE HID sample fails to reconnect on Windows 10 tablets (#12362)
I made another retest with the zephyr from the fw-nrfconnect-zephyr v1.13.99-ncs2 release. It seems to work better than latest fw-nrfconnect-zephyr. Issue was reproduced few times but at least several runs were successful (I was able to reconnect to Zephyr HID after it was reset).
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/zephyrproject-rtos/zephyr/issues/12362#issuecomment-452732034, or mute the threadhttps://github.com/notifications/unsubscribe-auth/Ai0HXAxW8hkp2o-eLgYvdu6gxYqGAX40ks5vBgejgaJpZM4Z0BUr.
We have a BLE HID third-party commercial product that works well with the same Windows 10 Surface 4 Pro (using any version of the driver). The device can reconnect to Windows 10 Tablet without any issue.
So it could also be a (legacy?) feature not supported by Zephyr or an advanced feature supported by Zephyr but not well supported by the Windows 10 driver.
@carlescufi @wiminordic @jhn-nordic I have just tested examples/ble_peripheral/ble_app_hids_keyboard/pca10056/s140/armgcc
from Nordic SDK 15.2 on Nordic nRF52840 dev kit (PCA10056) and the reconnection works fine.
So the issue is likely to be Zephyr specific.
@Olivier-ProGlove maybe it would be helpful if you could reproduce with BT_DEBUG_SETTINGS
and BT_DEBUG_SMP
so that we can see what key is being loaded. Based on that I think @jhedberg and @sjanc might be able to help debug the issue. If that doesn't help then @joerchan may be able to take a look once he's done with what he's doing now.
Thanks @carlescufi I have just done the logs as you asked. I am using the latest Zephyr:
commit b72fb4e9e80750ec520c1628b59713ca5c0cb245
Author: Sebastian Bøe <sebastian.boe@nordicsemi.no>
Date: Wed Dec 19 16:56:57 2018 +0100
cmake: Remove dead code 'FindRegex.cmake'
I added the two configuration settings BT_DEBUG_SETTINGS
and BT_DEBUG_SMP
. I removed the passkey to make it faster to duplicate the issue:
$ git diff
diff --git a/samples/bluetooth/peripheral_hids/prj.conf b/samples/bluetooth/peripheral_hids/prj.conf
index bf907c5..38f6126 100644
--- a/samples/bluetooth/peripheral_hids/prj.conf
+++ b/samples/bluetooth/peripheral_hids/prj.conf
@@ -16,3 +16,7 @@ CONFIG_FLASH_MAP=y
CONFIG_FCB=y
CONFIG_SETTINGS=y
CONFIG_SETTINGS_FCB=y
+
+CONFIG_BT_DEBUG=y
+CONFIG_BT_DEBUG_SETTINGS=y
+CONFIG_BT_DEBUG_SMP=y
diff --git a/samples/bluetooth/peripheral_hids/src/main.c b/samples/bluetooth/peripheral_hids/src/main.c
index 194a071..84cc0b0 100644
--- a/samples/bluetooth/peripheral_hids/src/main.c
+++ b/samples/bluetooth/peripheral_hids/src/main.c
@@ -118,7 +118,7 @@ static void auth_cancel(struct bt_conn *conn)
}
static struct bt_conn_auth_cb auth_cb_display = {
- .passkey_display = auth_passkey_display,
+ .passkey_display = NULL, //auth_passkey_display,
.passkey_entry = NULL,
.cancel = auth_cancel,
};
Here are the logs for Android Samsung J5 (no problem on this Android Phone) and Microsoft Windows Surface Pro 4 (reconnection problem on this platform)
@Olivier-ProGlove can you also enable BT_DEBUG_KEYS
and re-run on both Android and the Windows 10 tablet? I have a feeling there must be a failed allocation or match somewhere.
@carlescufi Here are the log with BT_DEBUG_KEYS
:
... Still nothing obvious.
Last week I tried to dump more BT_DEBUG_*
logs but I could not highlight any obvious differences that could help me (... with my limited bluetooth expertise).
I am working on dumping the HCI commands/events on UART1 to see if I can see something (and avoid packet losing due to logging subsystem)...
@Olivier-ProGlove what would be good is to print the LTK that is being stored in flash and then loaded and used with LTK_REQUEST_REPLY
, maybe using printk
to avoid log allocation failures. I hope that maybe @jhedberg can give you a pointer on where to add the printk
statements, I am not that familiar with the settings code.
@jhedberg could you take a quick look at the 2 last logs here? MIC failure means that we loaded an invalid LTK. I wonder if Android is using privacy and the Windows 10 tablet is not.
@carlescufi I don't see anything obvious in the logs. It'd be super helpful to get the HCI logs to see the exact content of the exchanged SMP PDUs. Is is possible Windows10 lacks the bonding flag, i.e. does no-bonding, but Soft Device ignores it and holds on to the LTK?
@jhedberg but Windows 10 is the central, so why would it initiate encryption if it didn't have keys?
@carlescufi my speculation/theory is that it does have the keys. It's just that it didn't set the bonding flag, which causes Zephyr to discard the keys.
@Olivier-ProGlove could you re-run this against the Windows 10 tablet with the following patch applied:
--- a/subsys/bluetooth/host/smp.c
+++ b/subsys/bluetooth/host/smp.c
@@ -2436,6 +2436,9 @@ static u8_t smp_pairing_req(struct bt_smp *smp, struct net_buf *buf)
if ((rsp->auth_req & BT_SMP_AUTH_BONDING) &&
(req->auth_req & BT_SMP_AUTH_BONDING)) {
atomic_set_bit(smp->flags, SMP_FLAG_BOND);
+ BT_INFO("Bonding");
+ } else {
+ BT_INFO("No-bonding");
}
atomic_set_bit(smp->flags, SMP_FLAG_PAIRING);
That should then help prove/disprove my bonding flag theory.
No changes between Android and Window 10 Surface. I also printed the value of req->auth_req
and they use the same value: Bonding auth_req:0x2d
Full logs: Android.log Windows10Tablet.log
I continue to work on my HCI dump. I will post it later today... Feel free to request me additional logs in the meantime.
I also printed the value of req->auth_req and they use the same value: Bonding auth_req:0x2d
Ok, so we can disregard my no-bonding theory then. Next step would be the HCI logs. Anyone know the link to the documentation page the explains how to enable this? On a high-level you need to disable the stdout console and CONFIG_BT_DEBUG_LOG, enable CONFIG_BT_DEBUG_MONITOR, and then on the Linux PC side you need to use "btmon --tty /dev/ttyACM0" (or whatever is your Zephyr console) to capture the logs.
What I am doing for HCi dump is to redirect HCI sent/receive/acl commands (printed in hexadecimal) on the second UART of Nordic Dev Kit. It works fine. I am doing a basic formatting to help you and I am done to send you the logs.
What I have at the moment is something like that:
RCV:LE Ext Advertising Report 3e0d050000f830b13e5c315a7d32f4
SENT:LE LTK REQ NEG Reply 1b20020000
RCV:Disconnection Complete 05040000003d
SENT:LE Set Adv Enabled 0a200101
RCV:LE Scan Req Received 3e13010000000101a5818b42f04f18000000480001
SENT:LE Set Adv Enabled 0a200101
SENT:opcode:0x2016 1620020000
RCV:LE PHY Update Complete 3e0c040000005d00000000000000
RCV:LE Ext Advertising Report 3e0d050000f830b13e5c315a7d32f4
SENT:LE LTK REQ NEG Reply 1b20020000
RCV:Disconnection Complete 05040000003d
SENT:LE Set Adv Enabled 0a200101
RCV:LE Scan Req Received 3e13010000000101a5818b42f04f18000000480001
SENT:LE Set Adv Enabled 0a200101
SENT:opcode:0x2016 1620020000
RCV:LE PHY Update Complete 3e0c040000005d00000000000000
RCV:LE Ext Advertising Report 3e0d050000f830b13e5c315a7d32f4
SENT:LE LTK REQ NEG Reply 1b20020000
RCV:Disconnection Complete 05040000003d
SENT:LE Set Adv Enabled 0a200101
Is it good enough?
@jhedberg Here are the logs and HCI logs for Android and Windows 10 Surface Pro Tablet:
Android.log Android-HCI.log Windows10Tablet.log Windows10Tablet-HCI.log
Note: The hexadecimal after the opcode is the data (without the size).
@jhedberg I had a look to BT_DEBUG_MONITOR and got it working. Usual logs:
The latest logs do confirm the MIC failure:
< HCI Command: LE Long Term Key Request Reply (0x08|0x001a) plen 18
Handle: 0
Long term key: bb4a893db7561aebd12748868b164a32
> HCI Event: Command Complete (0x0e) plen 6
LE Long Term Key Request Reply (0x08|0x001a) ncmd 1
Status: Success (0x00)
Handle: 0
> HCI Event: Disconnect Complete (0x05) plen 4
Status: Success (0x00)
Handle: 0
Reason: Connection Terminated due to MIC Failure (0x3d)
However, I don't see anything wrong with this on our side - the LTK value is the same that was successfully used for the first connection:
< HCI Command: LE Long Term Key Request Reply (0x08|0x001a) plen 18
Handle: 0
Long term key: bb4a893db7561aebd12748868b164a32
> HCI Event: Command Complete (0x0e) plen 6
LE Long Term Key Request Reply (0x08|0x001a) ncmd 1
Status: Success (0x00)
Handle: 0
> HCI Event: Encryption Change (0x08) plen 4
Status: Success (0x00)
Handle: 0
Encryption: Enabled with AES-CCM (0x01)
We have another device that does dual boot Android and Windows 10 (CHUWI Hi10 Pro) that has a similar issue (ie: Works fine for Android but MIC Failure on reconnection with Windows 10).
Here are the logs for this platform: Chuwi-Android.log Chuwi-Android-btmon.log Chuwi-Windows.log Chuwi-Windows-btmon.log
Something I already noticed for the previous case (Samsung Android phone and Windows 10 Tablet) and I noticed again with the Chuwi tablet:
On Android it uses a Random Peer Address type:
> HCI Event: LE Meta Event (0x3e) plen 19
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Random (0x01)
Peer address: 7A:21:46:B7:D7:30 (Resolvable)
Identity type: Public (0x00)
Identity: 0B:AE:4B:5D:83:88 (OUI 0B-AE-4B)
Connection interval: 45.00 msec (0x0024)
Connection latency: 0 (0x0000)
Supervision timeout: 20000 msec (0x07d0)
Master clock accuracy: 0x00
While on Windows 10, it always uses a Public Peer Address type:
> HCI Event: LE Meta Event (0x3e) plen 19
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Public (0x00)
Peer address: 88:83:5D:D4:AE:00 (OUI 88-83-5D)
Connection interval: 15.00 msec (0x000c)
Connection latency: 0 (0x0000)
Supervision timeout: 2000 msec (0x00c8)
Master clock accuracy: 0x00
@jhedberg more logs. As I mentioned in my initial issue message when using the USB ASUS BT-400 controller on the Windows Surface Pro 4 (the Windows10 Tablet that fails the reconnection test) I can successfully connect.
Here are the btmon log for the USB ASUS BT-400 controller on the Windows Surface Pro 4: Windows10Tablet-ASUS-btmon.log
The observation I made in my previous message is not valid anymore. The ASUS BT-400 also uses a public address:
> HCI Event: LE Meta Event (0x3e) plen 19 #43 42.970300
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Public (0x00)
Peer address: 5C:F3:70:8A:A6:97 (CC&C Technologies, Inc)
Connection interval: 60.00 msec (0x0030)
Connection latency: 0 (0x0000)
Supervision timeout: 9600 msec (0x03c0)
Master clock accuracy: 0x05
When comparing the btmon log between the default bluetooth controller and the ASUS one on the same tablet, I can see these two commands being sent during pairing with the default controller while there are not sent for the ASUS controller:
Something else > HCI Event: Encryption Change (0x08)
is not received on the reconnection on the broken Windows 10 Bluetooth controllers (Windows Surface Pro and Chuwi controllers).
Could it be Zephyr do not update the encryption algorithms on the reconnection?
@carlescufi @jhedberg I am still looking at this issue and pushing towards this issue is due to the 'missing' > HCI Event: Encryption Change
on reconnection with Windows 10 (while this HCI event is present on Android devices).
I printed the values of:
struct bt_conn {
...
#if defined(CONFIG_BT_SMP) || defined(CONFIG_BT_BREDR)
bt_security_t sec_level;
bt_security_t required_sec_level;
u8_t encrypt;
#endif /* CONFIG_BT_SMP || CONFIG_BT_BREDR */
...
};
after the first pairing and the failing reconnection.
sec_level:2 required_sec_level:2 encrypt:1
sec_level:1 required_sec_level:2 encrypt:0
Could the fact Zephyr do not not 'restore' the previous security level and encryption status be the reason of the BT_HCI_ERR_TERM_DUE_TO_MIC_FAIL
?
I tried to add a dirty hack to workaround this 'missing' > HCI Event: Encryption Change
but it still did not work. But it might due to the way I did it...
I added this code to le_ltk_request()
to simulate the call to hci_encrypt_change()
:
--- a/subsys/bluetooth/host/hci_core.c
+++ b/subsys/bluetooth/host/hci_core.c
@@ -2959,9 +2959,26 @@ static void le_ltk_request(struct net_buf *buf)
le_ltk_neg_reply(evt->handle);
-done:
+done: {
+
+#if ON_RECONNECTION
+ struct bt_hci_evt_encrypt_change evt = {
+ .status = 0,
+ .handle = handle,
+ .encrypt = 1,
+ };
+ struct net_buf hack_buf = { 0 };
+ hack_buf.data = &evt;
+
+ hci_encrypt_change(&hack_buf);
+#endif /* CONFIG_BT_SMP || CONFIG_BT_BREDR */
+
bt_conn_unref(conn);
}
+}
#endif /* CONFIG_BT_SMP */
I booted a first time without this hack for pairing the device and then enable this hack and flash the firmware to see if it fixed my issue. I could confirm it switched to sec_level:2 required_sec_level:2 encrypt:1
but I still had BT_HCI_ERR_TERM_DUE_TO_MIC_FAIL
. It could be a secret need to be negociated while calling > HCI Event: Encryption Change
.
Any comment on the 'missing' > HCI Event: Encryption Change
or the hack I did? Any suggestion for me to try?
@Olivier-ProGlove can you provide a trace of it working as expected and one failing. Your air traces collected while Windows traces are also recording would help us analyze and provide recommendations to you.
Thanks a lot @FrankGorgenyi for jumping in this thread :smiley: :smiley: :smiley:
I actually think I found the issue this afternoon (German time). I will try to get more information tomorrow to know exactly if it a Zephyr issue or a Surface Pro Bluetooth controller issue. But it is likely that the issue (if my theory is confirmed) could be mitigated by the Surface Pro Bluetooth controller.
Here is the Start Encryption
flow from Bluetooth Specification v5.
From what I observed, Zephyr Link Layer (LL B
on the diagram) sends LL_ENC_RSP
too late for Surface Pro Bluetooth controller (LL A
on the diagram) as I received the MIC Failure termination
disconnection event with from Host A
before Zephyr had time to send LL_ENC_RSP
.
I noticed Zephyr has a Fast Encryption Setup - BT_CTLR_FAST_ENC
option (added by @cvinayak) I guess to solve this issue. Turning on this option allows to reconnect to Windows host after few reconnection tentative. The few tests I did with this option, I needed more than 10 reconnection tentative to successfully reconnect (30+ seconds after the lost of connection). So the 'fast encryption setup' is still too slow.
My next step is too check the timing to know if it is Zephyr who cannot cope with the controller BLE intervals or if it is the Surface Pro controller that does not respect the timings (ie: timeout too early).
A mitigation would be to negociate the interval before Starting the encryption
. I do not know if it is feasible per Bluetooth specification or to increase the Connection interval
on reconnection.
Or maybe a better approach in Windows 10 would be to take the latest negociated Connection interval
on reconnection.
Few connection parameters on reconnection (before being renegociated):
> HCI Event: LE Meta Event (0x3e) plen 19
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Random (0x01)
Peer address: 67:F9:07:56:A0:D5 (Resolvable)
Identity type: Public (0x00)
Identity: 48:27:EA:7D:26:7B (OUI 48-27-EA)
Connection interval: 48.75 msec (0x0027)
Connection latency: 0 (0x0000)
Supervision timeout: 5000 msec (0x01f4)
Master clock accuracy: 0x01
> HCI Event: LE Meta Event (0x3e) plen 19
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Public (0x00)
Peer address: BC:83:85:0C:F3:BA (OUI BC-83-85)
Connection interval: 15.00 msec (0x000c)
Connection latency: 0 (0x0000)
Supervision timeout: 2000 msec (0x00c8)
Master clock accuracy: 0x01
> HCI Event: LE Meta Event (0x3e) plen 19
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Random (0x01)
Peer address: 7A:21:46:B7:D7:30 (Resolvable)
Identity type: Public (0x00)
Identity: 0B:AE:4B:5D:83:88 (OUI 0B-AE-4B)
Connection interval: 45.00 msec (0x0024)
Connection latency: 0 (0x0000)
Supervision timeout: 20000 msec (0x07d0)
Master clock accuracy: 0x00
> HCI Event: LE Meta Event (0x3e) plen 19 #322 6.027500
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Public (0x00)
Peer address: 88:83:5D:D4:AE:00 (OUI 88-83-5D)
Connection interval: 15.00 msec (0x000c)
Connection latency: 0 (0x0000)
Supervision timeout: 2000 msec (0x00c8)
Master clock accuracy: 0x00
You could notice Chuwi in Windows mode (dual boot table) uses the same parameters as the MS Surface Pro bluetooth controller. I also noticed when I use the ASUS USB BT-400 controller on the MS Surface Pro I also have the same connection interval. But Nordic's nRF52840 dev kit with Zephyr can successfully connect on the Surface Pro with the ASUS USB BT-400 controller while it cannot with SurfacePro controller and Chuwi controller in Windows mode.
I am quite keen to try the Windows Bluetooth trace tool (first time I hear about it). And from what I understand I can capture radio info too :+1:
More data/timing on latest Zephyr (33dae59a5
- Sat Jan 26 - pre West mess...). I added two LOG_INF()
statements - that deferred printing - so less invasive:
--- a/subsys/bluetooth/controller/ll_sw/ctrl.c
+++ b/subsys/bluetooth/controller/ll_sw/ctrl.c
@@ -1866,6 +1866,7 @@ static inline u32_t isr_rx_conn_pkt_ack(struct pdu_data *pdu_data_tx,
break;
case PDU_DATA_LLCTRL_TYPE_ENC_RSP:
+ LOG_INF("ENC_RSP");
/* pause data packet tx */
_radio.conn_curr->pause_tx = 1U;
break;
@@ -2575,6 +2576,7 @@ isr_rx_conn_pkt_ctrl(struct radio_pdu_node_rx *node_rx, u8_t *rx_enqueue)
#if defined(CONFIG_BT_CTLR_LE_ENC)
case PDU_DATA_LLCTRL_TYPE_ENC_REQ:
+ LOG_INF("ENC_REQ");
if (!_radio.conn_curr->role ||
!pdu_len_cmp(PDU_DATA_LLCTRL_TYPE_ENC_REQ,
pdu_data_rx->len)) {
Here are the results of my testing on few devices:
[00:02:01.028,228] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:02:01.088,592] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 60ms between both packet - DELTA with ENC_REQ
[00:00:12.760,620] <inf> main: Connected
[00:00:12.762,084] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:12.793,060] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:12.799,530] <inf> main: Disconnected from (reason 61) // 6ms - DELTA with ENC_REQ
[00:00:16.065,826] <inf> main: Connected
[00:00:16.067,474] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:16.092,895] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:16.094,055] <inf> main: Disconnected from (reason 61) // 2ms
[00:00:18.413,818] <inf> main: Connected
[00:00:18.415,222] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:18.433,013] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:18.448,547] <inf> main: Disconnected from (reason 61) // 15ms
[00:00:25.130,920] <inf> main: Connected
[00:00:25.132,446] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:25.169,342] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:25.170,532] <inf> main: Disconnected from (reason 61) // 1ms
[00:00:28.023,437] <inf> main: Connected
[00:00:28.025,299] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:28.049,591] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:28.050,781] <inf> main: Disconnected from (reason 61) // 1ms
[00:01:48.401,977] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:01:48.521,789] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 120ms
[00:00:12.422,637] <inf> main: Connected
[00:00:12.424,377] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:12.445,129] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:12.489,929] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 45ms
[00:00:12.505,187] <inf> main: Security changed: tW level 2
[00:00:01.936,584] <inf> main: Connected
[00:00:01.938,629] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:01.960,601] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:02.005,401] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 45ms
[00:00:02.020,660] <inf> main: Security changed: tW level 2
[00:00:05.929,077] <inf> main: Connected
[00:00:05.930,633] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:05.956,481] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:06.001,312] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 45ms
[00:00:06.016,540] <inf> main: Security changed: tW level 2
[00:00:59.164,794] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:59.179,626] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 15ms
[00:00:01.506,317] <inf> main: Connected
[00:00:01.508,178] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:01.747,619] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:01.844,940] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 100ms
[00:00:01.893,920] <inf> main: Security changed: tW level 2
[00:00:05.254,547] <inf> main: Connected
[00:00:05.256,134] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:05.477,111] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:05.574,432] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 100ms
[00:00:05.623,413] <inf> main: Security changed: tW level 2
[00:00:03.912,506] <inf> main: Connected
[00:00:03.913,909] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:04.121,429] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:04.218,750] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 100ms
[00:00:04.267,761] <inf> main: Security changed: tW level 2
[00:01:12.933,227] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:01:12.941,131] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 8ms
[00:00:06.842,956] <inf> main: Connected
[00:00:06.844,390] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:06.912,994] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:06.957,916] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 45ms
[00:00:07.002,593] <inf> main: Security changed: tW level 2
[00:00:02.991,241] <inf> main: Connected
[00:00:02.993,103] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:03.061,096] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:03.106,018] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 45ms
[00:00:03.195,678] <inf> main: Security changed: tW level 2
[00:00:09.367,492] <inf> main: Connected
[00:00:09.368,988] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:09.438,293] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:09.483,184] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 45ms
[00:00:09.527,862] <inf> main: Security changed: tW level 2
[00:00:04.249,267] <inf> main: Connected
[00:00:04.250,854] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:04.323,822] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:04.368,713] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 45ms
[00:00:04.413,391] <inf> main: Security changed: tW level 2
[00:00:26.104,980] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:26.165,344] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 60ms
[00:00:26.225,036] <inf> main: Security changed: tW level 2
[00:00:07.797,149] <inf> main: Connected
[00:00:07.799,011] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:07.821,533] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:07.836,425] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 15ms
[00:00:07.851,104] <inf> main: Security changed: tW level 2
[00:00:03.200,805] <inf> main: Connected
[00:00:03.202,575] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:03.227,142] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:03.242,034] <inf> bt_ctlr_llsw_ctrl: ENC_RSP // 15ms
[00:00:03.256,713] <inf> main: Security changed: tW level 2
[00:00:04.058,319] <inf> main: Connected
[00:00:04.060,028] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:04.099,029] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:04.118,133] <inf> main: Disconnected from (reason 61) // 19ms
[00:00:07.465,789] <inf> main: Connected
[00:00:07.467,041] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:07.482,330] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:07.497,863] <inf> main: Disconnected from (reason 61) // 15ms
[00:00:16.855,926] <inf> main: Connected
[00:00:16.857,666] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:16.872,497] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:16.903,045] <inf> main: Disconnected from (reason 61) // 31ms
[00:00:20.048,919] <inf> main: Connected
[00:00:20.050,415] <wrn> bt_ctlr_hci: handle: 0x0000, CSA: 0.
[00:00:20.060,089] <inf> bt_ctlr_llsw_ctrl: ENC_REQ
[00:00:20.105,621] <inf> main: Disconnected from (reason 61) // 45ms
Here are the Windows Trace you asked me:
BthTracing.etl.zip with the associated log trace from UART BthTracing.log. You could notice there had been a successful reconnection. After rebooting the device a second time it could not reconnect.
I did a second test after cleaning the flash of the Nordic Dev Board (as usual) and restarting Windows Surface Pro. Here are the BTL trace BthTracing2.etl.zip and the UART trace BthTracing2.log. Same as before - no successful reconnection.
I could not get the Radio Trace working. See attached error: radio.log
@cvinayak @joerchan
Command disallowed to an LTK Request Response. Any clues?
> HCI Event: LE Meta Event (0x3e) plen 19 #322 6.027500
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Public (0x00)
Peer address: 88:83:5D:D4:AE:00 (OUI 88-83-5D)
Connection interval: 15.00 msec (0x000c)
Connection latency: 0 (0x0000)
Supervision timeout: 2000 msec (0x00c8)
Master clock accuracy: 0x00
< HCI Command: LE Set Advertise Enable (0x08|0x000a) plen 1 #323 6.030600
Advertising: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4 #324 6.032000
LE Set Advertise Enable (0x08|0x000a) ncmd 1
Status: Memory Capacity Exceeded (0x07)
< HCI Command: LE Read Remote Used Features (0x08|0x0016) plen 2 #325 6.039200
Handle: 0
> HCI Event: Command Status (0x0f) plen 4 #326 6.040700
LE Read Remote Used Features (0x08|0x0016) ncmd 1
Status: Success (0x00)
< ACL Data TX: Handle 0 flags 0x00 dlen 6 #327 6.042300
SMP: Security Request (0x0b) len 1
Authentication requirement: Bonding, No MITM, SC, No Keypresses (0x09)
> HCI Event: LE Meta Event (0x3e) plen 13 #328 6.044400
LE Long Term Key Request (0x05)
Handle: 0
Random number: 0x0000000000000000
Encrypted diversifier: 0x0000
< HCI Command: LE Long Term Key Request... (0x08|0x001a) plen 18 #329 6.046900
Handle: 0
Long term key: 79b3099dc804728dfc3a77ef6317f11d
> HCI Event: Command Complete (0x0e) plen 6 #330 6.049900
LE Long Term Key Request Reply (0x08|0x001a) ncmd 1
Status: Command Disallowed (0x0c)
Handle: 0
> HCI Event: Disconnect Complete (0x05) plen 4 #331 6.071400
Status: Success (0x00)
Handle: 0
Reason: Connection Terminated due to MIC Failure (0x3d)
@cvinayak @joerchan could you take a look at the Wireshark traces? https://github.com/zephyrproject-rtos/zephyr/files/2733871/Windows10TabletIssue-Wireshark.zip
@joerchan suggests the issue might be that:
LL_SLAVE_FEATURE_REQ
on packet 24LL_ENC_REQ
on packet 29LL_SLAVE_FEATURE_REQ
with LL_UNKNOWN_RSP
on packet 31LL_UNKNOWN_RSP
The issue seems to be that the master should respond to the feature request first and then send the encryption request, so it looks like a bug in the Windows tablet controller.
"If at any time during the encryption start procedure, the Link Layer of the master or the slave receives an unexpected Data Channel PDU from the peer Link Layer, it shall immediately exit the Connection State, and shall transition to the Standby State. The Host shall be notified that the link has been disconnected with the error code Connection Terminated Due to MIC Failure (0x3D)"
You can see in the HCI log that there is a disconnect event with this error code. So that confirms the behaviour of the slave. I do however think that the slave should not respond with command disallowed before the disconnect event., i.e until the disconnect event is delivered, the command should be able to complete with success.
I took me a while to get the right trace. To be short, while it seems to work much better there might still be an issue.
Environment of test:
zephyr/samples/bluetooth/peripheral_hids/
on nrf52840_pca10056
using Windows 10 Surface Pro 4.Here are the log files: 2ndReconnectionFail.zip
Here are part of the logs for the respective reconnection:
btmon
:< HCI Command: LE Set Advertise Enable (0x08|0x000a) plen 1 #416 9.306600
Advertising: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4 #417 9.308000
LE Set Advertise Enable (0x08|0x000a) ncmd 1
Status: Connection Limit Exceeded (0x09)
< HCI Command: LE Read Remote Used Features (0x08|0x0016) plen 2 #418 9.314900
Handle: 0
> HCI Event: Command Status (0x0f) plen 4 #419 9.316400
LE Read Remote Used Features (0x08|0x0016) ncmd 1
Status: Success (0x00)
< ACL Data TX: Handle 0 flags 0x00 dlen 6 #420 9.318100
SMP: Security Request (0x0b) len 1
Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d)
> HCI Event: LE Meta Event (0x3e) plen 13 #421 9.337900
LE Long Term Key Request (0x05)
Handle: 0
Random number: 0x0000000000000000
Encrypted diversifier: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5 #422 9.340400
Num handles: 1
Handle: 0
Count: 1
< HCI Command: LE Long Term Key Request... (0x08|0x001a) plen 18 #423 9.342200
Handle: 0
Long term key: 63c0ac41ed7816a2b7f1135a378d3849
> HCI Event: Command Complete (0x0e) plen 6 #424 9.345200
LE Long Term Key Request Reply (0x08|0x001a) ncmd 1
Status: Success (0x00)
Handle: 0
> HCI Event: LE Meta Event (0x3e) plen 12 #425 9.347100
LE Read Remote Used Features (0x04)
Status: Unsupported Remote Feature / Unsupported LMP Feature (0x1a)
Handle: 0
Features: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> HCI Event: Encryption Change (0x08) plen 4 #426 9.370500
Status: Success (0x00)
Handle: 0
Encryption: Enabled with AES-CCM (0x01)
btmon
:> HCI Event: LE Meta Event (0x3e) plen 19 #477 4.208800
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Public (0x00)
Peer address: BC:83:85:0C:F3:BA (OUI BC-83-85)
Connection interval: 15.00 msec (0x000c)
Connection latency: 0 (0x0000)
Supervision timeout: 2000 msec (0x00c8)
Master clock accuracy: 0x01
< HCI Command: LE Set Advertise Enable (0x08|0x000a) plen 1 #478 4.211900
Advertising: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4 #479 4.213300
LE Set Advertise Enable (0x08|0x000a) ncmd 1
Status: Connection Limit Exceeded (0x09)
< HCI Command: LE Read Remote Used Features (0x08|0x0016) plen 2 #480 4.220800
Handle: 0
> HCI Event: Command Status (0x0f) plen 4 #481 4.222300
LE Read Remote Used Features (0x08|0x0016) ncmd 1
Status: Success (0x00)
< ACL Data TX: Handle 0 flags 0x00 dlen 6 #482 4.224000
SMP: Security Request (0x0b) len 1
Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d)
> HCI Event: LE Meta Event (0x3e) plen 13 #483 4.240200
LE Long Term Key Request (0x05)
Handle: 0
Random number: 0x0000000000000000
Encrypted diversifier: 0x0000
< HCI Command: LE Long Term Key Request... (0x08|0x001a) plen 18 #484 4.242700
Handle: 0
Long term key: 63c0ac41ed7816a2b7f1135a378d3849
> HCI Event: Command Complete (0x0e) plen 6 #485 4.245700
LE Long Term Key Request Reply (0x08|0x001a) ncmd 1
Status: Success (0x00)
Handle: 0
> HCI Event: Disconnect Complete (0x05) plen 4 #486 4.256400
Status: Success (0x00)
Handle: 0
Reason: Connection Terminated due to MIC Failure (0x3d)
I will redo btmon+sniffer traces with CONFIG_BT_CTLR_SLAVE_FEAT_REQ=n
. I have just tested quickly and them issue.
Here are the traces (logs+btmon+sniffer) of the same settings + CONFIG_BT_CTLR_SLAVE_FEAT_REQ=n
: WithoutFeatReq.zip
> HCI Event: LE Meta Event (0x3e) plen 19 #68 14.858700
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Slave (0x01)
Peer address type: Public (0x00)
Peer address: BC:83:85:0C:F3:BA (OUI BC-83-85)
Connection interval: 15.00 msec (0x000c)
Connection latency: 0 (0x0000)
Supervision timeout: 2000 msec (0x00c8)
Master clock accuracy: 0x01
< HCI Command: LE Set Advertise Enable (0x08|0x000a) plen 1 #69 14.861800
Advertising: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4 #70 14.863200
LE Set Advertise Enable (0x08|0x000a) ncmd 1
Status: Connection Limit Exceeded (0x09)
< HCI Command: LE Set PHY (0x08|0x0032) plen 7 #71 14.870100
Handle: 0
All PHYs preference: 0x00
TX PHYs preference: 0x02
LE 2M
RX PHYs preference: 0x02
LE 2M
PHY options preference: Reserved (0x0000)
> HCI Event: Command Status (0x0f) plen 4 #72 14.872200
LE Set PHY (0x08|0x0032) ncmd 1
Status: Success (0x00)
< ACL Data TX: Handle 0 flags 0x00 dlen 6 #73 14.873800
SMP: Security Request (0x0b) len 1
Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d)
> HCI Event: LE Meta Event (0x3e) plen 13 #74 14.888300
LE Long Term Key Request (0x05)
Handle: 0
Random number: 0x0000000000000000
Encrypted diversifier: 0x0000
< HCI Command: LE Long Term Key Request... (0x08|0x001a) plen 18 #75 14.890800
Handle: 0
Long term key: 63c0ac41ed7816a2b7f1135a378d3849
> HCI Event: Command Complete (0x0e) plen 6 #76 14.893800
LE Long Term Key Request Reply (0x08|0x001a) ncmd 1
Status: Success (0x00)
Handle: 0
> HCI Event: Disconnect Complete (0x05) plen 4 #77 14.904500
Status: Success (0x00)
Handle: 0
Reason: Connection Terminated due to MIC Failure (0x3d)
Follow up on this still existing issue in this new ticket with more information: https://github.com/zephyrproject-rtos/zephyr/issues/14044
Describe the bug
BLE HID sample fails to reconnect on (some?) Windows 10 tablets. I only have two Windows 10 tablets (including the Windows Surface Pro 4) to check. I cannot duplicate the issue on a Windows 10 laptop (Lenovo laptop in my case), Linux and Android.
The two bluetooth controllers are (Windows 10 says I am using the latest drivers):
The Surface Pro has USB slots. Connecting the ASUS BT-400 USB controller on the Surface Pro makes the BLE reconnection successful.
I only tested with Nordic nRF52 MCUs (particularly nRF52840) as a BLE peripheral.
To Reproduce
Expected behavior Windows 10 should automatically reconnect to the dev kit as it advertises on restart
I tested on Zephyr (from 4 days ago):
I added logging traces in the bluetooth controller to get an idea of the issue:
And here is the result:
I used the Nordic sniffer to try to get some insights. Here are the few Wireshark traces I manage to get:
Windows10TabletIssue-Wireshark.zip