Closed ChristianAmannKT closed 3 years ago
The slave disappeared for a number of events while the master was trying to send an empty packet. When the slave re-appeared, the central sent a connection update with an instant in the past. This looks like an issue with the instant being set when the packet is queued, not when it is first actually sent.
@cvinayak this looks like an issue in the Controller as per @joerchan's analysis.
From the sniffer log, the slave has not sync-ed at event counter 33 for the UPDATE_IND sent at event counter 27. Please use a zephyr peripheral sample and let me know if its reproducible.
Yes, the instant is set when the UPDATE_IND PDU is enqueued from ULL to LLL. LLL, operating as a packet controller, will not parse the PDU and update it; that would have to be vendor specific.
Even if LLL set the instant on first transmission, the instant could still pass if the transmissions are not ack-ed (for example, consecutive CRC errors in every connection event). But since procedure timeouts are not used for connection update, an instant in the past could be the correct outcome (hence the rationale behind setting the instant when the connection update procedure is enqueued).
I was able to reproduce it using the default peripheral sample (see attachment). However, I had to explicitly set CONFIG_BT_CTLR_CONN_PARAM_REQ=n in the prj.conf. Otherwise the peripheral would just send a connection parameter update request after the central was done with its connection update indications, which keeps the connection alive.
Do you have a sniffer log that you can attach here?
Unfortunately not, my sniffer setup is unable to log the communication. The last thing I was able to see was the connection request. I'm not sure what I am doing wrong here.
> Slave disappeared for a number of events, while master was trying to send an empty packet. When slave re-appeared the central sends connection update with instant in the past. Looks like an issue with setting the instant when the packet is queued, not when it is first actually sent.
The slave disappearing for a number of events seems to be caused by bad connection event timing on the master side. There is a connection parameter update indication in connection event 27. The new connection parameters are scheduled for connection event 33, and this is exactly when the slave disappears. The connection interval at that time is 20 ms, the transmit window size is 1.25 ms, and the transmit window offset is 0 ms. The distance between connection events 32 and 33 is 21.418 ms (please see the attached screenshot), which is bigger than the connection interval + transmit window offset + transmit window size. The slave probably emerges again when the receive window gets big enough (due to window widening). My question is: what is causing the bad timing by the master?
@cvinayak could you please comment on this?
It appears the Zephyr controller has sent the conn_update_ind PDU without filling in the window offset value of 1.
Could you please recheck with the following in your prj.conf:
CONFIG_BT_CTLR_ADVANCED_FEATURES=y
CONFIG_BT_CTLR_SCHED_ADVANCED=n
Tested the attached sample on an nRF52840DK with the prj file modified to use the UART console (my WIP branch rebased over latest upstream master 27529e0c0d) and I do not see any issues (which makes me believe this is not a bug in the controller source code). Console output: nRF sniffer trace (Wireshark) file, gh_29636_success_nrf52840dk_nrfconnect.pcapng: gh_29636_success_nrf52840dk_nrfconnect.zip
Tested the attached sample on an nRF52840DK with the unmodified prj file. RTT output: nRF sniffer trace (Wireshark) file, gh_29636_success_nrf52840dk_nrfconnect_rtt.pcapng gh_29636_success_nrf52840dk_nrfconnect_rtt.zip
Tested the attached sample on an nRF52840Dongle with the unmodified prj file (no console, as I don't have SWD pin headers soldered; I flashed the hex using the nRF Connect desktop Programmer). nRF sniffer trace (Wireshark) file, gh_29636_success_nrf52840dongle_nrfconnect_rtt.pcapng gh_29636_success_nrf52840dongle_nrfconnect_rtt.zip
I am not able to reproduce the issue on the latest upstream master with my work in progress on top. Please provide the zephyr commit hash that reproduces the issue, as well as the dongle revision details (small white sticker on the dongle; mine is PCA10059 1.2.0, 2020.7).
Hi,
thanks for the update!
I am on commit https://github.com/zephyrproject-rtos/zephyr/commit/cc4708e3170bfedcc2be33d8dc64c48534a76b0f and my dongle is PCA10059 1.0.0 2018.34. I flash it using a J-Link programmer.
Because I update my master frequently, I just checked again that the issue still occurs, and it does:
That being said, when I edit my prj.conf to contain your suggested parameters (see below), I never run into a connection timeout.
CONFIG_BT_CTLR_ADVANCED_FEATURES=y
CONFIG_BT_CTLR_SCHED_ADVANCED=n
For me this is an acceptable workaround. If you need anything else from me to find the root cause, just let me know.
Thanks a lot!
Could be related to https://github.com/zephyrproject-rtos/zephyr/issues/25440
@Guayusa please verify the fix: https://github.com/zephyrproject-rtos/zephyr/pull/31477
I checked out your branch and let it run without the workaround mentioned earlier. I did not face any disconnects anymore. LGTM!
@Guayusa Thank you for the testing and fast response.
Note: This sample is just to provoke the issue. It is obviously not a real scenario, but it's the easiest way to show it.
Describe the bug
When a central device updates connection parameters a few times in a row, the central sends one of the update indications too late. That way, once it is finally sent out, the instant at which the new parameters shall take effect already lies in the past. The connection then gets disconnected with either reason 0x08 (Connection Timeout) or 0x28 (Instant Passed).
Devices involved for this sample:
To Reproduce
1. Build and flash the provided sample to the USB dongle (scripts for building and viewing RTT output are included)
2. Hook up the nRF52 DK to a computer and start nRF Connect (v3.6.0)
3. Load the Bluetooth Low Energy app, select the device, and let it start advertising
4. Keep the DK close to the nRF dongle; it will now connect/pair and update connection parameters
5. Both sides will notify about the connection being terminated because of a timeout
Expected behavior
The connection update indication should either be sent in time or, if that is not possible, the connection update procedure should fail entirely.
Impact
Need to come up with weird workarounds (timeouts etc.) in order to proceed.
Logs and console output
RTT log from central:
Ellisys trace: Ellisys_trace.zip
nRF52 USB dongle sample code: nRF52840_Dongle sample.zip