zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.88k stars 6.63k forks source link

at86rf2xx radio driver not (reliably) sending ACKs #21659

Closed markus-becker-tridonic-com closed 4 years ago

markus-becker-tridonic-com commented 4 years ago

Describe the bug The newly introduced at86rf2xx driver is not sending ACKs.

@nandojve, many thanks for developing and upstreaming the at86rf2xx radio driver! I have been trying to use the driver on a custom hardware with an STM32 uC and the RF233 using OpenThread.

However, I am not (reliably) getting ACKs, as can be seen in the following capture: image

The debug output indicated that the correct address filter is being set (ieee802154_rf2xx.rf2xx_set_ieee_addr: IEEE address 8e:f3:5a:10:a3:7d:64:e1) and works as the packets sent to the 64bit address are received.

*** Booting Zephyr OS build v2.1.0-rc3-43-g957ab95b0d29  ***
[00:00:00.000,000] <dbg> ieee802154_rf2xx.rf2xx_init: Initialize RF2XX Transceiver
[00:00:00.000,000] <inf> ieee802154_rf2xx: Optional instance of GPIOB device activated
[00:00:00.000,000] <inf> ieee802154_rf2xx: Optional instance of GPIOA device activated
[00:00:00.000,000] <dbg> ieee802154_rf2xx.configure_spi: SPI GPIO CS configured on GPIOB:12
[00:00:00.000,000] <dbg> ieee802154_rf2xx.rf2xx_init: GPIO and SPI configured
[00:00:00.001,000] <dbg> ieee802154_rf2xx.power_on_and_setup: Transceiver is old and unstable release
[00:00:00.001,000] <dbg> ieee802154_rf2xx.rf2xx_thread_main: RF2XX main thread
[00:00:00.003,000] <dbg> ieee802154_rf2xx.rf2xx_filter: Applying filter 0
[00:00:00.003,000] <dbg> ieee802154_rf2xx.rf2xx_set_ieee_addr: IEEE address 06:af:0f:3d:9a:23:65:78
[00:00:00.003,000] <dbg> ieee802154_rf2xx.rf2xx_filter: Applying filter 1
[00:00:00.003,000] <dbg> ieee802154_rf2xx.rf2xx_set_short_addr: Short Address: 0xfffe
[00:00:00.005,000] <inf> net_openthread: -CORE----: Notifier: StateChanged (0x00000001) [Ip6+]
[00:00:00.005,000] <inf> net_l2_openthread: Ipv6 address added
[00:00:00.005,000] <inf> net_openthread: Idle -> Discover
[00:00:00.005,000] <dbg> ieee802154_rf2xx.rf2xx_filter: Applying filter 0
[00:00:00.005,000] <dbg> ieee802154_rf2xx.rf2xx_set_ieee_addr: IEEE address 8e:f3:5a:10:a3:7d:64:e1
[00:00:00.005,000] <inf> net_openthread: -CORE----: Notifier: StateChanged (0x00000001) [Ip6+]
[00:00:00.005,000] <inf> net_l2_openthread: Ipv6 address removed
[00:00:00.005,000] <inf> net_openthread: -CORE----: Notifier: StateChanged (0x00000001) [Ip6+]
[00:00:00.005,000] <inf> net_l2_openthread: Ipv6 address added
[00:00:00.005,000] <inf> net_openthread: -CORE----: Notifier: StateChanged (0x00000001) [Ip6+]
[00:00:00.006,000] <dbg> ieee802154_rf2xx.rf2xx_filter: Applying filter 2
[00:00:00.006,000] <dbg> ieee802154_rf2xx.rf2xx_set_pan_id: Pan Id: 0x75ce
[00:00:00.006,000] <inf> net_openthread: -CORE----: Notifier: StateChanged (0x00000001) [Ip6+]
[00:00:00.007,000] <inf> net_openthread: ▒▒
[00:00:00.007,000] <inf> net_openthread: -MESH-CP-: JoinerState: Idle -> Discover
[00:00:00.014,000] <inf> net_config: Initializing network
[00:00:00.014,000] <inf> net_openthread:
[00:00:00.014,000] <inf> net_l2_openthread: Ipv6 address added
[00:00:00.016,000] <err> log: argument 0 in log message "%s" missing log_strdup().
[00:00:00.040,000] <err> log: argument 0 in log message "%s" missing log_strdup().
[00:00:00.076,000] <err> log: argument 0 in log message "%s" missing log_strdup().
[00:00:00.101,000] <err> log: argument 0 in log message "%s" missing log_strdup().
[00:00:00.125,000] <err> log: argument 0 in log message "%s" missing log_strdup().
[00:00:00.160,000] <err> log: argument 0 in log message "%s" missing log_strdup().
[00:00:00.170,000] <err> log: argument 0 in log message "%s" missing log_strdup().
[00:00:00.179,000] <err> log: argument 0 in log message "%s" missing log_strdup().
[00:00:00.203,000] <err> log: argument 0 in log message "%s" missing log_strdup().
[00:00:03.014,000] <dbg> net_echo_client_sample.main: Run echo client
[00:00:03.747,000] <dbg> ieee802154_rf2xx.rf2xx_trx_rx: Caught a packet (59) (LQI: ff, RSSI: -54, ED: 28)
[00:00:03.747,000] <inf> net_core: net (0x20000040): RF2XX Rx stack stack real size 800 unused 392 usage 408/800 (51 %)
[00:00:03.748,000] <inf> net_openthread: er network: ba7240e1a6047e86, pan:0xd01d, port:49153, chan:23, rssi:-54, allow-any:no
[00:00:03.748,000] <inf> net_openthread: -MESH-CP-: Joiner discover network: ba7240e1a6047e86, pan:0xd01d, port:49153, chan:23, rssi:-54, allow-any:no
[00:00:03.754,000] <dbg> ieee802154_rf2xx.rf2xx_trx_rx: Caught a packet (59) (LQI: ff, RSSI: -53, ED: 29)
[00:00:03.754,000] <inf> net_core: net (0x20000040): RF2XX Rx stack stack real size 800 unused 392 usage 408/800 (51 %)
[00:00:03.755,000] <inf> net_openthread: er network: ba7240e1a6047e86, pan:0xd01d, port:49153, chan:23, rssi:-54, allow-any:no
[00:00:03.755,000] <inf> net_openthread: -MESH-CP-: Joiner discover network: ba7240e1a6047e86, pan:0xd01d, port:49153, chan:23, rssi:-54, allow-any:no
[00:00:03.769,000] <dbg> ieee802154_rf2xx.rf2xx_trx_rx: Caught a packet (59) (LQI: ff, RSSI: -54, ED: 28)
[00:00:03.769,000] <inf> net_core: net (0x20000040): RF2XX Rx stack stack real size 800 unused 392 usage 408/800 (51 %)
[00:00:03.770,000] <inf> net_openthread: er network: ba7240e1a6047e86, pan:0xd01d, port:49153, chan:23, rssi:-54, allow-any:no
[00:00:03.770,000] <inf> net_openthread: -MESH-CP-: Joiner discover network: ba7240e1a6047e86, pan:0xd01d, port:49153, chan:23, rssi:-54, allow-any:no

To Reproduce Build echo_client with ot-overlay.conf and commission against a Thread BR, e.g. based on nrf52840.

Expected behavior ACKs are being sent.

Impact Device cannot be joined onto Thread network, because DTLS gets multiple unexpected link-layer retransmitted packets.

Environment (please complete the following information):

markus-becker-tridonic-com commented 4 years ago

It seems sending FORCE_TRX_OFF stops the transmission of ACKs.

Also mentioned in the datasheet http://ww1.microchip.com/downloads/en/devicedoc/atmel-8351-mcu_wireless-at86rf233_datasheet.pdf:

RX_AACK - Receive with Automatic Acknowledgement A state transition to RX_AACK_ON is initiated by writing the RX_AACK_ON command to the register bits TRX_CMD. On success, reading register bits TRX_STATUS (register 0x01, TRX_STATUS) returns RX_AACK_ON or BUSY_RX_AACK. The latter one is returned when a frame is being received. The RX_AACK Extended Operating Mode is left by writing a new command to the register bits TRX_CMD. If the Atmel AT86RF233 is within a frame receive or acknowledgment procedure (BUSY_RX_AACK), the state change is executed after finishing. Alternatively, the commands FORCE_TRX_OFF or FORCE_PLL_ON can be used to cancel the RX_AACK transaction and switch to TRX_OFF or PLL_ON state respectively

Commenting the following, makes ACK work for me:

diff --git a/drivers/ieee802154/ieee802154_rf2xx.c b/drivers/ieee802154/ieee802154_rf2xx.c
index 876352857d..6fa9e2b7b8 100644
--- a/drivers/ieee802154/ieee802154_rf2xx.c
+++ b/drivers/ieee802154/ieee802154_rf2xx.c
@@ -69,12 +69,12 @@ static void trx_isr_timeout(struct k_timer *timer_id)
 static void rf2xx_trx_set_state(struct device *dev,
                                enum rf2xx_trx_state_cmd_t state)
 {
-       do {
-               rf2xx_iface_reg_write(dev, RF2XX_TRX_STATE_REG,
-                                     RF2XX_TRX_PHY_STATE_CMD_FORCE_TRX_OFF);
-       } while (RF2XX_TRX_PHY_STATUS_TRX_OFF !=
-                (rf2xx_iface_reg_read(dev, RF2XX_TRX_STATUS_REG) &
-                 RF2XX_TRX_PHY_STATUS_MASK));
+       // do {
+       //      rf2xx_iface_reg_write(dev, RF2XX_TRX_STATE_REG,
+       //                            RF2XX_TRX_PHY_STATE_CMD_FORCE_TRX_OFF);
+       // } while (RF2XX_TRX_PHY_STATUS_TRX_OFF !=
+       //       (rf2xx_iface_reg_read(dev, RF2XX_TRX_STATUS_REG) &
+       //        RF2XX_TRX_PHY_STATUS_MASK));

        do {
                rf2xx_iface_reg_write(dev, RF2XX_TRX_STATE_REG, state);
@@ -85,8 +85,8 @@ static void rf2xx_trx_set_state(struct device *dev,

 static void rf2xx_trx_set_rx_state(struct device *dev)
 {
-       rf2xx_trx_set_state(dev, RF2XX_TRX_PHY_STATE_CMD_TRX_OFF);
-       rf2xx_iface_reg_read(dev, RF2XX_IRQ_STATUS_REG);
+       // rf2xx_trx_set_state(dev, RF2XX_TRX_PHY_STATE_CMD_TRX_OFF);
+       // rf2xx_iface_reg_read(dev, RF2XX_IRQ_STATUS_REG);
        /**
         * Set extended RX mode
         * Datasheet: chapter 7.2 Extended Operating Mode
@@ -214,8 +214,8 @@ static void rf2xx_thread_main(void *arg)
                                /* Set PLL_ON to avoid transceiver receive
                                 * new data until finish reading process
                                 */
-                               rf2xx_trx_set_state(dev,
-                                                   RF2XX_TRX_PHY_STATE_CMD_PLL_ON);
+                               // rf2xx_trx_set_state(dev,
+                               //                  RF2XX_TRX_PHY_STATE_CMD_PLL_ON);
                                k_timer_stop(&ctx->trx_isr_timeout);
                                rf2xx_trx_rx(dev);
                                rf2xx_trx_set_state(dev,
nandojve commented 4 years ago

@markus-becker-tridonic-com is nice to see that someone is really using this driver. Please, do not change rf2xx_trx_set_state method. This is how we change transceiver FSM states. The RF2XX_TRX_PHY_STATE_CMD_PLL_ON before rf2xx_trx_rx is currently used to avoid buffer overrun. You can disable it but be aware it can happen once RX is enabled. I hope people like you can help with precious data to improve the driver.

I'm planning add a RX buffer queue to improve in general.

nandojve commented 4 years ago

Hi @markus-becker-tridonic-com, could you check if the below idea works, please? I'm without radios until Jan 13.

The system is currently configured with Dynamic Frame Buffer Protection enabled by RX_SAFE_MODE at (register 0x0C, TRX_CTRL_2). However, we need execute 2 rf2xx_iface_frame_read at rf2xx_trx_rx and Dynamic Frame Buffer Protection is released on the rising edge of pin 23 (/SEL) during the first Frame Buffer read access. I'll improve this situation using SRAM access in near future.

The below snip will ensure that ACK will be sent since transceiver is on RF2XX_TRX_PHY_STATUS_BUSY_RX_AACK state until finish all RX/ACK procedures, _Figure 7-12. Flow Diagram of RXAACK. After that, we ensures that new air frames don't override current buffer until finish RX procedure.

Is this helpfull?

diff --git a/drivers/ieee802154/ieee802154_rf2xx.c b/drivers/ieee802154/ieee802154_rf2xx.c
index 876352857d..a0f6caf453 100644
--- a/drivers/ieee802154/ieee802154_rf2xx.c
+++ b/drivers/ieee802154/ieee802154_rf2xx.c
@@ -211,6 +211,11 @@ static void rf2xx_thread_main(void *arg)
                        k_timer_start(&ctx->trx_isr_timeout, K_MSEC(10), 0);
                } else if (isr_status & (1 << RF2XX_TRX_END)) {
                        if (ctx->trx_state == RF2XX_TRX_PHY_BUSY_RX) {
+                               /* Ensures that automatically ACK will be sent when
+                                * requested */
+                               while (rf2xx_iface_reg_read(dev, RF2XX_TRX_STATUS_REG) ==
+                                          RF2XX_TRX_PHY_STATUS_BUSY_RX_AACK);
+
                                /* Set PLL_ON to avoid transceiver receive
                                 * new data until finish reading process
                                 */
markus-becker-tridonic-com commented 4 years ago

The snippet above works as well. Thanks very much.

markus-becker-tridonic-com commented 4 years ago

@nandojve do you think this could still make it into the 2.2 release?

nandojve commented 4 years ago

@nandojve do you think this could still make it into the 2.2 release?

Hi @markus-becker-tridonic-com, on my point of view yes. The solution is ready at #21753. If you have conditions to share some test results may help close the issue faster.

markus-becker-tridonic-com commented 4 years ago

@nandojve solution is ACKing fine now: image

complete OpenThread stack works in combination with https://github.com/zephyrproject-rtos/zephyr/issues/21763 .