zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

drivers: nrf_802154: nrf_802154_trx.c - assertion fault when enabling Segger SystemView tracing #50084

Open fgrandel opened 2 years ago

fgrandel commented 2 years ago

Describe the bug An assertion fails (radio not disabled) in the IRQ handler when a radio packet is received via the nRF 802.15.4 driver while tracing with the Segger SystemView backend is enabled.

To Reproduce Steps to reproduce the behavior:

  1. Enable Tracing together with the nRF 802.15.4 radio driver
# Tracing
CONFIG_TRACING=y
CONFIG_SEGGER_SYSTEMVIEW=y

# Radio
CONFIG_IEEE802154=y
CONFIG_NET_L2_IEEE802154=y
CONFIG_NET_L2_IEEE802154_RADIO_CSMA_CA=y
  2. Build an application that receives radio packets and flash it to the target hardware (nRF52840).
  3. Wait for an incoming packet.
  4. The assertion will trip as documented below. The condition can even be reproduced in a debugging session - see the stack trace below.

Note: The SystemView application does NOT need to be connected and running. The assertion fails even when the application simply runs on its own.

Expected behavior The IRQ handler should work regardless of whether tracing is enabled.

Impact Tracing is not usable while receiving packets on the radio.

Logs and console output

ASSERTION FAIL @ WEST_TOPDIR/modules/hal/nordic/drivers/nrf_802154/driver/src/nrf_802154_trx.c:1379
[00:00:03.325,103] <err> os: r0/a1:  0x00000004  r1/a2:  0x00000563  r2/a3:  0x20005a5c
[00:00:03.325,134] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0x0001d043
[00:00:03.325,134] <err> os:  xpsr:  0x61000011
[00:00:03.325,164] <err> os: Faulting instruction address (r15/pc): 0x0002959a
[00:00:03.325,195] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:00:03.325,225] <err> os: Fault during interrupt handling
[00:00:03.325,256] <err> os: Current thread: 0x20002518 (idle)
[00:00:04.157,745] <err> os: Halting system
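
As a cross-check of the register dump above, the active exception number can be read from the low nine bits of xpsr (the IPSR field on Cortex-M). The following is a minimal sketch, not part of Zephyr or the driver; the macro and helper names are made up for illustration.

```c
#include <stdint.h>

/* IPSR field of xPSR: bits [8:0] hold the active exception number. */
#define XPSR_EXCEPTION_MASK      0x1FFu
/* On Cortex-M, exception numbers 16 and above are external interrupts. */
#define FIRST_EXTERNAL_EXCEPTION 16

/* Hypothetical helper: extract the active exception number from xPSR. */
static uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & XPSR_EXCEPTION_MASK;
}

/* Hypothetical helper: map xPSR to an external IRQ number.
 * A negative result means a core exception rather than a device IRQ. */
static int32_t xpsr_irq_number(uint32_t xpsr)
{
    return (int32_t)xpsr_exception_number(xpsr) - FIRST_EXTERNAL_EXCEPTION;
}
```

For the logged value 0x61000011 this yields exception 17, i.e. external IRQ 1. To my knowledge that is the RADIO interrupt on the nRF52840, which would be consistent with "Fault during interrupt handling" and with nrf5_radio_irq appearing in the stack trace.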

This is the stack trace:

wait_until_radio_is_disabled@0x0001d024 (zephyr/modules/hal/nordic/drivers/nrf_802154/driver/src/nrf_802154_trx.c:1379)
rxframe_finish@0x0001d602 (zephyr/modules/hal/nordic/drivers/nrf_802154/driver/src/nrf_802154_trx.c:1461)
irq_handler_crcok@0x0001d812 (zephyr/modules/hal/nordic/drivers/nrf_802154/driver/src/nrf_802154_trx.c:2094)
nrf_802154_radio_irq_handler@0x0001e5da (zephyr/modules/hal/nordic/drivers/nrf_802154/driver/src/nrf_802154_trx.c:2468)
nrf5_radio_irq@0x00030ed4 (zephyr/zephyr/drivers/ieee802154/ieee802154_nrf5.c:731)
??@0x000087ae (zephyr/zephyr/arch/arm/core/aarch32/isr_wrapper.S:259)

And this is the context of the assertion that fails:

nrf_802154_trx.c, line 1359ff

static inline void wait_until_radio_is_disabled(void)
{
    nrf_802154_log_function_enter(NRF_802154_LOG_VERBOSITY_HIGH);

    bool radio_is_disabled = false;

    // RADIO should enter DISABLED state after no longer than RX ramp-down time, which is equal
    // approximately 0.5us. Taking a bold assumption that a single iteration of the loop takes
    // one cycle to complete, 32 iterations would amount to exactly 0.5 us of execution time.
    // Please note that this approach ignores software latency completely, i.e. RADIO should
    // have changed state already before entering this function due to ISR processing delays.
    for (uint32_t i = 0; i < MAX_RXRAMPDOWN_CYCLES; i++)
    {
        if (nrf_radio_state_get(NRF_RADIO) == NRF_RADIO_STATE_DISABLED)
        {
            radio_is_disabled = true;
            break;
        }
    }

    assert(radio_is_disabled); /* --> THIS IS THE FAILED ASSERTION */
    (void)radio_is_disabled;

    nrf_802154_log_function_exit(NRF_802154_LOG_VERBOSITY_HIGH);
}
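
The timing assumption in the comment can be worked through in a small host-side sketch. It assumes a 64 MHz core clock (nRF52840) and MAX_RXRAMPDOWN_CYCLES equal to 32, as the comment implies; the mock state getter and all helper names are hypothetical and stand in for nrf_radio_state_get() so that the loop can run off-target. It only illustrates the loop budget and does not explain why the assertion trips when tracing is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, not copied from the driver sources. */
#define CPU_HZ                64000000u /* assumed nRF52840 core clock */
#define MAX_RXRAMPDOWN_CYCLES 32u       /* value implied by the driver comment */

/* Hypothetical stand-in for the RADIO STATE register values. */
typedef enum {
    RADIO_STATE_DISABLED  = 0,
    RADIO_STATE_RXDISABLE = 4
} radio_state_t;

/* Number of polls for which the mock radio still reports RXDISABLE. */
static uint32_t polls_until_disabled;

/* Mock replacement for nrf_radio_state_get(NRF_RADIO). */
static radio_state_t mock_radio_state_get(void)
{
    if (polls_until_disabled == 0u) {
        return RADIO_STATE_DISABLED;
    }
    polls_until_disabled--;
    return RADIO_STATE_RXDISABLE;
}

/* Loop budget in nanoseconds if one iteration took exactly one CPU cycle. */
static uint32_t poll_budget_ns(void)
{
    return (uint32_t)(((uint64_t)MAX_RXRAMPDOWN_CYCLES * 1000000000u) / CPU_HZ);
}

/* Same bounded poll as the driver, but returning the result instead of
 * asserting on it. ramp_down_polls sets how long the mock radio stays busy. */
static bool radio_disabled_within_budget(uint32_t ramp_down_polls)
{
    polls_until_disabled = ramp_down_polls;

    for (uint32_t i = 0; i < MAX_RXRAMPDOWN_CYCLES; i++) {
        if (mock_radio_state_get() == RADIO_STATE_DISABLED) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions poll_budget_ns() evaluates to 500, matching the 0.5 us in the comment. radio_disabled_within_budget(31) returns true, while radio_disabled_within_budget(32) returns false, which corresponds to the case in which the driver's assertion fires.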

Additional context n/a

rlubos commented 2 years ago

Unfortunately, I don't have access to an adafruit_feather_nrf52840. I tried to reproduce the issue on my nrf52840dk_nrf52840 DK, but without success. I used the echo_server/echo_client samples with all the configs you mention enabled, but had no luck hitting the assert. The sample just keeps working, regardless of whether SystemView is connected.

@jciupis Did you perhaps encounter any problems with the aforementioned assert in the past?

jciupis commented 2 years ago

@rlubos No, I don't recall this assert ever causing any trouble.

fgrandel commented 2 years ago

@rlubos, @jciupis Would it help you if I provided the source code of my application? It's not exactly open source, but it's also not very sensitive. I could push it to a private repository on github and let you have a look. As I'm using the usual abstractions (pinctrl, devicetree, etc.) it shouldn't be too hard to make it work on a standard Nordic dev kit which is very similar to the Feather. Unfortunately it is quite hard to provide a minimal showcase as this seems to be some integration issue. Otherwise I'd already have provided a fix as I usually do. I spent considerable time to find the root cause but I've no more ideas where to look...

fgrandel commented 2 years ago

Oh and of course I'd also be available for a remote debugging session if you like (over whatever video tool you use). But then it's also not that much of a show stopper. So I understand if you don't have the time for any of this of course. Just close if you don't think it worth the effort.

rlubos commented 2 years ago

@fgrandel If you could share your app, I can try to reproduce the problem.

fgrandel commented 2 years ago

@rlubos Ok, will do. Just give me a little time to provide a working example as I'm currently in the middle of implementing a feature. As soon as the issue is (again) reproducible, I'll let you know.

fgrandel commented 2 years ago

@rlubos You should have gotten an invitation to the private repository. I just checked that the problem is still reproducible with the latest main branch. The problem is in this app: https://github.com/fgrandel/co2sensor/tree/master/edge/firmware/gateway

This simple commit triggers the bug: https://github.com/fgrandel/co2sensor/commit/2eb29a744ec6ee9717c6b59bece6652cd575a93f

rlubos commented 1 year ago

@fgrandel Thank you for the access. To my surprise, I was able to build your application w/o any modifications needed.

It took me a moment to reproduce, as it seems that enabling SystemView is not sufficient on its own: an active USB connection is also needed (application USB cable connected to a host). With both conditions fulfilled, I hit the assert on every first reception, just as described.

Since the problem seems to be quite complex (a lot of factors considered) I might need to seek some help internally in the driver dev team. Therefore I can't give any promises regarding ETA for the fix at this point. But at least I can acknowledge that the problem exists and is reproducible on our DK.

fgrandel commented 1 year ago

@rlubos Wow, I'm impressed! If I can help with anything, let me know.

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

fgrandel commented 1 year ago

Thanks @carlescufi for keeping the ticket alive. :-)

nordic-piks commented 2 weeks ago

Triaged internally