zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

ASSERTION FAIL [!radio_is_ready()] #28701

Closed caco3 closed 4 years ago

caco3 commented 4 years ago

Describe the bug

I am building a high-throughput application that sends notifications every few milliseconds. It works as expected; however, we sporadically get:

ASSERTION FAIL [!radio_is_ready()] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll_conn.c:277
[00:00:04.270,141] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000001
[00:00:04.270,141] <err> os: r3/a4:  0x00000074 r12/ip:  0x00000000 r14/lr:  0x00023f51
[00:00:04.270,141] <err> os:  xpsr:  0x61000011
[00:00:04.270,141] <err> os: Faulting instruction address (r15/pc): 0x00023f5c
[00:00:04.270,141] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:04.270,141] <err> os: Fault during interrupt handling

[00:00:04.270,172] <err> os: Current thread: 0x20001c38 (unknown)
[00:00:04.360,412] <err> os: Halting system

I am aware of https://github.com/zephyrproject-rtos/zephyr/issues/21601 and added the following lines to prj.conf:

CONFIG_SPEED_OPTIMIZATIONS=y
CONFIG_BT_CTLR_OPTIMIZE_FOR_SPEED=y

but without any visible improvement.

We also see the same behaviour with zephyr-2.4.0-rc2 and with a revision from back in July, so we do not think it is a recent regression.

Is there any check we can do to avoid running into this assertion?

carlescufi commented 4 years ago

Are you disabling interrupts for a long time? This would be you calling irq_lock() and then not unlocking them with an irq_unlock() for a long period of time.

carlescufi commented 4 years ago

Also can you please provide a little more information, such as the board you are building for and the full configuration you are using? (please attach your build/zephyr/.config file).

caco3 commented 4 years ago

@carlescufi Thanks for your support!

Below is a code excerpt:

void main(void) {
    NRF_TIMER2->MODE = TIMER_MODE_MODE_Timer;               // Set the timer to Timer mode
    NRF_TIMER2->TASKS_CLEAR = 1;                            // Clear the timer before it is used
    NRF_TIMER2->PRESCALER = 4;                              // Set prescaler. 16 MHz / 2^4 = 1 MHz
    NRF_TIMER2->BITMODE = TIMER_BITMODE_BITMODE_16Bit;      // Set counter to 16 bit resolution
    NRF_TIMER2->CC[0] = intervalUs;                         // Set value for TIMER2 compare register 0

    NRF_TIMER2->INTENSET = TIMER_INTENSET_COMPARE0_Enabled << TIMER_INTENSET_COMPARE0_Pos;
    NRF_TIMER2->SHORTS = TIMER_SHORTS_COMPARE0_CLEAR_Enabled << TIMER_SHORTS_COMPARE0_CLEAR_Pos;
    NVIC_EnableIRQ(TIMER2_IRQn); // Enable the interrupt
    NRF_TIMER2->TASKS_START = 1;               // Start TIMER2
    IRQ_CONNECT(TIMER2_IRQn, 7, TIMER2_IRQHandler, 0, 0);
}

K_SEM_DEFINE(dataReadyToBeSend, 0, 1);

ISR_DIRECT_DECLARE(TIMER2_IRQHandler) {
    NRF_TIMER2->EVENTS_COMPARE[0] = 0; // Clear compare register 0 event
    k_sem_give(&dataReadyToBeSend);
    return 0;
}

K_THREAD_DEFINE(DataStreaming, 10240, Thread_DataStreaming, NULL, NULL, NULL, 3, 0, 0);

static void Thread_DataStreaming(void) {
    while (1) {
        int ret = k_sem_take(&dataReadyToBeSend, K_MSEC(100));
        if (ret == 0) {
            bt_gatt_notify(NULL, &service.attrs[ATTR_DATA_INDEX], data, sizeof(data_t));
        } else {/* .. */ }
    }
}

BT_GATT_SERVICE_DEFINE(service,
        BT_GATT_PRIMARY_SERVICE(&service),
        BT_GATT_CHARACTERISTIC(
               &dataCharacteristic.uuid,
               BT_GATT_CHRC_NOTIFY,
               BT_GATT_PERM_READ,
               NULL, NULL,
               &data),
        BT_GATT_CCC(data_ccc_cfg_changed, BT_GATT_PERM_READ | BT_GATT_PERM_WRITE),
);

cvinayak commented 4 years ago

IRQ_CONNECT(TIMER2_IRQn, 7, TIMER2_IRQHandler, 0, 0);

You cannot use priority level 7: it translates to a priority value of 0 in Zephyr, i.e. the highest priority. Turn on CONFIG_ASSERT=y to catch this at runtime.

Use a value of 5 if you are using Zero Latency Interrupts; otherwise you can use up to 6.
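
To illustrate why 7 ends up as 0, here is a minimal sketch of the arithmetic. It is not Zephyr code. It assumes a Cortex-M part with 3 implemented priority bits in an 8-bit NVIC priority register, and it assumes that a number of reserved levels (1 without Zero Latency Interrupts, 2 with them) is added to the requested priority before it is written to that register. The function names and the offset parameter are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not Zephyr code: the value an 8-bit NVIC priority
 * register would hold when `offset` reserved levels are added to the
 * requested priority and only the top `prio_bits` bits are implemented. */
uint8_t nvic_priority_register(unsigned int requested, unsigned int offset,
                               unsigned int prio_bits)
{
    assert(prio_bits >= 1 && prio_bits <= 8);

    unsigned int shifted = (requested + offset) << (8u - prio_bits);

    /* Anything that does not fit into the 8-bit register is lost. */
    return (uint8_t)(shifted & 0xFFu);
}

/* Effective hardware level; 0 is the most urgent level. */
unsigned int effective_level(unsigned int requested, unsigned int offset,
                             unsigned int prio_bits)
{
    return nvic_priority_register(requested, offset, prio_bits)
           >> (8u - prio_bits);
}
```

Under these assumptions, effective_level(7, 1, 3) evaluates to 0, because 7 + 1 = 8 no longer fits into 3 bits and wraps around, so the timer interrupt would end up at the most urgent level and could delay the radio ISR. By contrast, effective_level(6, 1, 3) and effective_level(5, 2, 3) both evaluate to 7, the least urgent level, which matches the limits of 6 and 5 given above.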

If this solves your issue, please remove the issue label "bug".

caco3 commented 4 years ago

Dear @cvinayak and @carlescufi

Thank you for your quick support! Indeed changing the priority of the IRQ_CONNECT to 5 solves this and all other instabilities I saw.

I will therefore close this issue.

Is there any further documentation about this? I had a look at IRQ_CONNECT as well as Interrupts but did not find anything related to this issue.

If this solves your issue, please remove the issue label "bug".

I do not have the rights needed to do this.

carlescufi commented 4 years ago

Is there any further documentation about this? I had a look at IRQ_CONNECT as well as Interrupts but did not find anything related to this issue.

No, but feel free to submit a Pull Request, it would be much appreciated!