zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
11.02k stars 6.71k forks source link

task watchdog crash/asset on NRF52840 - need to reorder task_wdt_feed() in task_wdt_add() #39523

Closed ehughes closed 3 years ago

ehughes commented 3 years ago

Describe the bug

On the NRF52840 (and likely other NRF variants), the CPU will hardfault when calling

task_wdt_add()

The underlying issue is documented here:

https://devzone.nordicsemi.com/f/nordic-q-a/75112/assertion-fail-m_cb-p_instance--drv_inst_idx-state-nrfx_drv_state_powered_0

The fix is simple. Move task_wdt_feed(id); after the wdt_setup call like this:

int task_wdt_add(uint32_t reload_period, task_wdt_callback_t callback,
         void *user_data)
{
    if (reload_period == 0) {
        return -EINVAL;
    }

    /* look for unused channel (reload_period set to 0) */
    for (int id = 0; id < ARRAY_SIZE(channels); id++) {
        if (channels[id].reload_period == 0) {
            channels[id].reload_period = reload_period;
            channels[id].user_data = user_data;
            channels[id].timeout_abs_ticks = K_TICKS_FOREVER;
            channels[id].callback = callback;

#ifdef CONFIG_TASK_WDT_HW_FALLBACK
            if (!hw_wdt_started && hw_wdt_dev) {
                /* also start fallback hw wdt */
                wdt_setup(hw_wdt_dev,
                    WDT_OPT_PAUSE_HALTED_BY_DBG);
                hw_wdt_started = true;
            }
#endif

task_wdt_feed(id);

            return id;
        }
    }

    return -ENOMEM;
}

To Reproduce

Run the sample in samples/subsys/task_wdt with the nrf52840dk_nrf52840 board.

Expected behavior

When the call is made to task_wdt_feed, the CPU will hardfault. I believe there is an asset being triggered.

Impact

Annoying. For the time being, I will be manually patching but this should be in the main zephyr tree.

Environment (please complete the following information):

martinjaeger commented 3 years ago

Yep, this looks like a bug and an appropriate fix, thanks a lot for reporting!

I wonder why the hard fault didn't happen during my testing with the nrf52840dk_nrf52840 a while ago. For STM32, which I mainly use, feeding the watchdog before calling wdt_setup() doesn't seem to be an issue.

I'll double-check in the next days and afterwards create a PR.

ehughes commented 3 years ago

So, the hardfault is happening with the Segger Ozone Debugger attached. There may be other details of why the assert is triggering it. Either way, I found this out while scanning the Nordic Devzone while trying to get the Task WDT to work and thought I would at least post the information and the patch.

I will also be testing on the NRF91 soon as well. I'll keep an eye out of the pull request.

martinjaeger commented 3 years ago

Definitely good to post the bug here. I was not aware of the discussion in the Nordic Devzone, otherwise the fix would have made it into 2.7 release.

martinjaeger commented 3 years ago

I was able to reproduce the issue. See linked PR with the fix.