zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.25k stars 6.28k forks source link

Watchdog handling overhaul #23282

Open carlescufi opened 4 years ago

carlescufi commented 4 years ago

This issue is intended to collect all information related to the need for a new approach to how Zephyr handles the watchdog peripheral in a consistent way across all platforms.

This work was triggered by issue #22858.

stephanosio commented 4 years ago

So far we have had two different schools of thoughts on this:

  1. Watchdog should always be enabled by default to make it (arguably) fool-proof, even if application does not make use of it (i.e. does not feed the watchdog) and that will lead to seemingly mysterious and baffling processor resets from the perspective of the users that are unaware of the watchdog being active, clearly because he/she never enabled it in the first place (this has already confused quite a few very experienced devs on the SAM E70 platform).

  2. Watchdog is a "feature" after all, and should only be enabled at user's discretion, with the belief that any sensible engineer will seek ways to enable the watchdog before releasing a production firmware anyway.

pabigot commented 4 years ago

There's also:

  1. The watchdog behavior on a particular platform should follow power-up behavior specified by the vendor:
    • if it's enabled on power-up with certain settings, Zephyr leaves it alone
    • if it's not enabled on power-up, Zephyr leaves it alone

There should of course be standard options to force the watchdog to be turned off (whether or not the Zephyr watchdog API is enabled; use this explicitly in samples), and to enable the Zephyr watchdog driver without automatically doing anything with it. There should also be a consistent way of inserting feed operations during startup where necessary to keep the watchdog from firing during setup or handoff between boot stages.

This perspective is based on an assumption that people developing products for a specific platform are aware of the platform's capabilities, perhaps selected it for its specific features, and are prepared to use it correctly. All sample applications (and not IMO dev boards) should explicitly disable the watchdog so even new developers can see that this is a feature they need to be aware of but don't have to deal with during initial exploration.

It gives up on the notion that all Zephyr applications should behave the same on all platforms without modification, at least for the watchdog feature where that appears to be infeasible while providing access to the full range of ways to use the hardware (viz. direct control via HAL).

carlescufi commented 4 years ago

API meeting 10th March 2020

@mnkp:

@henrikbrixandersen

@henrikbrixandersen @pabigot @mnkp

Proposal:

menuconfig WATCHDOG
    bool "Watchdog Support"
    select HAS_DTS_WDT
    help
      Include support for watchdogs.

config WDT_BOOT_INIT
    bool "Initialize the Watchdog very early on in the boot sequence"
    help
           If this is enabled, zephyr will initialize the Watchdog to the timeout value taken from DT, if this is disabled, Zephyr will do nothing. So, on Atmel platforms, if this is 'n', the watchdog will trigger since it's enabled by default.

if WATCHDOG
// Other watchdog Kconfig options
endif

zephyr/kernel/boot.c

/* Either disable the watchdog (if applicable) or configure it to the right interval */
#ifdef WDT_BOOT_INIT
int ret = soc_watchdog_configure();
#endif
soc/arm/family/soc.c:

int soc_watchdog_configure(void)
{
    /* Use DT_INST_WATCHDOG_BOOT_TIMEOUT to configure the watchdog */
}
#endif

@henrikbrixandersen

Oanerer commented 4 years ago

An additional platform specific feature I would like to point out: https://github.com/zephyrproject-rtos/zephyr/issues/20693.

For now the issue was 'fixed' for the SiLabs Gecko series by returning -ENOTSUP when both a reset and a callback function were requested. However, if the watchdog handling is overhauled, we might want to take a second look at this as well?

carlescufi commented 4 years ago

An additional platform specific feature I would like to point out: #20693.

For now the issue was 'fixed' for the SiLabs Gecko series by returning -ENOTSUP when both a reset and a callback function were requested. However, if the watchdog handling is overhauled, we might want to take a second look at this as well?

Thanks for bringing this up. I will add it to the requirements.

nzmichaelh commented 4 years ago

+1 to leaving it to the product developer to decide, and +1 to supporting devices that can turn on the WDT at boot via fuses (aka don't disable the WDT at startup)

nandojve commented 4 years ago

CC @hwilmers

nslowell commented 4 years ago

This might be out of scope, but I'm wondering if a subsys software watchdog might be developed that creates an API for components to register that they wish to be required to feed/kick the dog and register a callback for notification of a software watchdog event. A software watchdog is common b/c only one component should be directly interacting with the hardware watchdog so this enables multiple components to participate without conflicting over the hardware watchdog.

This could help with the decision making of whether to start the hardware watchdog or not--hence why I took a chance at posting it here.

martinjaeger commented 3 years ago

@nslowell Just finished to propose a draft implementation of a software watchdog. Please have a look at PR https://github.com/zephyrproject-rtos/zephyr/pull/28947. Happy to receive some feedback about the approach.