Watchdog handling overhaul

carlescufi commented 4 years ago

This issue is intended to collect all information related to the need for a new approach to how Zephyr handles the watchdog peripheral in a consistent way across all platforms.

This work was triggered by issue #22858.

stephanosio commented 4 years ago

So far we have had two different schools of thoughts on this:

Watchdog should always be enabled by default to make it (arguably) fool-proof, even if application does not make use of it (i.e. does not feed the watchdog) and that will lead to seemingly mysterious and baffling processor resets from the perspective of the users that are unaware of the watchdog being active, clearly because he/she never enabled it in the first place (this has already confused quite a few very experienced devs on the SAM E70 platform).
Watchdog is a "feature" after all, and should only be enabled at user's discretion, with the belief that any sensible engineer will seek ways to enable the watchdog before releasing a production firmware anyway.

pabigot commented 4 years ago

There's also:

The watchdog behavior on a particular platform should follow power-up behavior specified by the vendor:
- if it's enabled on power-up with certain settings, Zephyr leaves it alone
- if it's not enabled on power-up, Zephyr leaves it alone

There should of course be standard options to force the watchdog to be turned off (whether or not the Zephyr watchdog API is enabled; use this explicitly in samples), and to enable the Zephyr watchdog driver without automatically doing anything with it. There should also be a consistent way of inserting feed operations during startup where necessary to keep the watchdog from firing during setup or handoff between boot stages.

This perspective is based on an assumption that people developing products for a specific platform are aware of the platform's capabilities, perhaps selected it for its specific features, and are prepared to use it correctly. All sample applications (and not IMO dev boards) should explicitly disable the watchdog so even new developers can see that this is a feature they need to be aware of but don't have to deal with during initial exploration.

It gives up on the notion that all Zephyr applications should behave the same on all platforms without modification, at least for the watchdog feature where that appears to be infeasible while providing access to the full range of ways to use the hardware (viz. direct control via HAL).

carlescufi commented 4 years ago

API meeting 10th March 2020

@mnkp:

The watchdog as a peripheral is different from other regular ones
Not possible to actually disable the watchdog with a Kconfig variable because of hw differences
Today: unless you enable the CONFIG_WATCHDOG, the Zephyr watchdog subsystem leaves the watchdog untouched but some SoCs (Kinetis, RISCV-RV32M1) may disable it (most Kinetis parts) or reconfigure it (Kinetis KE1XF) right after boot
Types of hardware
- Type 1: Watchdog is disabled unless the software enables it (eg. Nordic, STM32)
- Type 2: Watchdog is enabled by default upon boot (Atmel and Kinetis, optionally STM32)
- Type 2.1: You need to feed the watchdog at a "regular" interval after boot (Atmel, 10s)
- Type 2.2: You need to very quickly reconfigure the watchdog after boot (Kinetis, 250-256 cycles)
Additional constraint in several chips: The watchdog timeout can only be set once per boot

@henrikbrixandersen

Usecase with MCUboot: the code in the Kinetis KE1XF SoC files reconfigures the watchdog to trigger at a later point. The MCUboot build is configured with CONFIG_WATCHDOG=n and the main application with CONFIG_WATCHDOG=y, so the application can reconfigure it later.
In the Kinetis code this is done very early on: https://github.com/zephyrproject-rtos/zephyr/blob/master/arch/arm/core/aarch32/cortex_m/reset.S#L75

@henrikbrixandersen @pabigot @mnkp

Requirement: Users must be able to tell Zephyr not to touch the Watchdog hardware at all
Requirement: A user needs to be able to configure the watchdog so that it sets a particular timeout at boot, without enabling the watchdog API (eg. MCUboot)

Proposal:

menuconfig WATCHDOG
    bool "Watchdog Support"
    select HAS_DTS_WDT
    help
      Include support for watchdogs.

config WDT_BOOT_INIT
    bool "Initialize the Watchdog very early on in the boot sequence"
    help
           If this is enabled, zephyr will initialize the Watchdog to the timeout value taken from DT, if this is disabled, Zephyr will do nothing. So, on Atmel platforms, if this is 'n', the watchdog will trigger since it's enabled by default.

if WATCHDOG
// Other watchdog Kconfig options
endif

zephyr/kernel/boot.c

/* Either disable the watchdog (if applicable) or configure it to the right interval */
#ifdef WDT_BOOT_INIT
int ret = soc_watchdog_configure();
#endif
soc/arm/family/soc.c:

int soc_watchdog_configure(void)
{
    /* Use DT_INST_WATCHDOG_BOOT_TIMEOUT to configure the watchdog */
}
#endif

@henrikbrixandersen

With the proposal above we need to rewrite the Kinetis and RV32M1 boot code that handles the watchdog
External watchdogs: Configure via bus, then feed via GPIO (@mnkp could be pre-configure during the driver instance initialization)

Oanerer commented 4 years ago

An additional platform specific feature I would like to point out: https://github.com/zephyrproject-rtos/zephyr/issues/20693.

For now the issue was 'fixed' for the SiLabs Gecko series by returning -ENOTSUP when both a reset and a callback function were requested. However, if the watchdog handling is overhauled, we might want to take a second look at this as well?

carlescufi commented 4 years ago

An additional platform specific feature I would like to point out: #20693.

For now the issue was 'fixed' for the SiLabs Gecko series by returning -ENOTSUP when both a reset and a callback function were requested. However, if the watchdog handling is overhauled, we might want to take a second look at this as well?

Thanks for bringing this up. I will add it to the requirements.

nzmichaelh commented 4 years ago

+1 to leaving it to the product developer to decide, and +1 to supporting devices that can turn on the WDT at boot via fuses (aka don't disable the WDT at startup)

nandojve commented 4 years ago

CC @hwilmers

nslowell commented 4 years ago

This might be out of scope, but I'm wondering if a subsys software watchdog might be developed that creates an API for components to register that they wish to be required to feed/kick the dog and register a callback for notification of a software watchdog event. A software watchdog is common b/c only one component should be directly interacting with the hardware watchdog so this enables multiple components to participate without conflicting over the hardware watchdog.

This could help with the decision making of whether to start the hardware watchdog or not--hence why I took a chance at posting it here.

martinjaeger commented 3 years ago

@nslowell Just finished to propose a draft implementation of a software watchdog. Please have a look at PR https://github.com/zephyrproject-rtos/zephyr/pull/28947. Happy to receive some feedback about the approach.

zephyrproject-rtos / zephyr

Watchdog handling overhaul #23282