It turns out at least one other person has thought about how to do hardware watchdogs on Linux, and there's a robust set of tooling surrounding them. We should make use of these tools, replacing our own implementation. As far as I can tell there are four parts to the next iteration: a kernel driver, userspace tooling, systemd service-specific configuration, and Oresat documentation/testing.
Kernel
There's a watchdog driver class. Since our watchdog is GPIO-driven we probably want gpio-wdt.
This is probably set up through device tree? If so, we'll want to configure it there for the C3.
Is there a software/simulated driver that we can run as part of a VM for testing?
At least two driver-based watchdogs already exist on the C3: watchdog0 and watchdog1. What are they and how can they be incorporated?
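One way to start answering the watchdog0/watchdog1 question is to read what the kernel exposes under /sys/class/watchdog. The following is a rough Python sketch, assuming the kernel was built with watchdog sysfs support (CONFIG_WATCHDOG_SYSFS) so that attributes such as identity and timeout are present. The function name list_watchdogs and the choice of attributes are mine, for illustration only.

```python
from pathlib import Path

# Attributes the watchdog core exposes when sysfs support is enabled.
# Any of them may be missing on a given kernel or driver.
ATTRS = ("identity", "timeout", "state", "nowayout")


def list_watchdogs(sysfs_root="/sys/class/watchdog"):
    """Return {device_name: {attr: value_or_None}} for each watchdog found."""
    root = Path(sysfs_root)
    found = {}
    if not root.is_dir():
        return found  # no watchdog class on this system
    for dev in sorted(root.glob("watchdog*")):
        info = {}
        for attr in ATTRS:
            try:
                info[attr] = (dev / attr).read_text().strip()
            except OSError:
                info[attr] = None  # attribute not provided by this driver
        found[dev.name] = info
    return found
```

Running `print(list_watchdogs())` on the C3 should show which driver is behind each device, which would tell us whether either one is the GPIO watchdog.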
Userspace
systemd has PID 1 support for hardware watchdogs and a general scheme for system-wide watchdogs. See this blog post for an introduction and man systemd-system.conf for specific knobs to turn.
There's also userspace tooling like wdctl. Are there more tools?
watchdog, as mentioned in the blog, is also a thing. Do we want it? It monitors system health. Do we care about the things it monitors? Are there other system health indicators that we would care about?
Are there other userspace things or tools that I've missed?
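For reference, all of these tools ultimately talk to the kernel's watchdog character device. The sketch below shows that interface in Python, assuming the standard kernel behavior: opening the device arms the watchdog, any write pets it, and writing the character V just before closing ("magic close") disarms it unless the driver is configured with nowayout. The class name WatchdogDevice is hypothetical, and the device can only be held by one process at a time, so this would conflict with systemd if systemd already owns it.

```python
import os


class WatchdogDevice:
    """Minimal wrapper around a kernel watchdog character device."""

    def __init__(self, path="/dev/watchdog"):
        # Opening the device starts the countdown on a real watchdog.
        self._fd = os.open(path, os.O_WRONLY)

    def pet(self):
        """Reset the countdown; any written byte counts as a keepalive."""
        os.write(self._fd, b"\0")

    def close(self, disarm=True):
        """Close the device, optionally requesting a magic close first."""
        if disarm:
            # 'V' asks the driver to stop the watchdog on close.
            # Drivers with nowayout set ignore this.
            os.write(self._fd, b"V")
        os.close(self._fd)
```

This is mostly useful for poking at a simulated watchdog in a test VM; in production it is probably better to let systemd own the device.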
Services
As mentioned in the above blog, services have software watchdog support. See the manpages for systemd.service, systemd.exec, sd_watchdog_enabled and sd_event_set_watchdog (are there others too?).
Our mission-critical services should be covered by this: at least oresatd, uhf, and lband.
Since those are Python/Rust, are there sd_* bindings for those languages? Which libraries, and how would they be integrated?
Are there other services that should be covered by watchdogs?
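To give a feel for what the service side involves, here is a minimal sketch of the notify protocol in plain Python with no bindings. It assumes the unit sets WatchdogSec=, in which case systemd passes NOTIFY_SOCKET and WATCHDOG_USEC (and usually WATCHDOG_PID) in the environment and expects a WATCHDOG=1 datagram before the interval expires. The function names are mine; packaged bindings such as python-systemd's systemd.daemon.notify do the same job and may be the better choice.

```python
import os
import socket
from typing import Optional


def sd_notify(state: str) -> bool:
    """Send a state string to systemd. Returns False when not supervised."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


def watchdog_interval_s() -> Optional[float]:
    """Return how often to pet, in seconds, or None if no watchdog is set.

    Uses half of WATCHDOG_USEC, the margin the systemd docs recommend.
    """
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    pid = os.environ.get("WATCHDOG_PID")
    if pid and int(pid) != os.getpid():
        return None  # the watchdog is meant for a different process
    return int(usec) / 1_000_000 / 2
```

A service's main loop would call `sd_notify("WATCHDOG=1")` at least every `watchdog_interval_s()` seconds, ideally only after confirming that its real work is still making progress.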
Oresat
There are a lot of moving parts here, so it'd be great to have an architecture guide and a user's guide on how all of this fits together.
On the architecture side: which pieces exist, how they fit together, and the rationale behind them.
On the user's guide side: what tools are available to configure, poke at, or disable the watchdog? What would I need to set up a new service covered by a watchdog, and how would I verify that I succeeded?
Some kind of test environment or VM, probably using a software or simulated watchdog
Some kind of manual test plan or automated tests to verify that the watchdogs function the way we expect.