proveskit / fprime-proves

Apache License 2.0
Ping the Hardware Watchdog GPIO Pin #27

Open Mikefly123 opened 1 month ago

Mikefly123 commented 1 month ago

Proposed Features

On Flight Controller V4c, we introduced the radiation-tolerant watchdog timer from the OreSat C3 Flight Computer. The circuit expects a pulse on the RP2040's WDT_WDI (Watch Dog Timer Watch Dog In) pin (aka GPIO21) at least once every 24 seconds (54 seconds on cold start). If no pulse arrives within that window, the circuit triggers a reset on the RP2040's RESET line.
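
To make the timing budget concrete, here is a small sketch that works out how many consecutive pets could be skipped before the window elapses. The 24 s and 54 s windows come from the description above; the function name is hypothetical, and the model assumes the window restarts at each pulse, which this issue does not confirm.

```python
import math

NORMAL_WINDOW_S = 24.0      # window quoted in this issue
COLD_START_WINDOW_S = 54.0  # cold-start window quoted in this issue


def missed_pets_tolerated(window_s: float, pet_interval_s: float) -> int:
    """Return how many consecutive scheduled pets can be skipped safely.

    Assumes pets are scheduled every pet_interval_s seconds and that the
    window restarts at each pulse (an assumption, not confirmed here).
    """
    if pet_interval_s <= 0 or pet_interval_s >= window_s:
        raise ValueError("pet interval must be positive and shorter than the window")
    # After k skipped pets, the next pulse arrives at (k + 1) * pet_interval_s,
    # which must still fall strictly inside the window.
    return math.ceil(window_s / pet_interval_s) - 2


if __name__ == "__main__":
    print(missed_pets_tolerated(NORMAL_WINDOW_S, 5.0))
    print(missed_pets_tolerated(COLD_START_WINDOW_S, 5.0))
```

Under these assumptions, a 5-second pet interval (the one used by the CircuitPython reference later in this issue) tolerates three consecutive missed pets in normal operation and nine after a cold start.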

We need an F' component that is able to do the following:

Design Notes

This is one of the most important components of the entire flight software stack because it is the heartbeat that tells us everything is okay with the hardware and software onboard the Flight Controller. If that heartbeat stops, the Watch Dog will do its job of attempting a hard reset of the system to clear whatever fault might have caused the satellite to freeze. This is the closest we get to being able to "turn it off and back on again" in outer space.

A key thing to note is that the pet must not be able to bypass potential software hangups, i.e. do not use an interrupt to trigger the watchdog pet function, because an interrupt can keep firing even while the main loop is stuck. Because the window is so generous, we should allow petting to slip out of real time, and only force an interrupt-driven pet if we know that some higher priority function must not be interrupted for any reason (like a software update).
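
As a rough illustration of petting from the main loop rather than from an interrupt, the sketch below only pulses the pin when the loop itself calls `service()`, so a hung loop stops the pets and lets the hardware reset happen. `PolledWatchdog` and `FakePin` are hypothetical names for this example and are not part of the existing flight software; the 5 s interval and 100 ms pulse mirror the CircuitPython reference later in this issue.

```python
import time


class PolledWatchdog:
    """Pets a watchdog pin only when the main loop calls service()."""

    def __init__(self, pin, pet_interval_s=5.0, pulse_s=0.1,
                 clock=time.monotonic, sleep=time.sleep):
        self._pin = pin                  # any object with a writable .value
        self._pet_interval_s = pet_interval_s
        self._pulse_s = pulse_s
        self._clock = clock
        self._sleep = sleep
        self._last_pet = None            # no pet sent yet

    def service(self):
        """Call once per main-loop pass; returns True if a pulse was sent."""
        now = self._clock()
        if self._last_pet is not None and now - self._last_pet < self._pet_interval_s:
            return False                 # too soon, nothing to do
        self._pin.value = True
        self._sleep(self._pulse_s)
        self._pin.value = False
        self._last_pet = now
        return True


class FakePin:
    """Hypothetical stand-in for a digitalio output pin, for desk testing."""

    def __init__(self):
        self.writes = []                 # every level written, in order

    @property
    def value(self):
        return self.writes[-1] if self.writes else False

    @value.setter
    def value(self, level):
        self.writes.append(level)
```

Because the pet is tied to loop progress rather than a timer, the pulse timing can drift with loop load, which the generous window is meant to absorb.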

Obviously if the software completely stops working then a reset will occur, but sometimes soft errors (errors that cause individual software functions to fail while the overall state machine keeps running) can compound and still cause a loss of mission, even though enough software is alive to keep the satellite barely running. It may be prudent to add functionality to the Watch Dog component that purposefully lets the petting window elapse, so that a reset occurs, if too many soft errors pile up or something similar is degrading the flight software.
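
One possible shape for that idea is sketched below: a small counter that other code reports soft errors to, and that tells the pet logic when to stop petting. The class name, the threshold of 10, and the per-source bookkeeping are all assumptions made for illustration; nothing like this exists in the flight software yet.

```python
class SoftErrorBudget:
    """Counts soft errors and says whether the watchdog should still be petted."""

    def __init__(self, max_errors=10):
        if max_errors < 1:
            raise ValueError("max_errors must be at least 1")
        self._max_errors = max_errors
        self._counts = {}                # error source name -> count

    def record(self, source):
        """Note one soft error from the named source (e.g. "radio")."""
        self._counts[source] = self._counts.get(source, 0) + 1

    def clear(self, source=None):
        """Forget errors from one source, or all of them if source is None."""
        if source is None:
            self._counts.clear()
        else:
            self._counts.pop(source, None)

    @property
    def total(self):
        """Total soft errors currently on record across all sources."""
        return sum(self._counts.values())

    def should_pet(self):
        """False once the budget is spent, so the hardware window can elapse."""
        return self.total < self._max_errors
```

A main loop would check `should_pet()` before each pet and skip the pulse when it returns False, leaving the actual reset to the hardware watchdog.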

Reference CircuitPython Implementation

Currently the CircuitPython Flight Software initializes the pin in pysquared.py:

import board
import digitalio

class Satellite:
    ...
    def __init__(self):
        ...
        # Drive WDT_WDI as an output, starting low
        self.watchdog_pin = digitalio.DigitalInOut(board.WDT_WDI)
        self.watchdog_pin.direction = digitalio.Direction.OUTPUT
        self.watchdog_pin.value = False
        ...

It then defines a method that sends the 100 ms pet pulse, also on the Satellite class in pysquared.py:

import time

class Satellite:
    ...
    def watchdog_pet(self):
        # Hold WDT_WDI high for 100 ms, then return it low
        self.watchdog_pin.value = True
        time.sleep(0.1)
        self.watchdog_pin.value = False
    ...

In main.py we set up an async function for petting the Watch Dog in normal operations:

import asyncio
from pysquared import cubesat as c
...
def normal_power_operations():
    ...
    async def check_watchdog():
        # Pet every 5 seconds for as long as the power check passes
        c.hardware["WDT"] = True
        while check_power():
            c.watchdog_pet()
            await asyncio.sleep(5)
        c.hardware["WDT"] = False
    ...

In critical or minimum power operations (where things happen in a linear and finite fashion) we directly call the watchdog_pet() function:

def minimum_power_operations():

    c.watchdog_pet()
    f.beacon()
    f.listen()
    f.state_of_health()
    f.listen()
    c.watchdog_pet()

    f.Short_Hybernate()

No logging is implemented yet and will come in a future update.

Required Development Hardware

In order to test this on hardware you will need the following:

You could also change the GPIO pin to one of the onboard LEDs to check that your pinging logic works without needing to touch the watchdog circuit itself.

Enabling the Watchdog Timer

On the V4c InspireFly Special, the JP2 Solder Jumper must be connected to give the Watch Dog circuit access to the RP2040's reset line. Additionally, the bias voltage can be supplied to the J11 STEMMA connector or the J10 Pin Headers.

image

Reference Schematic

image

nateinaction commented 1 month ago

Thank you for this! I was wondering why the watchdog wasn't causing any resets on my board. This explains a lot!