greatscottgadgets / hackrf

low cost software radio platform
https://greatscottgadgets.com/hackrf/
GNU General Public License v2.0

Investigate watchdog timer to work around firmware crashes #497

Open KyleKotowick opened 6 years ago

KyleKotowick commented 6 years ago

HackRF One is recognized and works properly. It's in a system that runs 24/7 to record traffic on a radio network. It works fine, until once every 2 weeks or so it crashes. After it crashes, the "hackrf_info" command (on Ubuntu) says "No HackRF boards found" and the device doesn't show up with the "lsusb" command. I can resolve this issue by pressing the "Reset" button or by unplugging it and reconnecting.

The problem is that this device is in a remote location where I cannot physically interact with it easily. I've looked for ways to programmatically cut the power to the USB port to force a hard reset, but that doesn't seem possible with modern Linux. So, I need the device to either (a) stop crashing, (b) reset itself after it crashes, or (c) allow me to reset it via software.

Steps to reproduce

  1. Use the device 24/7 for several weeks
  2. Wait for it to crash and stop working

Expected behaviour

It should not crash, or should be recoverable from the crash without having to physically interact with it.

Actual behaviour

I have to press the reset button or unplug it to reset it.

Version information

Operating system: Ubuntu 16.04

hackrf_info output: "No HackRF boards found"

barto- commented 6 years ago

@kkotowick If you are familiar with the topic, you could use the LPC43XX watchdog timer.

dominicgs commented 6 years ago

I don't know what's causing this, but I would guess that it's related to something getting out of whack on the device side USB stack. You can reset the device over USB, but only if it's responding over USB.

I was under the impression that you could reset a port in a Linux environment using this method: https://marc.info/?l=linux-usb&m=121459435621262&w=2

Does that no longer work?

KyleKotowick commented 6 years ago

@dominicgs It's not responding over USB, hence the inability to reset it over USB. It's causing quite the conundrum. I have already tried the software you linked, to no avail. It seems to reset the data connection to the USB device but not the power connection, so the HackRF stays stuck in its crashed/hung state.
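For reference, the port-reset approach discussed here usually comes down to a single `USBDEVFS_RESET` ioctl on the device's usbfs node. The sketch below is a minimal version of that idea, assuming Linux and assuming the linked post uses this ioctl. The function name `usb_port_reset` and the example path in the comment are made up for illustration. It only works while the device node still exists, so it cannot help once the HackRF has dropped off `lsusb`.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/usbdevice_fs.h>

/* Ask the kernel to reset the USB device behind a usbfs node, for example
 * /dev/bus/usb/001/004 (illustrative path; look up the real bus/device
 * numbers with lsusb). Returns 0 on success, -1 on any error. */
static int usb_port_reset(const char *usbfs_path)
{
    int fd = open(usbfs_path, O_WRONLY);
    if (fd < 0) {
        return -1; /* node missing or no permission */
    }

    /* USBDEVFS_RESET re-enumerates the device; it does not cut port power. */
    int rc = ioctl(fd, USBDEVFS_RESET, 0);

    close(fd);
    return rc < 0 ? -1 : 0;
}
```

Opening the node normally requires root or a matching udev rule.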

dominicgs commented 6 years ago

Are the crashes always after about two weeks, or is it more random than that?

I think @barto- is probably on the right track: we could use the watchdog timer to reset the device if the firmware is still running but the USB connection has died.

barto- commented 6 years ago

@dominicgs Using the watchdog is the only effective workaround I could find. @kkotowick You could use something like this in the file firmware/hackrf_usb/hackrf_usb.c:

...
#include <libopencm3/lpc43xx/wwdt.h>
...
int main(void) {
...
  // Watchdog timer settings
  WWDT_TC = 750000; // Timeout count: 750000 * 4 / 12 MHz = 0.25 seconds
  WWDT_MOD = 0x0003; // Bit 0 enables the watchdog, bit 1 enables reset on timeout

  while(true){
    // Reload WWDT register and prevent reset.
    WWDT_FEED = 0xAA;
    WWDT_FEED = 0x55;
...
  }
...
}

The code above resets the microcontroller if the WWDT_FEED register doesn't get reloaded within 0.25 seconds. You can change the duration of this interval by setting the WWDT_TC register according to this formula:

Duration (seconds) = WWDT_TC * 4 / 12000000
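To make the arithmetic concrete, here is a small sketch that converts between a WWDT_TC value and a timeout in milliseconds. It takes the 12 MHz clock and the factor of 4 straight from the formula above. The helper names are hypothetical, and the 24-bit clamp is an assumption about the counter width that should be checked against the LPC43xx User Manual.

```c
#include <stdint.h>

/* Both constants come from the formula above. */
#define WDT_CLOCK_HZ 12000000u
#define WDT_PRESCALE 4u

/* Assumed counter width (24 bits); verify against the user manual. */
#define WDT_TC_MAX 0xFFFFFFu

/* Timeout in milliseconds for a given WWDT_TC value (rounded down). */
static uint32_t wwdt_tc_to_ms(uint32_t tc)
{
    return (uint32_t)(((uint64_t)tc * WDT_PRESCALE * 1000u) / WDT_CLOCK_HZ);
}

/* WWDT_TC value for a desired timeout in milliseconds,
 * clamped to the assumed maximum counter value. */
static uint32_t wwdt_ms_to_tc(uint32_t ms)
{
    uint64_t tc = ((uint64_t)ms * WDT_CLOCK_HZ) / (WDT_PRESCALE * 1000u);
    return tc > WDT_TC_MAX ? WDT_TC_MAX : (uint32_t)tc;
}
```

With these helpers, `wwdt_ms_to_tc(250)` gives the 750000 used in the snippet above, and `wwdt_ms_to_tc(1000)` gives 3000000 for a one-second timeout.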

BE AWARE THAT THIS MODIFICATION WILL BREAK OTHER FEATURES, SUCH AS FIRMWARE/CPLD UPGRADE. Therefore, you must know what you are doing in order to avoid problems.

For additional information check the LPC43xx User Manual

dominicgs commented 6 years ago

@barto- I think my plan to work around those other cases would be to only enable it when the radio is in RX/TX/sweep modes.

The alternative approach that I was thinking of was to keep track of the time since the last USB transaction with one of the regular timers, which would help us to support USB suspend.
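A rough sketch of that second idea, written as plain C that can be exercised on a host. Everything here is hypothetical: none of these names exist in the HackRF firmware, and the sketch assumes a free-running millisecond tick from one of the regular timers. The point is only to show the bookkeeping: record the tick of the last USB transaction, ignore idle time while the bus is suspended, and report when the idle interval has been exceeded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitor state, not part of the HackRF firmware. */
typedef struct {
    uint32_t last_activity_ms; /* tick value at the last USB transaction */
    uint32_t timeout_ms;       /* idle time allowed before forcing a reset */
    bool suspended;            /* true while the host has suspended the bus */
} usb_idle_monitor_t;

/* Call from the USB transfer-complete path. */
static void usb_idle_note_activity(usb_idle_monitor_t *m, uint32_t now_ms)
{
    m->last_activity_ms = now_ms;
}

/* Call on USB suspend/resume events; resuming restarts the idle interval. */
static void usb_idle_set_suspended(usb_idle_monitor_t *m, bool suspended,
                                   uint32_t now_ms)
{
    m->suspended = suspended;
    if (!suspended) {
        m->last_activity_ms = now_ms;
    }
}

/* Call from the main loop. Unsigned subtraction keeps the comparison
 * correct across tick-counter wraparound. */
static bool usb_idle_should_reset(const usb_idle_monitor_t *m, uint32_t now_ms)
{
    if (m->suspended) {
        return false;
    }
    return (uint32_t)(now_ms - m->last_activity_ms) > m->timeout_ms;
}
```

In firmware, the main loop would trigger a reset when `usb_idle_should_reset()` returns true. How that reset is performed, and what timeout is reasonable, is left open here.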