projecthorus / radiosonde_auto_rx

Automatically Track Radiosonde Launches using RTLSDR
GNU General Public License v3.0

Docker on Pi stops scanning after a while #587

Open mdroberts1243 opened 2 years ago

mdroberts1243 commented 2 years ago

Lately I've noticed my station stops scanning after hours or days, leaving the web interface running but with no scan task updates or refreshes. This is particularly a problem when I'm away from the house for several days and unable to reset the Pi manually.

  1. How could I debug this issue?
  2. Is there a cron job or something similar that I could use to forcibly restart the Docker container in the early hours of the morning, when no balloons are expected?

Thanks, -mark.

darksidelemm commented 2 years ago

You'd need to look in detail at the system-level log to see what happens when this occurs. Hopefully an obvious error message is shown!

There's information on viewing the logs from within docker here: https://github.com/projecthorus/radiosonde_auto_rx/wiki/Docker#25-viewing-the-containers-logs

mdroberts1243 commented 2 years ago

Thanks Mark,

I was able to get to the station and pulled the logs with the scanning failure event... attached to this comment. It seems like rtl_power throws an error, seeing a weird device, then after resetting and retrying no devices are available. I've tried unplugging and re-plugging the SDR dongle but the only thing that works is restarting the docker container (no power down of Pi or re-plugging of SDR required).

-mark. stopped_scanning.log

snh commented 2 years ago

It sounds like something is causing the SDR to reset. Generally you'll need to restart the container for it to pick the SDR up again in these situations.

Running docker restart radiosonde_auto_rx daily from cron, at a time you aren't expecting sondes, is indeed a viable solution.

Something like the following added using sudo crontab -e:

0 4 * * * docker restart radiosonde_auto_rx > /dev/null

This example performs the restart at 4am in whatever timezone your system is set to.
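As a variation on the blind daily restart, here is a minimal Python sketch of a watchdog that restarts the container only when its recent logs show the SDR has gone missing. The container name and the error text are taken from this thread; the function names and the idea of limiting the log search to the period since the container last started are assumptions for illustration, not part of auto_rx.

```python
import subprocess

CONTAINER = "radiosonde_auto_rx"    # container name used in this thread
MARKER = "Could not find RTLSDR"    # error text seen in the attached log


def sdr_lost(log_text, marker=MARKER):
    """True if the log text contains the 'SDR missing' error."""
    return marker in log_text


def check_and_restart(run=subprocess.run):
    """Restart the container if it has logged the marker since it last started.

    `run` is injectable so the logic can be exercised without Docker.
    Returns True if a restart was issued.
    """
    # Ask Docker when the container last started, so that errors logged
    # before a previous restart are not counted again.
    started = run(
        ["docker", "inspect", "--format", "{{.State.StartedAt}}", CONTAINER],
        capture_output=True, text=True,
    ).stdout.strip()

    logs = run(
        ["docker", "logs", "--since", started, CONTAINER],
        capture_output=True, text=True,
    )
    # The entries in the attached log are on stderr, so check both streams.
    if sdr_lost(logs.stdout + logs.stderr):
        run(["docker", "restart", CONTAINER], check=True)
        return True
    return False
```

To use it, a small script that calls check_and_restart() could be run from root's crontab every few minutes instead of, or in addition to, the daily restart. It only helps if the error message actually reaches the container log before auto_rx stalls.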

darksidelemm commented 2 years ago

It seems to start with:

{"log":"2021-11-16 10:02:48,340 CRITICAL:Scanner #00000002 - rtl_power call failed with return code 1.\n","stream":"stderr","time":"2021-11-16T10:02:48.341539039Z"}
{"log":"2021-11-16 10:02:48,341 CRITICAL:Scanner #00000002 - rtl_power reported error: Found 1 device(s):\n","stream":"stderr","time":"2021-11-16T10:02:48.342131744Z"}
{"log":" 0: , , SN: \n","stream":"stderr","time":"2021-11-16T10:02:48.342209661Z"}
{"log":"\n","stream":"stderr","time":"2021-11-16T10:02:48.342243775Z"}
{"log":"No matching devices found.\n","stream":"stderr","time":"2021-11-16T10:02:48.342275598Z"}
{"log":"Number of frequency hops: 3\n","stream":"stderr","time":"2021-11-16T10:02:48.3423069Z"}
{"log":"Dongle bandwidth: 2166666Hz\n","stream":"stderr","time":"2021-11-16T10:02:48.342384399Z"}
{"log":"Downsampling by: 1x\n","stream":"stderr","time":"2021-11-16T10:02:48.34241466Z"}
{"log":"Cropping by: 20.00%\n","stream":"stderr","time":"2021-11-16T10:02:48.342443774Z"}
{"log":"Total FFT bins: 12288\n","stream":"stderr","time":"2021-11-16T10:02:48.342472316Z"}
{"log":"Logged FFT bins: 9830\n","stream":"stderr","time":"2021-11-16T10:02:48.342500753Z"}
{"log":"FFT bin size: 528.97Hz\n","stream":"stderr","time":"2021-11-16T10:02:48.34252919Z"}
{"log":"Buffer size: 16384 bytes (3.78ms)\n","stream":"stderr","time":"2021-11-16T10:02:48.342557888Z"}
{"log":"Reporting every 20 seconds\n","stream":"stderr","time":"2021-11-16T10:02:48.342586482Z"}
{"log":"\n","stream":"stderr","time":"2021-11-16T10:02:48.342614502Z"}
{"log":"2021-11-16 10:02:48,342 WARNING:Scanner #00000002 - RTLSDR produced no output... resetting and retrying.\n","stream":"stderr","time":"2021-11-16T10:02:48.342862053Z"}
{"log":"2021-11-16 10:02:48,434 ERROR:RTLSDR - Could not find RTLSDR with serial 00000002!\n","stream":"stderr","time":"2021-11-16T10:02:48.435494904Z"}
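For anyone wanting to locate the same failure in their own container log, here is a rough Python sketch that scans Docker's JSON log lines (one object per line, with "log", "stream" and "time" keys, as in the excerpt above) for the two error messages seen here. The function name and the choice of marker strings are assumptions for illustration.

```python
import json

# Failure signatures copied from the log excerpt in this thread.
FAILURE_MARKERS = (
    "rtl_power call failed",
    "Could not find RTLSDR",
)


def find_failures(json_lines):
    """Return (time, message) pairs for entries matching a failure marker."""
    hits = []
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        message = entry.get("log", "").rstrip("\n")
        if any(marker in message for marker in FAILURE_MARKERS):
            hits.append((entry.get("time", ""), message))
    return hits


# Three entries from the excerpt, used as sample input.
sample = [
    '{"log":"2021-11-16 10:02:48,340 CRITICAL:Scanner #00000002 - rtl_power call failed with return code 1.\\n","stream":"stderr","time":"2021-11-16T10:02:48.341539039Z"}',
    '{"log":"Number of frequency hops: 3\\n","stream":"stderr","time":"2021-11-16T10:02:48.3423069Z"}',
    '{"log":"2021-11-16 10:02:48,434 ERROR:RTLSDR - Could not find RTLSDR with serial 00000002!\\n","stream":"stderr","time":"2021-11-16T10:02:48.435494904Z"}',
]

for when, message in find_failures(sample):
    print(when, message)
```

The same function can be pointed at an open log file, since iterating over a file object yields one line at a time.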

So it looks like the RTLSDR has dropped off the system as you mention. Auto_rx eventually decides that the SDR is dead and drops it from its list, which should result in auto_rx quitting. However I think one of the threads is hanging around, not letting auto_rx quit...
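To illustrate the suspected mechanism (this is generic Python behaviour, not auto_rx's actual code): a Python process does not exit while any non-daemon thread is still alive, so one worker stuck in a blocking call is enough to keep the interpreter, and therefore the container, running. A minimal sketch, with a hypothetical helper name, for listing the threads that would hold the process open:

```python
import threading


def blocking_threads():
    """Names of live non-daemon threads other than the main thread.

    Any thread listed here will keep the process alive after the main
    thread returns.
    """
    main = threading.main_thread()
    return [
        t.name
        for t in threading.enumerate()
        if t is not main and not t.daemon and t.is_alive()
    ]


# A worker that blocks until told to stop, standing in for a stuck thread.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, name="stuck-worker")  # non-daemon
worker.start()

print(blocking_threads())  # the list includes 'stuck-worker'

# Releasing the worker lets the process exit normally.
stop.set()
worker.join()
print(blocking_threads())  # 'stuck-worker' is no longer listed
```

Calling something like this just before shutdown would show which thread is still hanging around. Marking a thread with daemon=True is the other common way to stop it from blocking exit, at the cost of it being killed abruptly.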

The fact the SDR is dying after a while is probably the bigger issue. Usually a USB reset kicks the RTLSDRs back into life, but perhaps not in this case.

mdroberts1243 commented 2 years ago

Thanks guys. I put a crontab entry in place temporarily. I may eventually need to buy another RTL-SDR, one with a bias tee and TCXO. Is it possible to have the Docker container restart itself if all the SDRs have disappeared?

darksidelemm commented 2 years ago

If you're not running an SDR with a TCXO, please get one that has one. The non-TCXO dongles drift badly in frequency during operation, which can result in issues. Note that the auto_rx instructions specify a TCXO dongle as mandatory... but there's no way that I can check for this.

mdroberts1243 commented 2 years ago

Definitely have a TCXO and an LNA out at the antenna as well.

Is it possible to have a docker container restart itself? I could replace the time out quitting with a restart possibly.

snh commented 2 years ago

The Docker container will restart itself automatically if you have an appropriate Docker restart policy applied, but only if the auto_rx Python application running inside it exits. So it sounds like we'll need to figure out why auto_rx isn't exiting before this will work.
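As a quick way to check which restart policy a container currently has, here is a small Python sketch wrapping docker inspect. The container name is the one used in this thread; the helper names are hypothetical. The policy names are Docker's standard ones (no, always, unless-stopped, on-failure).

```python
import subprocess

CONTAINER = "radiosonde_auto_rx"  # container name used in this thread


def restart_policy(run=subprocess.run):
    """Return the container's restart policy name as reported by Docker."""
    result = run(
        ["docker", "inspect", "--format",
         "{{.HostConfig.RestartPolicy.Name}}", CONTAINER],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def will_restart_on_exit(policy, exit_code=1):
    """Would Docker restart the container if its main process exited?"""
    if policy in ("always", "unless-stopped"):
        return True
    if policy == "on-failure":
        return exit_code != 0  # only restarts after a non-zero exit
    return False  # "no" or empty: Docker leaves the container stopped
```

Either way, the policy only comes into play once the process inside the container has actually exited, which is the part that is failing here.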

kng commented 1 year ago

Having a similar problem. The SDR has been working OK on other duties, but now suddenly overheats. This looks to be because it is running at too high a sample rate. rtl_power doesn't seem to have an option for sample rate; rather, it tries to match the available rates and figure out how many hops it can run with them. If this is the case, then selecting the scan range carefully could avoid running at too high a rate?

2023-04-20 20:13:45,557 INFO:Scanner (RTLSDR 666) - Running frequency scan.
2023-04-20 20:14:45,572 CRITICAL:Scanner (RTLSDR 666) - rtl_power call failed with return code 137.
2023-04-20 20:14:45,573 CRITICAL:Scanner (RTLSDR 666) - rtl_power reported error: Found 1 device(s):
  0:  Realtek, RTL2838UHIDIR, SN: 666
Using device 0: Generic RTL2832U OEM
Number of frequency hops: 3
Dongle bandwidth: 2644444Hz
Downsampling by: 1x
Cropping by: 25.00%
Total FFT bins: 12288
Logged FFT bins: 9216
FFT bin size: 645.62Hz
Buffer size: 16384 bytes (3.10ms)
Reporting every 20 seconds
Found Rafael Micro R820T tuner
Tuner gain set to 3.70 dB.
Exact sample rate is: 2644444.138932 Hz
[R82XX] PLL not locked!
Signal caught, finishing scan pass.
Killed
2023-04-20 20:14:45,574 WARNING:Scanner (RTLSDR 666) - SDR produced no output... resetting and retrying.
2023-04-20 20:14:45,709 INFO:RTLSDR - Attempting to reset: /dev/bus/usb/001/011
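A rough Python sketch of how the scan range could map to the dongle sample rate, to help reason about the question above. This is an assumption about rtl_power's behaviour, not code taken from it: it assumes the tool picks the smallest number of hops for which the per-hop bandwidth, widened to allow for the cropped edges, fits under a maximum rate. The 2.8 MS/s cap and the function name are also assumptions. The two spans used below are back-calculated from the logs in this thread (hops x bandwidth x (1 - crop)), not read from anyone's configuration.

```python
def plan_hops(span_hz, crop, max_rate_hz=2_800_000):
    """Estimate (hop count, dongle bandwidth in Hz) for a scan span.

    Assumed rule: use the fewest hops such that span/hops, divided by
    (1 - crop), does not exceed max_rate_hz.
    """
    for hops in range(1, 1500):
        bw_seen = span_hz / hops           # useful bandwidth per hop
        bw_used = bw_seen / (1.0 - crop)   # widened to allow for cropping
        if bw_used <= max_rate_hz:
            return hops, int(bw_used)
    raise ValueError("span too wide for the assumed hop limit")


# Spans inferred from the two logs in this thread, then some narrower ones.
for span_hz, crop in [
    (5_950_000, 0.25),
    (5_200_000, 0.20),
    (4_500_000, 0.25),
    (4_000_000, 0.25),
    (3_000_000, 0.25),
]:
    hops, rate = plan_hops(span_hz, crop)
    print(f"span {span_hz / 1e6:.2f} MHz, crop {crop:.0%}: "
          f"{hops} hops at {rate} Hz")
```

Under these assumptions the first two cases reproduce the "Dongle bandwidth" values in the logs (2644444 Hz and 2166666 Hz, both with 3 hops), and the resulting rate does not fall steadily as the span shrinks, because the hop count changes in steps. If rtl_power really works this way, the scan limits would need to be chosen with that in mind.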