jamesaisenberg opened this issue 3 years ago
Could you post your JSON config in a code block? We can go from there. :-)
Sure! Here it is:
{
"ver": 2,
"sources": [
{
"center": 855000000.0,
"rate": 10000000.0,
"error": 0,
"ppm": 0,
"agc": false,
"gain": 14,
"ifGain": 16,
"bbGain": 32,
"analogRecorders": 24,
"digitalRecorders": 12,
"driver": "osmosdr",
"device": "bladerf,fpga=/app/fpga.rbf"
}
],
"systems": [{
"control_channels": [853175000,853675000,854362500,851412500],
"type": "smartnet",
"bandplan": "800_reband",
"talkgroupsFile": "talkgroups.csv",
"talkgroupDisplayFormat": "id_tag",
"shortName": "kcers3",
"recordUnknown": true,
"apiKey": XXX,
"pyScript": "tester",
"modulation": "fsk4",
"squelch": -49,
"analogLevels": 10,
"digitalLevels": 4
}],
"callTimeout": 5,
"uploadServer": "https://api.openmhz.com",
"defaultMode": "analog"
}
If memory serves, you should be able to console into and query the BladeRF. If so, mind posting the results of the following commands?
info (remove the serial value from the output)
print clock_sel
version
print hardware
With print hardware, I'm mostly curious to know the temperature reported. You may have frequency drift happening, resulting in the issue you're seeing.
Keep in mind the BladeRF has a built-in TCXO on the board, and there is a chance it's malfunctioning or has a calibration error. I think it should stay within a 1 ppm error range; any more than that and there's an issue.
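To put a rough number on that (my own arithmetic, not a measurement): 1 ppm of error at an ~853 MHz control channel works out to about 853.175 MHz x 1e-6 ≈ 853 Hz of offset, which a decoder can generally ride out; at 5-10 ppm you're several kHz off, which is usually enough to break decoding on a narrow control channel.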
Here it is. The only reason I'm skeptical of a thermal issue is that stopping and restarting trunk-recorder immediately fixes the problem, and the device should be at pretty much the same temperature before and after.
bladeRF> info
Board: Nuand bladeRF 2.0 (bladerf2)
Serial #: XXX
VCTCXO DAC calibration: 0x1c43
FPGA size: 49 KLE
FPGA loaded: yes
Flash size: 32 Mbit
USB bus: 2
USB address: 3
USB speed: SuperSpeed
Backend: libusb
Instance: 0
bladeRF> print clock_sel
Clock input: Onboard VCTCXO
bladeRF> version
bladeRF-cli version: 1.8.0-git-c8320f71-ppafocal
libbladeRF version: 2.2.1-git-c8320f71-ppafocal
Firmware version: 2.3.1
FPGA version: 0.11.0 (configured by USB host)
bladeRF> print hardware
Hardware status:
RFIC status:
Tuning Mode: Host
Temperature: 30.7 degrees C
CTRL_OUT: 0xf8 (0x035=0x00, 0x036=0xff)
Power source: USB Bus
Power monitor: 4.824 V, 0.57 A, 2.76 W
RF routing:
RX1: RFIC 0x0 (A_BAL ) <= SW 0x0 (OPEN )
RX2: RFIC 0x0 (A_BAL ) <= SW 0x0 (OPEN )
TX1: RFIC 0x0 (TXA ) => SW 0x0 (OPEN )
TX2: RFIC 0x0 (TXA ) => SW 0x0 (OPEN )
Hmmm, looking at the configuration details, things look correct. You may want to update the firmware/FPGA images on the Blade: https://github.com/Nuand/bladeRF/releases
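If it helps, the update is usually done with bladeRF-cli from the host. From memory (double-check against bladeRF-cli --help for your build, and substitute the filenames you actually download from that releases page), it's something like:
bladeRF-cli --flash-firmware bladeRF_fw.img
bladeRF-cli --load-fpga hostedxA4.rbf
Note that --load-fpga only loads the bitstream for the current session; since your Docker setup already passes fpga=/app/fpga.rbf in the device string, replacing that .rbf file is probably the simpler route for the FPGA side.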
Is that temperature with the BladeRF at an idle state, or did you kill the recorder process then run the command?
Based on the Pll Phase line in the trunk-recorder console log, I'm inclined to say it's a clock issue caused by thermal drift.
Also, I am not sure how the Docker image pulls in trunk-recorder, so I don't know which git revision you're on. You're running in the correct USB mode for the Blade, so that's good (ample bandwidth).
I may be remembering things wrong, but there could be some lurking issue with the SmartNet trunking channel decoder. I think there is something in the decode graph that can just get stuck. There is a horrible hack that works... and that I use.
You can try adding "controlRetuneLimit": 10, to the top-level settings in config.json. If trunk-recorder fails to successfully retune to the control channel after 10 tries, it will exit.
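To be concrete about the placement (a trimmed-down sketch, not your full config), it sits alongside the other top-level keys:
{
"ver": 2,
"controlRetuneLimit": 10,
"sources": [ ... ],
"systems": [ ... ]
}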
If you then use a script like this one, the instance will simply auto-restart: https://github.com/robotastic/trunk-recorder/blob/master/examples/auto-restart.sh
This is of course sub-optimal. I would love to hunt down the actual cause. I think it may just be that the SmartNet decoder is more sensitive to slight tuning errors. It might be worth trying to add +/-100 Hz to the error setting. It is far-fetched, but maybe try a different bandwidth? It could also be that some of the filter parameters do not work cleanly at 10 MHz, so it might be worth trying 12 MHz or 16 MHz.
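For example (the numbers here are purely illustrative, not something I've verified), both experiments would just be tweaks to the source block:
{
"center": 855000000.0,
"rate": 12000000.0,
"error": 100,
...
}
with "error": -100 being the other direction to try, and 16000000.0 the other rate.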
I’m having the exact same issue, but with a new setup using two RTL-SDR dongles. Like with OP, it doesn’t seem to be a heat-related drift issue. My RTL-SDRs have TCXOs, and restarting trunk-recorder fixes it right away.
I'm a newbie to all this, and I was originally trying to set this up on a Raspberry Pi 3 B. What I found was that it would work for about 5 minutes and then just kind of stop receiving control messages, without any rhyme or reason.
I thought it was some misconfiguration on my end, so I spent a whole bunch of time trying different things. I ran rtl_test against the two dongles, and even after running for about an hour both were around -1/-2 ppm (in fact, changing ppm in the config to anything other than 0 causes the whole thing to stop getting control messages). I basically have line of sight to the tower, and even with a cheap antenna mounted indoors in a very RF-noisy area, about 20 dB is the sweet spot for gain. The RTL-SDR dongles aren't even particularly hot when things get stuck.
For the sake of comparison, I tried the exact same setup with the exact same build of trunk-recorder on my Mac, and it worked great… for about an hour, and then the same thing happened again. Ugh.
Like OP, I am trying to trunk track a SmartNet system — in fact I’m also trunk tracking KCERS (just a different site). When things work, they work great. It’s just weird that things just kinda get stuck after a little while.
I am hoping to use my Raspberry Pi to drive this setup, so the restart workaround isn’t really tenable if it gets stuck every few minutes on that platform. :) If there’s any more information I can collect or any debugging I can do that might help get to a fix for this, I’m happy to help!
Update: after spending a bunch of time tweaking my settings and antenna position, my Pi 3 recorder now runs for a few hours without getting into the stuck state, so it's basically at parity with my macOS experience. Still a bummer that it needs the auto-restart back in place, but overall reliability is acceptable.
If there's anything I can do to help get this fixed once and for all, please just let me know what I can do. I am happy to do gdb-based debugging if you can tell me what I should be looking for. :)
Yea - I wish there was a better solution. My bet is that it has something to do with this GNU Radio block somehow getting lost.... it is definitely one of the blocks in this file!
It looks like you might be able to use Symbol Sync instead... but I have no idea how to do that.
Just an update: I'm using the latest builds of trunk-recorder, and, perhaps most importantly, I've switched to a different antenna tuned for 800-900 MHz and added an LNA into the chain to help compensate for a suboptimal RF environment.
Since doing all this and tweaking the config a little more, trunk-recorder has managed to run for 4 days straight, recording some 10,000 calls without a single restart or hiccup. It's continually pulling a decode rate of 35-45 messages per second, and I haven't seen it drop lower than that a single time.
I'm thrilled it's working better now, but as an engineer at heart it's frustrating not to really know what ultimately "fixed" things. Was it something that just happened to be fixed in a newer build of trunk-recorder, GNU Radio, or OP25? Did the better antenna and pre-amplifier (and thus the ability to drop the SDR gain from 25 down to 7) reduce heat-related frequency drift? (I'm doubtful, as the SDR has a TCXO and because after a restart it would immediately lock back onto the control channel.) Or maybe the better antenna setup compensated for some other internal issue by providing a more consistently "good" signal, rather than one that would ebb and flow, so whatever was triggering this issue was less likely to happen.
I'm unsure, but it seems to be working great for me now. 🤷♂️
Update: I'm seeing this happen again, but only maybe once or twice a day. That's far better than the couple of times an hour originally. When it restarts, it immediately locks onto the control channel, so I lose at most a minute or two worth of calls.
Totally - this has been vexing me. One guess is that a questionable control signal somehow locks up some of the GNU Radio blocks. There haven't been any code changes recently that would change things for smartnet decoding.
[2023-04-09 16:16:25.007705] (info) Pll Phase: 3.1173 min Freq: -0.000001 MHz Max Freq: 0.000001 MHz
[2023-04-09 16:16:25.007825] (error) [KCERS] Control Channel Message Decode Rate: 0/sec, count: 0
[2023-04-09 16:16:28.010110] (error) [KCERS] Retuning to Control Channel: 853.450000 MHz
[2023-04-09 16:16:28.010249] (info) - System Source 0 - Min Freq: 847.912000 MHz Max Freq: 857.812000 MHz
I'm seeing this issue too on a standalone box, with a HackRF with a TCXO and an LNA. Every time I stop trunk-recorder and restart it (or look manually via gqrx), everything is okay.
Happy to provide any data/tracing needed.
{
"ver": 2,
"logFile": false,
"logLevel": "info",
"debugRecorder": false,
"audioStreaming": false,
"sources": [{
"center": 852862000.0,
"rate": 10000000.0,
"ppm": 0.0,
"gain": 14,
"ifGain": 24,
"bbGain": 20,
"antenna": "TX/RX",
"analogRecorders": 10,
"driver": "osmosdr",
"device": "hackrf=936e63",
"defaultMode": "analog"
}],
"systems": [{
"audioArchive": true,
"control_channels": [853425000, 851812500, 853450000, 856887500],
"type": "smartnet",
"squelch": -75.0,
"modulation": "fsk4",
"talkgroupsFile": "tg.csv",
"bandplan": "800_reband",
"recordUnknown": false,
"shortName": "KCERS",
"audioArchive": true
}],
"plugins": [
{
"name":"simplestream",
"library":"libsimplestream.so",
"streams":[{
"TGID": 0,
"address": "1.2.3.4",
"port": 9123,
"sendTGID": false
}]
}
]
}
I would love to find a solution for this - I have just been using the auto restarter if it loses the control channel for too long... it is hacky, but works.
@robotastic Oh, I didn't know that existed. Can you point me in the right direction? (I know there's a shell script that runs it in a loop, but that only covers the process crashing.)
There's a "secret" config setting (not that secret, just hasn't made it into the documentation). At the top level of the config.json set something like:
"controlRetuneLimit": 5,
If it can't find a control channel after X attempts (pick a number at least as high as the number of control channels), trunk-recorder will quit, and either a shell script or systemd service can fire it back up.
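If you go the systemd route, a minimal unit that fires trunk-recorder back up on any exit could look something like this (the paths are placeholders for wherever your binary and config actually live):
[Unit]
Description=trunk-recorder

[Service]
ExecStart=/usr/local/bin/trunk-recorder --config=/etc/trunk-recorder/config.json
WorkingDirectory=/etc/trunk-recorder
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target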
I wonder if this and #792 are related? Have you tried the fix in the cc-retune branch?
I see this and I only have one control channel set.
There's a "secret" config setting (not that secret, just hasn't made it into the documentation). At the top level of the config.json set something like:
"controlRetuneLimit": 5,
this is a good reminder to get that documented... This is the script I use for auto-restarting: https://github.com/robotastic/trunk-recorder/blob/master/examples/auto-restart.sh
this is a good reminder to get that documented
I can do that if you want. I have an entry for the logDir option (the actual feature was added in https://github.com/robotastic/trunk-recorder/commit/5177bcee5012d6189192fe669043f343b0a74d49) and some minor changes to submit. I want to submit it along with an option to log the decoding rates to a file (in place of the controlWarnUpdate idea I proposed). Should have it soon.
Hi all! I just set up trunk-recorder for the first time, and I seem to be running into the same or a very similar issue. It happens even more frequently for me, roughly every two minutes, with RTL-SDRs in my case.
[2023-07-28 19:49:05.016578] (error) [aacounty] Retuning to Control Channel: 854.412500 MHz
[2023-07-28 19:49:05.016752] (info) - System Source 0 - Min Freq: 854.211875 MHz Max Freq: 856.131875 MHz
[2023-07-28 19:49:05.016860] (info) Pll Phase: 3.79648 min Freq: -0.000001 MHz Max Freq: 0.000001 MHz
[2023-07-28 19:49:05.016995] (error) [aacounty] Control Channel Message Decode Rate: 0/sec, count: 0
I'm brand new, so maybe I'm missing something obvious in my config. I could set controlRetuneLimit and have trunk-recorder terminate and restart, but with how frequently this is happening, I'm thinking I should try to troubleshoot the root cause. Any thoughts on where to check?
{
"ver": 2,
"sources": [
{
"center": 855171875,
"rate": 2048000,
"error": 700,
"gain": 30,
"digitalRecorders": 4,
"analogRecorders": 4,
"driver": "osmosdr",
"device": "rtl=1"
}, {
"center": 856513400,
"rate": 2048000,
"error": 700,
"gain": 30,
"digitalRecorders": 2,
"analogRecorders": 2,
"driver": "osmosdr",
"device": "rtl=2"
}
],
"systems": [{
"control_channels": [854337500,854412500,854812500,855162500],
"type": "smartnet",
"talkgroupsFile": "/home/evan/trunk-recorder/aa-p16.csv",
"modulation": "fsk4",
"squelch": -60,
"analogLevels": 8,
"digitalLevels": 2,
"shortName": "aacounty",
"apiKey": "REDACTED"
}],
"uploadServer": "https://api.openmhz.com"
}
Thanks so much for all the work that's gone into trunk-recorder!
I've been having an issue where trunk-recorder stops recording after a few hours. The error message I am getting is this, repeated every second or so:
Exiting and restarting trunk-recorder fixes the problem. I am using the latest Docker image, and my hardware is a bladeRF 2.0 xA4.
Thanks!