pothosware / SoapySDRPlay2

Soapy SDR plugin for SDRPlay
MIT License
51 stars 11 forks source link

Segmentation fault with trunk-recorder on Debian #59

Open dlaw opened 4 years ago

dlaw commented 4 years ago

I'm getting a segfault from trunk-recorder:

~snip~
[2019-11-02 18:13:03.580708] (info)   Driver: osmosdr
[2019-11-02 18:13:03.580738] (info)   Center: 8.555000e+08
[2019-11-02 18:13:03.580805] (info)   Rate: 10000000
[2019-11-02 18:13:03.580849] (info)   Error: 500
[2019-11-02 18:13:03.580885] (info)   PPM Error: 0
[2019-11-02 18:13:03.580914] (info)   Gain: 20
[2019-11-02 18:13:03.580950] (info)   IF Gain: 0
[2019-11-02 18:13:03.580979] (info)   BB Gain: 0
[2019-11-02 18:13:03.581009] (info)   LNA Gain: 0
[2019-11-02 18:13:03.581044] (info)   PGA Gain: 0
[2019-11-02 18:13:03.581082] (info)   TIA Gain: 0
[2019-11-02 18:13:03.581115] (info)   MIX Gain: 0
[2019-11-02 18:13:03.581147] (info)   VGA1 Gain: 0
[2019-11-02 18:13:03.581180] (info)   VGA2 Gain: 0
[2019-11-02 18:13:03.581210] (info)   Squelch: 0
[2019-11-02 18:13:03.581257] (info)   Idle Silence: false
[2019-11-02 18:13:03.581287] (info)   Digital Recorders: 3
[2019-11-02 18:13:03.581322] (info)   Debug Recorders: 0
[2019-11-02 18:13:03.581360] (info)   SigMF Recorders: 0
[2019-11-02 18:13:03.581391] (info)   Analog Recorders: 0
[2019-11-02 18:13:03.581426] (info)   Modulation: qpsk
[2019-11-02 18:13:03.581468] (info)   Source Device: soapy=0,driver=sdrplay
gr-osmosdr v0.1.4-127-g4d83c606 (0.1.5git) gnuradio 3.7.13.4
built-in source types: file fcd rtl rtl_tcp uhd hackrf rfspace soapy redpitaya 
[INFO] Using format CF32.
[2019-11-02 18:13:03.989391] (info)   SOURCE TYPE OSMOSDR (osmosdr)
[2019-11-02 18:13:03.989568] (info)   Setting sample rate to: 10000000
[2019-11-02 18:13:03.989851] (info)   Actual sample rate: 10000000
[2019-11-02 18:13:03.990037] (info)   Tunning to 8.555005e+08
[2019-11-02 18:13:03.990996] (info)   Gain Stage: IFGR supported values: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 
[2019-11-02 18:13:03.991232] (info)   Gain Stage: RFGR supported values: 0 1 2 3 4 5 6 7 8 9 
[2019-11-02 18:13:03.991290] (info)   Max Freqency: 8.605000e+08
[2019-11-02 18:13:03.991365] (info)   Min Freqency: 8.505000e+08
[2019-11-02 18:13:03.991460] (info)   LNA Gain set to: 0
[2019-11-02 18:13:03.991527] (info)   TIA Gain set to: 0
[2019-11-02 18:13:03.991588] (info)   PGA Gain set to: 0
[2019-11-02 18:13:03.991659] (info)   Gain set to: 20
[2019-11-02 18:13:03.991747] (info)   Setting antenna to [RX]
~snip~
[2019-11-02 19:03:39.872087] (info)     - Starting P25 Recorder Num [0] TG: 1059    Freq: 8.523000e+08  TDMA: false Slot: 0
[2019-11-02 19:03:39.873122] (info)   [comirs]  TG: 1059    Freq: 8.523000e+08  Starting Recorder on Src: soapy=0,driver=sdrplay
[2019-11-02 19:03:44.910787] (info)   [comirs]  TG: 1059    Freq: 8.523000e+08  Update Retuning - New Freq: 8.511250e+08    Elapsed: 5s     Since update: 2s
[2019-11-02 19:03:49.887167] (info)     - Starting P25 Recorder Num [1] TG: 1109    Freq: 8.516250e+08  TDMA: false Slot: 0
[2019-11-02 19:03:49.887714] (info)   [comirs]  TG: 1109    Freq: 8.516250e+08  Starting Recorder on Src: soapy=0,driver=sdrplay
[2019-11-02 19:03:52.083655] (info)   [comirs]  TG: 1059    Freq: 8.511250e+08  Update Retuning - New Freq: 8.563625e+08    Elapsed: 13s    Since update: 2s
[2019-11-02 19:03:53.729387] (info)   [comirs]  TG: 1109    Freq: 8.516250e+08  Update Retuning - New Freq: 8.537875e+08    Elapsed: 4s     Since update: 1s
[2019-11-02 19:03:57.224294] (info)     - Starting P25 Recorder Num [2] TG: 1227    Freq: 8.526875e+08  TDMA: false Slot: 0
[2019-11-02 19:03:57.224887] (info)   [comirs]  TG: 1227    Freq: 8.526875e+08  Starting Recorder on Src: soapy=0,driver=sdrplay
Segmentation fault (core dumped)

I poked around the backtrace and it appears to point to SoapySDRPlay:

(gdb) bt
#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:262
#1  0x00007f38ba5cc366 in SoapySDRPlay::readStream(SoapySDR::Stream*, void* const*, unsigned long, int&, long long&, long) ()
   from /usr/local/lib/SoapySDR/modules0.8/libsdrPlaySupport.so
#2  0x00007f38be5d9c98 in soapy_source_c::work(int, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) ()
   from /usr/local/lib/libgnuradio-osmosdr.so.0.1.5git
#3  0x00007f38bfbca6f7 in gr::sync_block::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#4  0x00007f38bfb900a0 in gr::block_executor::run_one_iteration() () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#5  0x00007f38bfbd3b29 in gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, boost::shared_ptr<boost::barrier>, int) ()
   from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#6  0x00007f38bfbc5ec4 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#7  0x00007f38bfb7b342 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#8  0x00007f38be549f65 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#9  0x00007f38be51bfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007f38bdcca4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I have the 600 megabyte core dump file if anyone would like to examine it in more detail. (Or let me know what else you would like to see... happy to run any gdb commands to poke at the core dump on request.)

For what it's worth, I've reproduced this issue five times so far (and I have core dumps for four of those times... they all look the same to me). I have only been able to reproduce the issue when the radio system is busy and trunk-recorder is using 3 or more recorder threads. I've run for quite a long time with 2 recorder threads with no segfaults; when I switch back to 3 recorder threads, a segfault comes along within an hour or so (whenever the radio system gets busy).

Version information:

I am not sure whether to file this bug report under SoapySDRPlay, gr-osmosdr, or trunk-recorder... based on the backtrace, I decided to start with SoapySDRPlay.

Please let me know if there's any additional testing I can do, or any additional information I can provide.

SDRplay commented 4 years ago

Something is very confusing, the debug at the bottom...

1 0x00007f38ba5cc366 in SoapySDRPlay::readStream(SoapySDR::Stream, void const*, unsigned long, int&, long long&, long) ()

from /usr/local/lib/SoapySDR/modules0.8/libsdrPlaySupport.so

2 0x00007f38be5d9c98 in soapy_source_c::work(int, std::vector<void const, std::allocator<void const> >&, std::vector<void, std::allocator<void> >&) ()

from /usr/local/lib/libgnuradio-osmosdr.so.0.1.5git

how can that be the case - why is it trying to use both?? Unless I'm misreading this? Can you not just get this to point to SoapySDR/SoapySDRPlay?

dlaw commented 4 years ago

osmosdr supports SoapySDR as a source... so any radio that SoapySDR supports can be used with osmosdr acting as a "pass-through". The data flow is SDRPlay -> SoapySDR -> osmosdr -> trunk-recorder. (Unfortunately trunk-recorder does not have native SoapySDR support.)

Here is the osmosdr implementation: http://git.osmocom.org/gr-osmosdr/tree/lib/soapy/soapy_source_c.cc

SDRplay commented 4 years ago

Set the sample rate to 2 MHz and see if the problem still exists.

dlaw commented 4 years ago

Unfortunately I won't be able to reproduce the problem with 2 MHz of bandwidth, as that is not wide enough for me to activate 3 or more recorder threads in trunk-recorder. I am recompiling everything with debugging symbols now in order to gather some more information.

dlaw commented 4 years ago
(gdb) frame 1
#1  0x00007f74f750881e in SoapySDRPlay::readStream (this=0x5632e4750c60, stream=0x5632e4750c60, buffs=0x7f74e00010e0, numElems=4096, 
    flags=@0x7f74ef1b8334: 0, timeNs=@0x7f74ef1b8338: 0, timeoutUs=100000) at /home/dlaw/SoapySDRPlay/Streaming.cpp:315
315         std::memcpy(buff0, (float *)_currentBuff, returnedElems * 2 * sizeof(float));
(gdb) p/x $_siginfo
$1 = {si_signo = 0xb, si_errno = 0x0, si_code = 0x1, _sifields = {_pad = {0xf44ccac0, 0x7f74, 0x0 <repeats 26 times>}, _kill = {si_pid = 0xf44ccac0, 
      si_uid = 0x7f74}, _timer = {si_tid = 0xf44ccac0, si_overrun = 0x7f74, si_sigval = {sival_int = 0x0, sival_ptr = 0x0}}, _rt = {si_pid = 0xf44ccac0, 
      si_uid = 0x7f74, si_sigval = {sival_int = 0x0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0xf44ccac0, si_uid = 0x7f74, si_status = 0x0, si_utime = 0x0, 
      si_stime = 0x0}, _sigfault = {si_addr = 0x7f74f44ccac0, _addr_lsb = 0x0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {
      si_band = 0x7f74f44ccac0, si_fd = 0x0}}}
(gdb) info frame
Stack level 1, frame at 0x7f74ef1b8320:
 rip = 0x7f74f750881e in SoapySDRPlay::readStream (/home/dlaw/SoapySDRPlay/Streaming.cpp:315); saved rip = 0x7f74fb562c98
 called by frame at 0x7f74ef1b8350, caller of frame at 0x7f74ef1b8290
 source language c++.
 Arglist at 0x7f74ef1b8310, args: this=0x5632e4750c60, stream=0x5632e4750c60, buffs=0x7f74e00010e0, numElems=4096, flags=@0x7f74ef1b8334: 0, 
    timeNs=@0x7f74ef1b8338: 0, timeoutUs=100000
 Locals at 0x7f74ef1b8310, Previous frame's sp is 0x7f74ef1b8320
 Saved registers:
  rbx at 0x7f74ef1b8308, rbp at 0x7f74ef1b8310, rip at 0x7f74ef1b8318
(gdb) info locals
buff0 = 0x7f74f42e0ab0
returnedElems = 4096
(gdb) print _currentBuff
$2 = (short *) 0x7f74f44ccac0

The siginfo si_addr is equal to _currentBuff, i.e. _currentBuff is set to an invalid address and SoapySDRPlay segfaults when we attempt to copy from that address into buff0.

guruofquality commented 4 years ago

Attempt 2, I stared at this for a while and the buffer increment stuff in readstream looks fine. I'm a little suspicious of the buffer accounting in rx_callback:

1) This code releases a buffer to the reader thread (aka readStream)

if ((_buffs[_buf_tail].size() + spaceReqd) >= (bufferLength / decM))
    {

2) The next section starts to fill the next tail buffer, which using a resize. Resize could cause an allocation, which is something i'm a little weary about.

// get current fill buffer
    auto &buff = _buffs[_buf_tail];
    buff.resize(buff.size() + spaceReqd);

But I'm not sure the _buf_count is safe. We check that we have an available write buffer before 1) with if (_buf_count == numBuffers) but for section 2) this may not be the case if we just incremented _buf_count.

So now the current tail buffer could be a buffer already given to the read thread, which in turn will now get resized and re-allocated larger than we expected to a new pointer because its size didnt get cleared because it was not yet released by the read thread. -- just a guess, but it seems like a buffer accounting issue.

dlaw commented 4 years ago

I just got a segfault with trunk-recorder running 2 recorder threads after about 30 hours of continuous operation. Normally, with 3 recorder threads, I get a segfault after somewhere between 10 seconds and 1 hour depending on how busy the receive threads are. The segfault always seems to happen when all available recorders are recording. My guess right now would be that the segfault is correlated with peak CPU usage, which makes sense if it is a race condition between the read and write threads.

sciurus commented 4 years ago

I can reproduce this issue.

SoapySDRPlay: ccc0746d122b91c520ff377b5978d9a45e77feff trunk-recorder: fd9553a6716a2a86a6ef2eef2b82433aa831f043 SDRPlay RSP API: 2.13.1

All other software is from Debian 10.4 x86-64.

So far it seems that if I set "digitalRecorders": 1 I do not see a crash, but with any higher values I'll get a segmentation fault within a couple minutes. I've tested with 6 and 10 MHz sample rates.

For the core file, a gdb backtrace shows

(gdb) bt
#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:262
#1  0x00007f50e74c7ed4 in SoapySDRPlay::readStream(SoapySDR::Stream*, void* const*, unsigned long, int&, long long&, long) () from /usr/local/lib/SoapySDR/modules0.6/libsdrPlaySupport.so
#2  0x00007f50ef159898 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-osmosdr.so.0.1.4
#3  0x00007f50f03286f7 in gr::sync_block::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) ()
   from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#4  0x00007f50f02ee0a0 in gr::block_executor::run_one_iteration() () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#5  0x00007f50f0331b29 in gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, boost::shared_ptr<boost::barrier>, int) () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#6  0x00007f50f0323ec4 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#7  0x00007f50f02d9342 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#8  0x00007f50f0793615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#9  0x00007f50f0765fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007f50eebdf4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
joshuarubin commented 4 years ago

I'm getting a similar segfault using linhpsdr (on Arch): https://github.com/g0orx/linhpsdr/pull/84

gvanem commented 3 years ago

I also have this issue on Windows (x86, MSVC). I get a crash in both GQrx and Quisk using SoapSDR. Here is the calll-stack when using SDRPlay in Quisk:

VCRUNTIME140!memcpy(unsigned char * dst = 0x1ae9a170 "???", unsigned char * src = 0x6efc40c4 "@`???", 
unsigned long count = 0x80e8)+0x4e
SdrPlaySupport!SoapySDRPlay::readStream(class SoapySDR::Stream * stream = 0x1ae9a170,
void ** buffs = 0x6efc40c4, unsigned int numElems = 0x80e8, int * flags = 0x2294f2fc, int64 * timeNs = 0x2294f300, 
long timeoutUs = 0n30000)+0xf5
SoapySDR!SoapySDRDevice_readStream(struct SoapySDRDevice * device = 0x1ae9a170, struct SoapySDRStream * stream = 0x1ae9a170, 
void ** buffs = 0x6efc40c4, unsigned int numElems = 0x80e8, int * flags = 0x2294f2fc, int64 * timeNs = 0x2294f300, 
long timeoutUs = 0n30000)+0x5d
soapy+0x265a
python36!_PyCFunction_FastCallKeywords(struct _object * func = 0x03f327b0, struct _object ** stack = 0x04d794a0, int nargs = 0n150000, struct _object * kwnames = 0x00000000)+0x3e
...

Very annoying since I have to unplug the SDRPlay from the USB-port and restart the SDRPlayAPI service to make the radio work again in e.g. SDRUno.

Edit: Oh, and a !analyze -v in WinDbg says APPLICATION_FAULT_INVALID_POINTER_READ for the std::memcpy() in SoapySDRPlay/Streamin.cpp (at line 310).

dlaw commented 3 years ago

For what it's worth, I updated to https://github.com/pothosware/SoapySDRPlay3/ and this issue appears to be fixed. I also pulled and rebuilt trunk-recorder, gr-osmosdr, and all of the soapy components.