Open dlaw opened 4 years ago
Something is very confusing, the debug at the bottom...
from /usr/local/lib/SoapySDR/modules0.8/libsdrPlaySupport.so
from /usr/local/lib/libgnuradio-osmosdr.so.0.1.5git
how can that be the case - why is it trying to use both?? Unless I'm misreading this? Can you not just get this to point to SoapySDR/SoapySDRPlay?
osmosdr supports SoapySDR as a source... so any radio that SoapySDR supports can be used with osmosdr acting as a "pass-through". The data flow is SDRPlay -> SoapySDR -> osmosdr -> trunk-recorder. (Unfortunately trunk-recorder does not have native SoapySDR support.)
Here is the osmosdr implementation: http://git.osmocom.org/gr-osmosdr/tree/lib/soapy/soapy_source_c.cc
Set the sample rate to 2 MHz and see if the problem still exists.
Unfortunately I won't be able to reproduce the problem with 2 MHz of bandwidth, as that is not wide enough for me to activate 3 or more recorder threads in trunk-recorder. I am recompiling everything with debugging symbols now in order to gather some more information.
(gdb) frame 1
#1 0x00007f74f750881e in SoapySDRPlay::readStream (this=0x5632e4750c60, stream=0x5632e4750c60, buffs=0x7f74e00010e0, numElems=4096,
flags=@0x7f74ef1b8334: 0, timeNs=@0x7f74ef1b8338: 0, timeoutUs=100000) at /home/dlaw/SoapySDRPlay/Streaming.cpp:315
315 std::memcpy(buff0, (float *)_currentBuff, returnedElems * 2 * sizeof(float));
(gdb) p/x $_siginfo
$1 = {si_signo = 0xb, si_errno = 0x0, si_code = 0x1, _sifields = {_pad = {0xf44ccac0, 0x7f74, 0x0 <repeats 26 times>}, _kill = {si_pid = 0xf44ccac0,
si_uid = 0x7f74}, _timer = {si_tid = 0xf44ccac0, si_overrun = 0x7f74, si_sigval = {sival_int = 0x0, sival_ptr = 0x0}}, _rt = {si_pid = 0xf44ccac0,
si_uid = 0x7f74, si_sigval = {sival_int = 0x0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0xf44ccac0, si_uid = 0x7f74, si_status = 0x0, si_utime = 0x0,
si_stime = 0x0}, _sigfault = {si_addr = 0x7f74f44ccac0, _addr_lsb = 0x0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {
si_band = 0x7f74f44ccac0, si_fd = 0x0}}}
(gdb) info frame
Stack level 1, frame at 0x7f74ef1b8320:
rip = 0x7f74f750881e in SoapySDRPlay::readStream (/home/dlaw/SoapySDRPlay/Streaming.cpp:315); saved rip = 0x7f74fb562c98
called by frame at 0x7f74ef1b8350, caller of frame at 0x7f74ef1b8290
source language c++.
Arglist at 0x7f74ef1b8310, args: this=0x5632e4750c60, stream=0x5632e4750c60, buffs=0x7f74e00010e0, numElems=4096, flags=@0x7f74ef1b8334: 0,
timeNs=@0x7f74ef1b8338: 0, timeoutUs=100000
Locals at 0x7f74ef1b8310, Previous frame's sp is 0x7f74ef1b8320
Saved registers:
rbx at 0x7f74ef1b8308, rbp at 0x7f74ef1b8310, rip at 0x7f74ef1b8318
(gdb) info locals
buff0 = 0x7f74f42e0ab0
returnedElems = 4096
(gdb) print _currentBuff
$2 = (short *) 0x7f74f44ccac0
The siginfo si_addr is equal to _currentBuff, i.e. _currentBuff is set to an invalid address and SoapySDRPlay segfaults when we attempt to copy from that address into buff0.
Attempt 2, I stared at this for a while and the buffer increment stuff in readstream looks fine. I'm a little suspicious of the buffer accounting in rx_callback:
1) This code releases a buffer to the reader thread (aka readStream)
if ((_buffs[_buf_tail].size() + spaceReqd) >= (bufferLength / decM))
{
2) The next section starts to fill the next tail buffer, which using a resize. Resize could cause an allocation, which is something i'm a little weary about.
// get current fill buffer
auto &buff = _buffs[_buf_tail];
buff.resize(buff.size() + spaceReqd);
But I'm not sure the _buf_count
is safe. We check that we have an available write buffer before 1) with if (_buf_count == numBuffers)
but for section 2) this may not be the case if we just incremented _buf_count
.
So now the current tail buffer could be a buffer already given to the read thread, which in turn will now get resized and re-allocated larger than we expected to a new pointer because its size didnt get cleared because it was not yet released by the read thread. -- just a guess, but it seems like a buffer accounting issue.
I just got a segfault with trunk-recorder running 2 recorder threads after about 30 hours of continuous operation. Normally, with 3 recorder threads, I get a segfault after somewhere between 10 seconds and 1 hour depending on how busy the receive threads are. The segfault always seems to happen when all available recorders are recording. My guess right now would be that the segfault is correlated with peak CPU usage, which makes sense if it is a race condition between the read and write threads.
I can reproduce this issue.
SoapySDRPlay: ccc0746d122b91c520ff377b5978d9a45e77feff trunk-recorder: fd9553a6716a2a86a6ef2eef2b82433aa831f043 SDRPlay RSP API: 2.13.1
All other software is from Debian 10.4 x86-64.
So far it seems that if I set "digitalRecorders": 1
I do not see a crash, but with any higher values I'll get a segmentation fault within a couple minutes. I've tested with 6 and 10 MHz sample rates.
For the core file, a gdb backtrace shows
(gdb) bt
#0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:262
#1 0x00007f50e74c7ed4 in SoapySDRPlay::readStream(SoapySDR::Stream*, void* const*, unsigned long, int&, long long&, long) () from /usr/local/lib/SoapySDR/modules0.6/libsdrPlaySupport.so
#2 0x00007f50ef159898 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-osmosdr.so.0.1.4
#3 0x00007f50f03286f7 in gr::sync_block::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) ()
from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#4 0x00007f50f02ee0a0 in gr::block_executor::run_one_iteration() () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#5 0x00007f50f0331b29 in gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, boost::shared_ptr<boost::barrier>, int) () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#6 0x00007f50f0323ec4 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#7 0x00007f50f02d9342 in ?? () from /lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.7.13
#8 0x00007f50f0793615 in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.67.0
#9 0x00007f50f0765fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007f50eebdf4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
I'm getting a similar segfault using linhpsdr (on Arch): https://github.com/g0orx/linhpsdr/pull/84
I also have this issue on Windows (x86, MSVC). I get a crash in both GQrx and Quisk using SoapSDR. Here is the calll-stack when using SDRPlay in Quisk:
VCRUNTIME140!memcpy(unsigned char * dst = 0x1ae9a170 "???", unsigned char * src = 0x6efc40c4 "@`???",
unsigned long count = 0x80e8)+0x4e
SdrPlaySupport!SoapySDRPlay::readStream(class SoapySDR::Stream * stream = 0x1ae9a170,
void ** buffs = 0x6efc40c4, unsigned int numElems = 0x80e8, int * flags = 0x2294f2fc, int64 * timeNs = 0x2294f300,
long timeoutUs = 0n30000)+0xf5
SoapySDR!SoapySDRDevice_readStream(struct SoapySDRDevice * device = 0x1ae9a170, struct SoapySDRStream * stream = 0x1ae9a170,
void ** buffs = 0x6efc40c4, unsigned int numElems = 0x80e8, int * flags = 0x2294f2fc, int64 * timeNs = 0x2294f300,
long timeoutUs = 0n30000)+0x5d
soapy+0x265a
python36!_PyCFunction_FastCallKeywords(struct _object * func = 0x03f327b0, struct _object ** stack = 0x04d794a0, int nargs = 0n150000, struct _object * kwnames = 0x00000000)+0x3e
...
Very annoying since I have to unplug the SDRPlay from the USB-port and restart the SDRPlayAPI service to make the radio work again in e.g. SDRUno.
Edit:
Oh, and a !analyze -v
in WinDbg says APPLICATION_FAULT_INVALID_POINTER_READ
for the std::memcpy()
in SoapySDRPlay/Streamin.cpp
(at line 310).
For what it's worth, I updated to https://github.com/pothosware/SoapySDRPlay3/ and this issue appears to be fixed. I also pulled and rebuilt trunk-recorder, gr-osmosdr, and all of the soapy components.
I'm getting a segfault from trunk-recorder:
I poked around the backtrace and it appears to point to SoapySDRPlay:
I have the 600 megabyte core dump file if anyone would like to examine it in more detail. (Or let me know what else you would like to see... happy to run any gdb commands to poke at the core dump on request.)
For what it's worth, I've reproduced this issue five times so far (and I have core dumps for four of those times... they all look the same to me). I have only been able to reproduce the issue when the radio system is busy and trunk-recorder is using 3 or more recorder threads. I've run for quite a long time with 2 recorder threads with no segfaults; when I switch back to 3 recorder threads, a segfault comes along within an hour or so (whenever the radio system gets busy).
Version information:
I am not sure whether to file this bug report under SoapySDRPlay, gr-osmosdr, or trunk-recorder... based on the backtrace, I decided to start with SoapySDRPlay.
Please let me know if there's any additional testing I can do, or any additional information I can provide.