muccc / gr-iridium

Iridium burst detector and demodulator.

Segfault during live capture under OS X #8

Closed JosephRedfern closed 3 years ago

JosephRedfern commented 8 years ago

I realise OS X may not be officially supported (?), and I will try this on a Linux machine tomorrow to see if I encounter the same issue. That said, I'm not sure whether this bug may affect other platforms.

I'm running with a HackRF, and have tried several variations of the command iridium-extractor -D 4 Examples/hackrf.conf | grep "A:OK" >> output.bits for capture. It doesn't seem to error out after a fixed number of seconds: sometimes it's after one second, other times after a few minutes.

The full stack-trace is available here: https://gist.github.com/JosephRedfern/0e8f8482aba7305d64d9b540c6428542

I believe the relevant section is this:

Thread 5 Crashed:
0   libgnuradio-iridium.dylib       0x000000010e9755fd gr::iridium::burst_downmix_impl::process_next_frame(float, float, unsigned long long, unsigned long long, unsigned long, int) + 909 (burst_downmix_impl.cc:396)
1   libgnuradio-iridium.dylib       0x000000010e974b07 gr::iridium::burst_downmix_impl::handler(boost::intrusive_ptr<pmt::pmt_base>) + 5079 (burst_downmix_impl.cc:707)
2   libgnuradio-iridium.dylib       0x000000010e97c35d boost::_mfi::mf1<void, gr::iridium::burst_downmix_impl, boost::intrusive_ptr<pmt::pmt_base> >::operator()(gr::iridium::burst_downmix_impl*, boost::intrusive_ptr<pmt::pmt_base>) const + 141 (mem_fn_template.hpp:165)
3   libgnuradio-iridium.dylib       0x000000010e97c29e void boost::_bi::list2<boost::_bi::value<gr::iridium::burst_downmix_impl*>, boost::arg<1> >::operator()<boost::_mfi::mf1<void, gr::iridium::burst_downmix_impl, boost::intrusive_ptr<pmt::pmt_base> >, boost::_bi::list1<boost::intrusive_ptr<pmt::pmt_base> const&> >(boost::_bi::type<void>, boost::_mfi::mf1<void, gr::iridium::burst_downmix_impl, boost::intrusive_ptr<pmt::pmt_base> >&, boost::_bi::list1<boost::intrusive_ptr<pmt::pmt_base> const&>&, int) + 126 (bind.hpp:313)
4   libgnuradio-iridium.dylib       0x000000010e97c20d void boost::_bi::bind_t<void, boost::_mfi::mf1<void, gr::iridium::burst_downmix_impl, boost::intrusive_ptr<pmt::pmt_base> >, boost::_bi::list2<boost::_bi::value<gr::iridium::burst_downmix_impl*>, boost::arg<1> > >::operator()<boost::intrusive_ptr<pmt::pmt_base> >(boost::intrusive_ptr<pmt::pmt_base>&&) + 77 (bind.hpp:905)
5   libgnuradio-iridium.dylib       0x000000010e97c040 boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, gr::iridium::burst_downmix_impl, boost::intrusive_ptr<pmt::pmt_base> >, boost::_bi::list2<boost::_bi::value<gr::iridium::burst_downmix_impl*>, boost::arg<1> > >, void, boost::intrusive_ptr<pmt::pmt_base> >::invoke(boost::detail::function::function_buffer&, boost::intrusive_ptr<pmt::pmt_base>) + 48 (function_template.hpp:160)
6   libgnuradio-iridium.dylib       0x000000010e95ee5f boost::function1<void, boost::intrusive_ptr<pmt::pmt_base> >::operator()(boost::intrusive_ptr<pmt::pmt_base>) const + 175 (function_template.hpp:772)
7   libgnuradio-iridium.dylib       0x000000010e95d0b5 gr::basic_block::dispatch_msg(boost::intrusive_ptr<pmt::pmt_base>, boost::intrusive_ptr<pmt::pmt_base>) + 165 (basic_block.h:134)
8   libgnuradio-runtime.3.7.8.dylib 0x000000010ea5e0ce gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, int) + 2930
9   libgnuradio-runtime.3.7.8.dylib 0x000000010ea532f2 gr::tpb_container::operator()() + 74
10  libgnuradio-runtime.3.7.8.dylib 0x000000010ea5311e gr::thread::thread_body_wrapper<gr::tpb_container>::operator()() + 26
11  libboost_thread-mt.dylib        0x000000010f0556c5 boost::(anonymous namespace)::thread_proxy(void*) + 53
12  libsystem_pthread.dylib         0x00007fff9bf85c13 _pthread_body + 131
13  libsystem_pthread.dylib         0x00007fff9bf85b90 _pthread_start + 168
14  libsystem_pthread.dylib         0x00007fff9bf83375 thread_start + 13

Any pointers as to debugging this problem? It's possible I've done something very obviously wrong, so please don't rule that out!

schneider42 commented 8 years ago

Thanks for the stack trace. I'm not sure why you are running into these issues. Maybe there is a bug in the code which does not surface under Linux.

Can you turn on debugging in burst_downmix_impl.cc on line 93 by putting a true in there? (Sorry, but the file does not respect the debug flag yet.)

It will print a max_index= just before crashing. This should give an indication about what is going on.

JosephRedfern commented 8 years ago

@schneider42 Thanks. I've just re-compiled with that flag set, but won't have access to the HackRF until Monday. I'll try again then and let you know the output.

Oh, and I can rule out a HackRF hardware issue, as everything is working as expected under my Linux machine.

JosephRedfern commented 8 years ago

@schneider42 Got a chance to try running with the changed flag. The output can be seen here:

https://gist.github.com/JosephRedfern/255a60431e33c21b888745acb0908910

schneider42 commented 8 years ago

The debug output does not contain output from the printf mentioned above. Did the program stop running right after outputting the first status line?

I also think you should try to capture a file using hackrf_transfer and then process that offline, to get reproducible results.

Command to capture a file: hackrf_transfer -r testfile.sc8 -f 1621800000 -a 1 -l 40 -g 20 -s 12000000

Command to process the file: iridium-extractor -c 1621800000 -r 12000000 -f hackrf --offline testfile.sc8 > output.bits

Try this a few times until you can (hopefully) crash the program. If you share the file I will also have a look.

adecarolis commented 8 years ago

I have the same problem and have performed the capture you requested. Here are the stacktrace and the capture.

schneider42 commented 8 years ago

Thanks. These files do not make my version segfault on Linux. We've spent a few hours on getting the thing to run at all under OS X. I hope to be able to reproduce this in the coming days.

gyaresu commented 8 years ago

FYI: Getting the same on a 'new' install for macOS Sierra on macports.

gyaresu on zaphod in ~/programming/gr-iridium/build(8d23h58m|master*) λ iridium-extractor -c 1621800000 -r 12000000 -f hackrf --offline /tmp/testfile.sc8 > /tmp/outfile.bits
1474584431 | i: 0/s | i_avg: 0/s | q: 0 | q_max: 0 | o: 0/s | ok: 0% | ok: 0/s | ok_avg: 0% | ok: 0 | ok_avg: 0/s | d: 0
[1] 55834 segmentation fault iridium-extractor -c 1621800000 -r 12000000 -f hackrf --offline >

kgarrels commented 7 years ago

This is happening in burst_downmix_impl.cc around line 401:

If max_index is 0, it will crash because you access the element max_index-1.

I added "+1", no more crashes. I hope you find a better fix ;-)

  int max_index = x - d_magnitude_f+1;
  if(k_debug) {
    printf("max_index=%d\n", max_index);
  }
  // Interpolate the result of the FFT to get a finer resolution.
  // see http://www.dsprelated.com/dspbooks/sasp/Quadratic_Interpolation_Spectral_Peaks.html
  // TODO: The window should be Gaussian and the output should be put on a log scale
  float alpha = d_magnitude_f[(max_index - 1) % (d_cfo_est_fft_size * d_fft_over_size_facor)];
  float beta = d_magnitude_f[max_index];
  float gamma = d_magnitude_f[(max_index + 1) % (d_cfo_est_fft_size * d_fft_over_size_facor)];

schneider42 commented 7 years ago

I think this is why I've added a % operation before using (max_index - 1). The intention is to wrap around if the index is out of bounds (the result of the FFT is also wrapping around at this point).

Turns out: -1 % x with x > 1 is actually -1 and not x - 1, as I obviously simply assumed. According to http://stackoverflow.com/questions/7594508/modulo-operator-with-negative-values this is how it is defined in C++11, and has been implemented like that for a long time already.

I'll change the operation so that it actually does what it is supposed to do...
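
To illustrate the remainder behaviour described above, here is a minimal C++11 sketch. The helper name wrap_index is hypothetical and is not taken from gr-iridium or from the commits that fixed this issue; it only shows one common way to get a non-negative wrapped index.

```cpp
// In C++11 integer division truncates toward zero, so the remainder
// takes the sign of the dividend. This is checked at compile time:
static_assert(-1 % 8 == -1, "-1 % 8 yields -1, not 7");

// Hypothetical helper (an assumption for illustration, not gr-iridium code):
// maps any int index into the range [0, size). Assumes size > 0.
constexpr int wrap_index(int index, int size)
{
    // index % size lies in (-size, size); adding size makes it positive,
    // and the second % folds it back into [0, size).
    return (index % size + size) % size;
}

// The wrapped index for -1 is the last element, which is what an
// FFT-style circular buffer needs.
static_assert(wrap_index(-1, 8) == 7, "wraps to the last bin");
static_assert(wrap_index(8, 8) == 0, "wraps to the first bin");
```

With a helper like this, an expression such as d_magnitude_f[wrap_index(max_index - 1, n)] stays in bounds when max_index is 0, where n stands for the FFT length used in the snippet above.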

Thanks a lot for spotting this!

schneider42 commented 7 years ago

https://github.com/muccc/gr-iridium/commit/c6aff8ae9ec482b55deb3101fab9b807345652ec and https://github.com/muccc/gr-iridium/commit/6695ddd9ee292e348b43ba53f7afe6e02cb0e069 should fix this.

The way the burst down mixing works (with the first FFT roughly centring the signal), max_index becomes 0 quite often. Apparently this does not surface under Linux...
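
As a rough sketch of why max_index = 0 needs wrapped neighbours, the following C++ function applies the quadratic peak interpolation from the dsprelated.com page referenced in the code comment above. The function name, the use of std::vector and the zero-denominator guard are assumptions made for illustration; this is not the code from the linked commits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not gr-iridium code): given FFT magnitudes and the
// index of the largest bin, return the fractional offset of the true peak
// relative to that bin, in bins. Neighbours wrap around the ends.
float interpolate_peak_offset(const std::vector<float>& magnitude,
                              std::size_t max_index)
{
    const std::size_t n = magnitude.size();
    assert(n >= 3 && max_index < n);

    // Adding n before subtracting 1 keeps the unsigned index from
    // underflowing when max_index is 0.
    const float alpha = magnitude[(max_index + n - 1) % n];
    const float beta = magnitude[max_index];
    const float gamma = magnitude[(max_index + 1) % n];

    // Vertex of the parabola through the three points.
    const float denom = alpha - 2.0f * beta + gamma;
    if (denom == 0.0f) {
        return 0.0f; // flat spectrum: no refinement possible
    }
    return 0.5f * (alpha - gamma) / denom;
}
```

For a peak in bin 0, alpha is read from the last bin instead of from index -1, so the lookup stays inside the array.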

With the new code, I can also decode one more frame inside my test capture :)

@kgarrels: Please give it a try

kgarrels commented 7 years ago

Changes look good to me.