swkrueger / Thrifty

Thrifty is proof-of-concept SDR software for TDOA positioning using inexpensive SDR hardware such as the RTL-SDR.
GNU General Public License v3.0

Fastcard Segmentation fault on NixOS #10

Open liamdiprose opened 5 years ago

liamdiprose commented 5 years ago

We have ourselves a Heisenbug :ghost:

I realize NixOS isn't "officially" supported by this project, but I'm still going to try getting it going (NixOS makes deployment to Raspberry Pi very nice).

Any ideas on where this segmentation fault might be coming from would be very appreciated. I've documented my research below.

I've compiled fastcard using CMake, using the most recent libraries available on NixOS unstable. That is fftw-3.3.8 and gnuradio-3.7.13.4 (libvolk). Ubuntu 16 has fftw-3.3.4 and libvolk-1.2.1. I've also tried using libvolk-1.2.1 with no luck.

Clue # 1 is that the segfault occurs when calling a function from libvolk:

https://github.com/swkrueger/Thrifty/blob/2ad9775753a8712a61c81cc78fb0bc75a921d50b/fastcard/fastcard.c#L180

Clue # 2 is that fastcard does not segfault when the block size is set to less than 4096

$ ./fastcard -i rtlsdr -b 4095 -h 4000
# works..

$ ./fastcard -i rtlsdr -b 4096 -h 4000
# ...snip...
# Segmentation Fault

Clue # 3 is that fastcard does not segfault inside valgrind

$ valgrind ./fastcard -i rtlsdr
# works...

My hunch is that it has to do with the two newer library versions. It's even possible that this bug is not occurring in this project's code. But if something comes to mind, it would be great to see this project running on Nix :heart_decoration:

swkrueger commented 5 years ago

Thank you for the bug report.

Does it work when you read data from a file instead of the RTL-SDR? For example:

$ rtl_sdr -g 5 -f 433.83M -s 2.4M data.bin  # capture data, hit Ctrl-C to stop
$ ./fastcard -i data.bin -b 4096 -h 4000

Did you try using gdb with a debug build? Also, I would recommend reading from a file instead of directly from the rtlsdr when using valgrind.

Does it work when you use volk_32fc_magnitude_squared_32f_u instead of volk_32fc_magnitude_squared_32f_a? It might be possible that the fftw alignment does not match the volk alignment for some reason.

liamdiprose commented 5 years ago

Thanks for your reply,

Changing the volk function to volk_32fc_magnitude_squared_32f_u seems to have fixed it. Does that mean it was an alignment issue? I can't find any documentation on the difference between the two functions.

Using a data.bin file solved the issue for the specific command I used, by the way. Segfaults seemed to be affected by how many files the process had open. I wish I had followed your second suggestion first :laughing:

What next? Is this a special case to be added to the CMakeLists, or an issue-close and a patch for my Nix package? Either suits me well.

swkrueger commented 5 years ago

Yeah, it is probably a memory alignment issue. volk_32fc_magnitude_squared_32f_a assumes that the memory is being allocated with volk_malloc, which will ensure that the memory is properly aligned for SIMD instructions (Neon in the case of ARM). volk_32fc_magnitude_squared_32f_u is for unaligned memory and would be slower and make use of the generic algorithm without Neon instructions.
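To make the aligned/unaligned distinction concrete, here is a minimal sketch of an alignment check in plain C. The helper name is_aligned is hypothetical and not part of fastcard or VOLK; in a real build, the alignment to test against would presumably be whatever volk_get_alignment() reports on the target machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if `ptr` is a multiple of `alignment`
 * bytes, 0 otherwise. `alignment` must be a nonzero power of two. */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power-of-two alignment, the low bits of an aligned address are zero. */
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

Calling this on the buffer returned by the FFT allocation, just before the volk call, should show whether the pointer meets the alignment that the _a kernel expects.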

The issue is probably that the Nix package for either FFTW or libvolk is compiled without Neon support. I was in a rush when I implemented fastcard and cut corners. I assumed that FFTW's alignment would be the same as libvolk's alignment, which I think is the case for the Rpi configuration, library versions and architecture I used. It could be that FFTW is using a different alignment or that the FFTW library that you are using is compiled without Neon support and thus not performing any special alignment when fftwf_malloc is called. My guess is that it is FFTW. I vaguely remember something about the official Raspbian package for FFTW including a patch to enable Neon. If I remember correctly, you can inspect the contents of the wisdom file generated by fastcard to see whether FFTW is using Neon or not -- it should contain something like fftwf_codelet_t2bv_16_neon.

You can probably fix the bug by replacing fftwf_malloc(num_bytes) with volk_malloc(num_bytes, alignment) in fastcard/fft.c. But then you'll have the same issue with an opposite configuration where FFTW is compiled with Neon support and libvolk not.
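As a rough illustration of what an allocator with an explicit alignment argument does, here is a sketch that uses C11 aligned_alloc as a stand-in for volk_malloc. The helper name alloc_aligned is hypothetical, and this is not the code in fastcard/fft.c; it only shows the idea of requesting the alignment yourself instead of relying on whatever the FFT library's allocator happens to return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an aligned allocator: returns `num_bytes`
 * (rounded up) of memory aligned to `alignment` bytes, or NULL on failure.
 * `alignment` must be a nonzero power of two. Release with free(). */
static void *alloc_aligned(size_t num_bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded = (num_bytes + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, rounded);
}
```

The caveat from the paragraph above still applies: choosing the alignment for one library does not guarantee that it suits the other.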

Assuming that my hunch is correct regarding Nix's FFTW not using Neon on the Rpi, you can basically choose any one of the following three solutions:

  1. Fix the Nix package for FFTW to compile it with Neon instructions on the Rpi
  2. Use volk_32fc_magnitude_squared_32f_u instead of volk_32fc_magnitude_squared_32f_a and take the performance hit of using both the volk kernel and FFTW without Neon instructions.
  3. Use volk_malloc instead of fftwf_malloc and take the performance hit of using FFTW without Neon instructions.

Oh, and the number 4096 actually makes sense. It is 4K, which is probably the size of a page. You can check the virtual memory page size using getconf PAGESIZE in a shell. What could be happening is that the volk operation is going out of bounds into the next page when it starts from a misaligned address. This would result in a segfault if the next page isn't allocated. There could be cases where more memory is allocated next to that page, e.g. potentially when you read from a file and the file is mapped into the virtual address space, in which case it will not result in a segfault (but probably lead to incorrect results and unexpected behaviour).
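The page-boundary reasoning can be sketched with some address arithmetic. This is only an illustration under assumed numbers: the helper name, the example addresses and the size of the over-read are all hypothetical, and the page size is passed in as a parameter (in C, sysconf(_SC_PAGESIZE) from unistd.h reports the same value as getconf PAGESIZE).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if reading `overrun` bytes past the end of
 * a buffer (`buf_len` bytes starting at address `buf_start`) touches a page
 * other than the one holding the buffer's last valid byte, 0 otherwise. */
static int overrun_leaves_page(uintptr_t buf_start, size_t buf_len,
                               size_t overrun, uintptr_t page_size)
{
    assert(buf_len > 0 && page_size > 0);
    uintptr_t last_valid = buf_start + buf_len - 1; /* last in-bounds byte */
    uintptr_t last_read = last_valid + overrun;     /* last byte actually read */
    return (last_read / page_size) != (last_valid / page_size);
}
```

With 4096-byte pages, a buffer that ends exactly on a page boundary gives 1 for any nonzero over-read, while a slightly shorter buffer gives 0 for a small one. That matches the pattern of a crash at one size and not at the size just below it.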