greatscottgadgets / hackrf

low cost software radio platform
https://greatscottgadgets.com/hackrf/
GNU General Public License v2.0

hackrf_transfer -c produces possible buffer underruns #1503

Open Sasszem opened 2 weeks ago

Sasszem commented 2 weeks ago

What type of issue is this?

permanent - occurring repeatedly

What issue are you facing?

While measuring the output of a clone HackRF One with a proper spectrum analyser, I noticed the output power fluctuating, and with slower sweep times the spectrum looked exactly like an OOK (on-off keying) spectrum. I confirmed this with time-domain measurements using a low-frequency carrier: the output dropped to 0 for extended periods (milliseconds) at seemingly random times. I suspected a software error and wrote a small piece of C++ code of my own to transmit. My code did NOT produce the same effect, pointing to a software bug in, or triggered by, hackrf_transfer. I plan to investigate the real cause, but in the meantime I'm opening this issue in case anyone else has encountered the same effect.

What are the steps to reproduce this?

Can you provide any logs? (output, errors, etc.)

No response

martinling commented 1 week ago

The host is probably failing to keep up with supplying samples to the HackRF in time.

Check the statistics from hackrf_debug -S after running hackrf_transfer. If there is a non-zero number of shortfalls reported, this is the issue.

With the -c mode, hackrf_transfer fills its buffers with constant samples generated on the fly, but the process of supplying samples to the HackRF is otherwise the same as when reading sample data from a file.
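As a rough illustration of what "constant samples" means at the byte level, here is a minimal Python sketch. It relies on the HackRF sample format of interleaved signed 8-bit I and Q values (2 bytes per sample). The function name and the choice of which value goes into I versus Q are assumptions for illustration, not taken from the hackrf_transfer source.

```python
import struct


def constant_iq_buffer(num_samples: int, i_value: int, q_value: int = 0) -> bytes:
    """Build a buffer of identical interleaved signed 8-bit I/Q samples.

    Hypothetical helper: how hackrf_transfer -c maps its amplitude
    argument onto I and Q is an assumption here.
    """
    # "bb" packs two signed 8-bit integers; struct raises an error
    # if a value falls outside -128..127.
    pair = struct.pack("bb", i_value, q_value)
    return pair * num_samples


buf = constant_iq_buffer(4, 127)
print(len(buf), list(buf))
```

Whatever the sample values are, the host still has to deliver 2 bytes per sample over USB, so a constant signal needs the same bus throughput as a file.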

Sasszem commented 1 week ago

Software versions

The firmware version previously reported by hackrf_info was 2022.09.1 (API:1.06). I updated both the library and the firmware to the latest version; hackrf_info now reports:

libhackrf version: git-17f39433 (0.9)
Firmware Version: git-17f39433 (API:1.08)

The problem persists even after the update.

I am running Pop!_OS 22.04 LTS with 6.9.3-76060903-generic kernel.

Testing script

I made a short script for testing:

#!/bin/bash
printf "Resetting device...\n"
hackrf_spiflash -R
sleep 1
hackrf_transfer -f 10000000 -a 0 -c 0 -x 0 -n 10000000
hackrf_debug -S

Typical output with this script:

M0 state:
Requested mode: 0 (IDLE) [complete]
Active mode: 0 (IDLE)
M0 count: 19759104 bytes
M4 count: 19759104 bytes
Number of shortfalls: 47
Longest shortfall: 15168 bytes
Shortfall limit: 0 bytes
Mode change threshold: 0 bytes
Next mode: 0 (IDLE)
Error: 0 (NONE)

15168 bytes = 7584 samples (2 bytes per sample). At the default sample rate of 10 Msps that is 0.7584 ms, which is in the ballpark of what I observed on the scope.
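The conversion from shortfall length to time can be written out as a small Python sketch. The function and constant names are made up for illustration; the 2 bytes per sample and the 10 Msps rate come from the numbers above.

```python
BYTES_PER_SAMPLE = 2  # one signed 8-bit I value plus one signed 8-bit Q value


def shortfall_duration_ms(shortfall_bytes: int, sample_rate_hz: float) -> float:
    """Convert a shortfall length in bytes into milliseconds of missing signal."""
    samples = shortfall_bytes / BYTES_PER_SAMPLE
    return samples / sample_rate_hz * 1e3


# Longest shortfall from the hackrf_debug -S output above, at 10 Msps.
print(f"{shortfall_duration_ms(15168, 10e6):.4f} ms")
```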

Root cause debug

There are quite a few differences between hackrf_transfer and my own code. Mine was built using g++ with

CPPFLAGS = -O2 -lhackrf -lm -g -Wall -std=c++20

I checked if -O2 made any difference, but even with -O0 my code did not produce any underruns.

The main difference between the two programs is the sample rate: 2 Msps in mine, versus the 10 Msps default in hackrf_transfer.

Cross-checking showed this to be the cause (not surprising): hackrf_transfer works well at 2 Msps, and my code fails in the same way at 10 Msps.

Fixing - no idea how

I did not find any tips on how to help the host keep up. Both codebases are just as optimized as they can be, so I think some settings in the OS or in libusb need to be changed, but I have no idea on that.

I recommend adding a warning to the output of hackrf_transfer when buffer underruns occur.

martinling commented 1 week ago

The problem is not with the speed of your code but with how steadily the host's USB stack pushes data to the HackRF. There is limited buffer space on the device, and it runs out quickly if the flow of data from the host is interrupted for a while.
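To get a feel for how quickly the device buffer runs out, the following Python sketch estimates how long a stall in the USB data flow can last before the transmitter starves. The buffer size used here (32 KiB) is an assumption for illustration and is not stated in this thread; the 2 bytes per sample figure follows from the shortfall arithmetic earlier in the issue. All names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # signed 8-bit I + signed 8-bit Q
ASSUMED_DEVICE_BUFFER_BYTES = 32 * 1024  # assumption, not taken from this thread


def required_throughput_bytes_per_s(sample_rate_hz: float) -> float:
    """Sustained USB throughput needed to keep the transmitter fed."""
    return sample_rate_hz * BYTES_PER_SAMPLE


def max_stall_ms(buffer_bytes: int, sample_rate_hz: float) -> float:
    """Time for a full buffer to drain if the host stops sending data."""
    return buffer_bytes / required_throughput_bytes_per_s(sample_rate_hz) * 1e3


for rate in (2e6, 10e6):
    mb_per_s = required_throughput_bytes_per_s(rate) / 1e6
    stall = max_stall_ms(ASSUMED_DEVICE_BUFFER_BYTES, rate)
    print(f"{rate / 1e6:.0f} Msps: {mb_per_s:.0f} MB/s, buffer lasts about {stall:.2f} ms")
```

Under that assumption, the tolerable gap at 10 Msps is five times shorter than at 2 Msps, which is consistent with shortfalls appearing only at the higher rate.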

The main thing you can optimize here is to have the HackRF on its own USB bus. If there are other devices on the bus then the data flow to the HackRF will inevitably be interrupted whilst the host is servicing them.

There isn't really anything to tweak in libusb here. We're already queuing up multiple asynchronous transfers to get the best throughput. And I'm not aware of anything to tweak on the OS side, but we have had someone reporting shortfall problems on Linux even with a dedicated bus, which never used to happen, so it might be that something has changed on the kernel side that's relevant here.

> I recommend adding a warning to the output of hackrf_transfer when buffer underruns occur.

Yeah, this is a good idea. IIRC, it wasn't done when I first added the shortfall stats because older firmware wouldn't support the request. But we should probably go ahead and do it now.
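Until such a warning exists in hackrf_transfer itself, a check can be done from the outside. The Python sketch below is not the proposed change to hackrf_transfer; it is a hypothetical wrapper that parses the text printed by hackrf_debug -S, using the field labels from the output posted earlier in this thread. It assumes the statistics are printed to stdout or stderr and that the firmware supports the request.

```python
import re
import subprocess
from typing import Optional


def shortfall_warning(stats_text: str) -> Optional[str]:
    """Return a warning string if the stats text reports any shortfalls, else None."""
    count = re.search(r"Number of shortfalls:\s*(\d+)", stats_text)
    if count is None or int(count.group(1)) == 0:
        return None
    message = f"Warning: {count.group(1)} shortfalls reported"
    longest = re.search(r"Longest shortfall:\s*(\d+)\s*bytes", stats_text)
    if longest is not None:
        message += f", longest {longest.group(1)} bytes"
    return message


def check_device() -> Optional[str]:
    """Run hackrf_debug -S (requires a connected HackRF) and check its output."""
    result = subprocess.run(
        ["hackrf_debug", "-S"], capture_output=True, text=True, check=True
    )
    # Assumption: the statistics appear on stdout or stderr.
    return shortfall_warning(result.stdout + result.stderr)


# Demonstration on the output posted earlier in this thread (no hardware needed).
sample = "Number of shortfalls: 47\nLongest shortfall: 15168 bytes\n"
print(shortfall_warning(sample))
```

Calling check_device() right after hackrf_transfer finishes, as in the test script above, would surface the shortfalls without having to read the raw statistics by hand.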

I also have #1484 open, which warns in hackrf_info if other devices are sharing the bus.