Closed VioletGiraffe closed 9 months ago
I think this is probably due to the host not maintaining sufficient throughput of the USB data stream. The figures of 256KB and 16KB are significant, as these are the USB transfer sizes used on the host and device respectively.
To confirm this theory: after running your software and receiving for a while, if you run hackrf_debug -S
, is the shortfall count non-zero? If so, these glitches are due to samples being lost because the device-side buffer filled up.
40MB/s througput is difficult for some USB 2.0 hosts to maintain consistently enough to avoid filling the relatively small buffer size on the HackRF.
There is a particular issue on Windows, where the WinUSB driver does not support queuing up multiple transfers unless a specific RAW_IO
feature is enabled. Up until now libusb has not had any API to enable this, so libhackrf hasn't been able to take advantage of it.
I've actually been working on fixing this in libusb, and have had an initial PR merged, but the API needs some further work, which is being addressed in this PR. Once some version of that is merged, we should be able to update libhackrf to use that feature and avoid throughput issues like this on Windows.
In the meantime, the only options I can suggest are:
RAW_IO
feature, if that doesn't pose other problems for your software.Small edit - I have incorrectly stated that the test image width was 1280, while it really is 1024 samples in width.
And our software does not use libhackrf, we use libusb 1.0 directly, but SDR++ does use libhackrf. Our transfer size is also 256k, at lower size there are more prominent artifacts. I just copied these values from your code.
Thanks for the suggestions, we'll try the debug
tool. And thank you for the insight into libusb. And for your work in general!
If you're using libusb directly, you could probably fix this in your own software relatively easily using the current git master version of libusb, which has my initial RAW_IO
support from https://github.com/libusb/libusb/pull/1208, that was merged as of https://github.com/libusb/libusb/commit/9e4bb9cbe59050d29f7662363c0a7bfc5cf35550.
After opening the HackRF device in your code, add a call to libusb_set_option
to enable RAW_IO
for the HackRF bulk IN endpoint.
Just bear in mind that we're expecting this API to change again before the libusb 1.0.27 release, so it's not a permanent fix.
Off-topic, but maybe you could help me find compiled hackrf tools for Windows? The common suggestion (including the official docs) is PothosSDR, but the latest build is from 2021 and its hackrf_debug
doesn't support -S
.
There's a build here - just note it's not an official release from us: https://github.com/dmaltsiniotis/hackrf/releases/tag/v2023.01.1
Thanks! Indeed, just after a couple seconds of receiving 20 shortfalls were registered, the longest being 75k. Will try the new libusb feature as soon as I figure out how to build it from source. Looks like you have diagnosed the issue precisely, I think it's safe to close it. I appreciate your help and speedy replies!
And thank you for the very clear bug report! The image plot showing the glitch pattern is really neat.
Sorry, I'm back. We have tried the new libusb + LIBUSB_OPTION_WINUSB_RAW_IO
, now we're getting Number of shortfalls: 0
, but the glitches are still there. Do you have any more ideas? On the surface, it looks a lot similar to #688.
1024 samples (2048 bytes) per line.
So they are. Well that's interesting.
I'm also looking at the top plot in your initial post, and it does look very much like that first downward-sloping part of the blue trace repeats exactly, which is very reminiscent of #688. I have no current theory for how that could happen with the current code, though.
Tagging @mossmann and @miek - any theories?
Note that in the process of fixing #688, @miek wrote a debug image for the CPLD which makes it output a counter value instead of RX samples. This was used to verify his new M0-based implementation fixed those glitches, and I also used this to verify that my rewrite in #982 was also glitch-free.
It might be worth you trying that too. I've attached a build of the 2023.01.1 firmware with SGPIO_DEBUG
enabled. If you try running your software with this firmware, you should get a clean sawtooth pattern instead of normal RX.
Each next value should be 1 greater than the last?
Yes, exactly. But it doesn't care about I and Q, so if you're looking at it component-wise you'll see (I=1, Q=2), (I=3, Q=4), etc.
I see, thank you! Will try it shortly.
1024 IQ samples per line, I and Q are consecutive pixels, so 2048 bytes per line. This is data directly from the device as is (except for the little red overlay at the top, don't mind that).
Another run:
Thanks! I've still no idea why this is happening, but these images probably contain some clues.
@VioletGiraffe can you show is the output of hackrf_info
with all three of your units connected?
Not with 3 because they're in different countries :) Will 2 suffice?
hackrf_info version: 2023.01.1
libhackrf version: 2023.01.1 (0.8)
Found HackRF
Index: 0
Serial number: 0000000000000000f75461dc28898dc3
Board ID Number: 4 (HackRF One)
Firmware Version: git-b42a1855 (API:1.07)
Part ID Number: 0xa000cb3c 0x007d476e
Hardware Revision: r9
Hardware does not appear to have been manufactured by Great Scott Gadgets.
Hardware supported by installed firmware:
HackRF One
Found HackRF
Index: 1
Serial number: 0000000000000000644064dc267930cd
Board ID Number: 2 (HackRF One)
Firmware Version: 2023.01.1 (API:1.07)
Part ID Number: 0xa000cb3c 0x00474764
Hardware Revision: older than r6
Hardware supported by installed firmware:
HackRF One
Just flashed the second one (index 1 in the info above), same thing:
So it's happening on both r9 and OG hardware. :thinking:
Have you reproduced this with hackrf_transfer
?
Yep, same. I have done it with the 3rd hackrf:
hackrf_info version: 2023.01.1
libhackrf version: 2023.01.1 (0.8)
Found HackRF
Index: 0
Serial number: 0000000000000000118461dc21624503
Board ID Number: 4 (HackRF One)
Firmware Version: git-b42a1855 (API:1.07)
Part ID Number: 0xa000cb3c 0x0054435b
Hardware Revision: r9
Hardware does not appear to have been manufactured by Great Scott Gadgets.
Hardware supported by installed firmware:
HackRF One
Parameters used: .\hackrf_transfer.exe -r 1.dat -f 1080000000 -a 0 -s 20000000
There was one shortfall logged during this run: Number of shortfalls: 1 Longest shortfall: 24544 bytes
That's only part of the data, here is what the complete run looks like (same image, but resized to fit into the window). I assume the shift is due to that one shortfall, and note that the dots have not shifted. Each line is 2048 bytes.
Have you reproduced the problem on any OS other than Windows?
No, we don't have any Linux machines at the moment, and not really keen on installing Linux onto any of our computers either. I wonder if a live CD image will work. Have you tried checking it yourself? So far we have 3 out of 3 units demonstrating the issue. The debug firmware makes the issue really obvious. Although our units "do not appear to have been manufactured by Great Scott Gadgets" - I hope that's not why they're glitching.
We've just been able to reproduce it on Linux.
Hi guys, no ideas / workarounds yet?
Hi @VioletGiraffe,
I have some ideas, but no conclusions or solutions as yet.
My best guess as to how this happens is that for some reason, the rx_loop
in sgpio_m0.s sometimes takes more cycles than we expect it to.
At 20Msps, we have a hard deadline of 163 cycles between SGPIO interrupts. If we miss that deadline, we get out of sync with reading data from the SGPIO peripheral. New samples can be overwritten by the hardware before, or whilst, the data is being read. If the timing were disrupted enough, it would be possible for the loop to read one 32-byte chunk late, immediately see the interrupt flag set because the data already changed, and proceed to read the same 32-byte chunk again, thus repeating 16 IQ samples in the buffer twice. It would also be possible for the multiple read operations used to read out the data registers to straddle the data changing, such that some bytes are from one 32-byte chunk and some are from the next. It seems plausible that this could result in the kinds of glitch patterns seen.
What I don't know is how this happens, or if it is indeed what's happening, how to stop it. The current rx_loop
has a calculated worst-case timing of 152 cycles, which should give us several cycles of timing margin. All the calculations for this figure are shown in the source. The M0 core has very simple timing properties with fixed cycle counts for each instruction, and specific additional latencies if a branch is taken. There are further latencies due to the asynchronous bridge that connects the SGPIO peripheral to the AHB bus matrix. We account for all these in the most pessimistic way possible.
We don't have a way to verify the actual timings, though. After this issue was reported, I looked into using OrbTrace to get tracing data via the SWO pin, but on the LPC43xx this is only available on the M4 core, not the M0, and even then only up to 120MHz clock rate. We could try to add instrumentation within the code, e.g. by having it read a hardware counter to measure its own execution time, or toggle a GPIO pin and watch it on an oscilloscope, but to do so would require adding more cycles to the loop so we would no longer be measuring the same thing.
My suspicion at this point is that there could be a delay caused by AHB bus contention.
The 32KB SRAM area used for the sample buffer is specifically selected to straddle a boundary between RAM blocks with separate AHB connections, so that the M0 core and the USB peripheral can each independently access one 16KB half of the buffer, without bus contention. The M4 waits until the M0 has filled one 16KB half-buffer, and tells the USB peripheral to send it out to the host. That transfer should complete before the M0 needs to write the other half. Thus the two should never be accessing the same memory block.
However, it may be that some bus usage by another peripheral, or even instruction fetches by the cores themselves, causes just enough contention to delay the loop and cause a glitch.
One thing that would be useful to do is to bisect firmware versions to see when this started happening. I am pretty sure that it did not happen when #982 was first merged, because I did a lot of testing with SGPIO_DEBUG
at that point. The confusing thing has been that the M0 code has not changed since that point. However, some other seemingly innocuous change to the M4 code could perhaps have caused a delay via bus contention.
Appreciate the detailed reply. I also thought of instrumenting the code via a GPIO that could be monitored with a logic analyzer, but you rightly point out that it would make the loop even longer. Two small ideas:
SGPIO_DEBUG
that something is delaying the loop, and the glitch patterns show us when. The question is what causes it.mov shortfall_length, hi_zero
in update_counts
, and that only saves one cycle.Ah! I have just realised that there may be a very obvious culprit.
In #1203, I reduced the USB transfer size from 16KB to 8KB. This improved USB throughput to the host a great deal. But that change allows the USB peripheral to start reading the first 8KB of a 16KB half-buffer before the M0 has finished writing the next 8KB of it. So the two are both accessing the same 16KB RAM block, using the same AHB connection.
I bet that is the cause of the glitch. Let me put together a modified firmware image to test.
Attached are two firmware images. These are built from the 2023.01.1 release but with #1203 reverted. One is built with SGPIO_DEBUG
, the other for normal RX.
Indeed, this seems to fix it for me:
2023.01.1 with SGPIO_DEBUG
:
2023.01.1 with SGPIO_DEBUG
and #1203 reverted:
Wow, great job! Thank you very much for looking into this and finding the problem. Will the fix reduce USB throughput on Windows even with the RAW_IO feature enabled?
And do I understand correctly that there is not enough free RAM available to keep the 8 KB buffers in separate RAM blocks (i. e. waste some RAM capacity in exchange for contention avoidance)?
Will the fix reduce USB throughput on Windows even with the RAW_IO feature enabled?
If we just revert that change, yes - at least on some systems. The throughput tests I did on #1203 would have been with RAW_IO enabled, both before and after the transfer size change.
I think it may be possible to fix the glitch without losing the throughput though, see https://github.com/greatscottgadgets/hackrf/pull/1203#issuecomment-1848791434.
And do I understand correctly that there is not enough free RAM available to keep the 8 KB buffers in separate RAM blocks (i. e. waste some RAM capacity in exchange for contention avoidance)?
There's no shortage of RAM - we have lots RAM that we're not using. What we have is a shortage of independent RAM blocks with their own AHB connections.
This is what the bus matrix looks like. There are four independent RAM blocks in the LPC4320, two "local SRAM" and two "AHB SRAM" - although these names are somewhat misleading because as you can see, they're all connected via the AHB. Data can move around this diagram without any contention as long as each movement has one row and one column completely to itself at that time.
Any RAM block used for part of the sample buffer needs to be completely dedicated to that purpose - it can't have other data or code in it that the cores might be accessing at the same time, or that would cause contention. So we can't just take 8KB from each of the four SRAM blocks, or we'd have no other RAM to use for anything else.
What we've done up to now is to leave the local SRAM for general use, and put the buffer in the two AHB SRAM blocks.
Those two blocks are contiguous in the memory map, so it's possible to use a simple ring buffer that sits across the boundary between them. That keeps the address calculations in the M0 code extremely simple - the update_buf_ptr
macro only takes three cycles.
The problem with this approach is that throughput is very sensitive to USB latency. The USB peripheral can't start reading data from one half of the buffer until the M0 has finished writing all of that half, and then it has to successfully get that data out to the host before the M0 needs to come back to the first half. The change in #1203 relieved some of that time pressure by allowing the USB peripheral to start reading data earlier, but it broke the rules needed to avoid bus contention in the process.
The idea I set out in https://github.com/greatscottgadgets/hackrf/pull/1203#issuecomment-1848791434 is to write data in 8KB "stripes", like in a RAID-0 array, with the stripes alternating back and forth between the two AHB SRAM blocks. But thinking more about it this may not help, because although it means the USB peripheral can start reading data earlier, it also means it has less time to get that data out to the host before the M0 needs to start using the same SRAM block again.
What we really need is more blocks. Perhaps we can use the 72KB local SRAM block, and have the buffer spread across three blocks in a round-robin scheme.
@martinling and I talked about this some more and came up with a plan we would like to try. There is enough space in ram_local1 (which stores the M4 text section) to add a 32 KiB or 48 KiB USB buffer separate from the M0/SGPIO buffer. Instead of having USB operations read and write directly from/to the SGPIO buffer, we can have DMA shuttle data between the SGPIO buffer and this new USB buffer. DMA transfers should complete very quickly, long before the M0 is done with the other half of the SGPIO buffer. USB reads and writes would contend only with M4 instruction reads, never with the M0.
Hi, thanks for the fix, but do I understand correctly that it's a simple reversion of an earlier change, which fixes the glitches but replaces them with signal dropout problems at high sample rates?
It's the revert of the earlier change as discussed. Performance should be as per the 2022.09.1 release.
Whether you get issues at high sample rates ultimately depends on the host USB stack. On my own Windows test machine I do see some shortfalls at 20Msps, but at least they are accurately reported again now.
The next step will be to implement the scheme outlined in https://github.com/greatscottgadgets/hackrf/issues/1363#issuecomment-1850452973, but it's going to take a bit of time to get that right.
What type of issue is this?
permanent - occurring repeatedly
What issue are you facing?
The issue looks like this in our software, this is just a visualization of the raw I/Q values, 4 pixels per IQ sample:
This was confirmed with two different HackRF units, same behavior.
To verify, I have connected a third unit, bought from a different seller, to a different computer, and used SDR++ to record the signal. This is different frequency, too - 226 MHz, I just scrolled around until I found a strong carrier signal with no apparent modulation. I recorded raw data and opened it as audio in Sony Sound Forge, and here it is:
And here it is again, around 8000 samples later. The period is slightly less than 8192, hard to calculate precisely in a sound editor, I'm just using 3rd-party software to demonstrate that our own software is not the culprit.
Initially, we were developing an app that receives analog video, the glitch looks like a white dot on the left. This is at 1080 MHz carrier frequency. Each line is 1024 samples (2048 bytes), the "ruler" at the very left is 32 zeros at each 256 KB and 16 zeroes at each 16 KB block, we added them right after libusb bulk transfer returns (async transfers). Each white dot slightly off the left side is the glitch. As you can see, it is not absolutely periodic, but the pattern is there. The vertical line is a perfect straight line, so it always occurs at a perfect multiple of some magic sample count (and a multiple of 2048).
Any ideas? All our 3 units are running the current 2023 firmware.
What are the steps to reproduce this?
In order to reproduce, I think it is enough to tune to a strong clean (unmodulated) carrier signal so that you can see a more or less clean sine wave with decent amplitude. Plot the sample values against time, then the "dislocations" will be obvious. Oh, and this is all at 20 MHz, we have not tested lower sampling rates as we need the 20 MHz bandwidth.
Can you provide any logs? (output, errors, etc.)
No response