activateStream and deactivateStream slow (up to 10ms)

gasparka commented 6 years ago

I would expect both operations to take less than 1 ms, in reality activateStream takes ~10ms and deactivateStream ~3ms.

Here is simple reproducer:

import time
import SoapySDR
import numpy as np
from SoapySDR import *  # SOAPY_SDR_ constants

class IQReader:
    def __init__(self):
        self.fs = 10e6
        self.bandwidth = 10e6
        args = dict(driver='lime')
        self.sdr_device = SoapySDR.Device(args)

        self.sdr_device.setAntenna(SOAPY_SDR_RX, 0, 'LNAH')
        self.sdr_device.setFrequency(SOAPY_SDR_RX, 0, 2400e6)
        self.sdr_device.setSampleRate(SOAPY_SDR_RX, 0, self.fs)
        self.sdr_device.setBandwidth(SOAPY_SDR_RX, 0, self.bandwidth)
        self.rx_stream = self.sdr_device.setupStream(SOAPY_SDR_RX, SOAPY_SDR_CF32)

    def sample(self):
        chunk_size = self.sdr_device.getStreamMTU(self.rx_stream)
        n_samples = int(10e-3 / (1 / self.fs))  # take 10ms of samples
        chunks_rx = int(n_samples / chunk_size)
        rx_buff = np.empty(shape=(chunks_rx, chunk_size), dtype=np.complex64)

        ss = time.time()
        self.sdr_device.activateStream(self.rx_stream)
        ee = time.time()
        print(f'activateStream  : {(ee-ss) / 1e-3} ms')

        ss = time.time()
        for i in range(chunks_rx):
            while True:
                sr = self.sdr_device.readStream(self.rx_stream, [rx_buff[i]], chunk_size)
                if sr.ret == chunk_size:
                    break
                elif sr.ret == -2:
                    raise Exception('Soapy Error -2, this can be recoverable')
        ee = time.time()
        print(f'readStream      : {(ee-ss) / 1e-3} ms')

        ss = time.time()
        self.sdr_device.deactivateStream(self.rx_stream)
        ee = time.time()
        print(f'deactivateStream: {(ee-ss) / 1e-3} ms')

reader = IQReader()
for i in range(16):
    reader.sample()

Sample output, getting 10ms of samples takes ~20 ms:

activateStream  : 8.391618728637695 ms
readStream      : 9.561300277709961 ms
deactivateStream: 3.013134002685547 ms
activateStream  : 8.059263229370117 ms
readStream      : 9.543418884277344 ms
deactivateStream: 3.016233444213867 ms
activateStream  : 8.991479873657227 ms
readStream      : 9.444713592529297 ms
deactivateStream: 2.8502941131591797 ms

HW: LimeSDR-Mini SW: pulled today

gasparka commented 6 years ago

I tested the same script with BladeRF:

activateStream  : 0.007867813110351562 ms
readStream      : 4.483461380004883 ms
deactivateStream: 0.00476837158203125 ms

activate/deactivate is basically free. However BladeRF, unexpectedly, gives old samples for first ~5ms. I would expect activateStream to clear out the buffers.

Tagging @guruofquality as this is somewhat related to SoapySDR... We should find a way to unit-test SoapySDR interface against such issues, any plans on this?

gasparka commented 6 years ago

Occasionally the deactivateStream gets stuck and takes ~1000ms. Ran the script for 512 iterations and got 5 bad ones:

deactivateStream: 1012 ms
...
deactivateStream: 892 ms
...
deactivateStream: 778 ms
...
deactivateStream: 787 ms
...
deactivateStream: 450 ms

rjonaitis commented 6 years ago

First call to activatestream is creating threads for data transferring and configures fpga for selected data formats. So it is going to take some time to create threads. Last call to deactivatestream is stopping threads and data transfer. The 1000ms looks like waiting for usb transfer to timeout.

gasparka commented 6 years ago

Hi @rjonaitis, thanks for looking into it. Are you planning for a fix? I think this is a pretty common use case if you need to sample occasionally. Keeping the stream up is not really working because then you get old samples.

One potential workaround:

resetStream() # clear buffers, sadly Soapy has no such function :(
readStream() # read new samples, without closing stream

guruofquality commented 6 years ago

I remember there was one useful change I wanted to make (and maybe more)

I often use activateStream(end burst) and readStream() to get a burst of data. On LimeSuite I have to deactivateStream() afterwards, before repeating this sequence or readStream() will have the old data in the fifo.

The reality is the hardware is going to just continually stream samples from usb. Theres no hardware request queue for bursts. So I was thinking a bit that maybe the next call to activateStream() should just clear the fifo, so the next readStream() has only good stuff in it.

And maybe deactivate should be faster, its either just there to clear the old retested commands or switch something on. The intention was to keep those calls lightweight. Things just got refactored, and for the better over time usually. So the threads stuff which is costly can stay in setup/close stream. Potentially..

bladerf is actually very similar in this respect, at least in terms of the hardware implementation. And although its deactivatestream is a bit more like the stuff described above, it may actually have the issue of old readStreams() for the same reason described above. (and in a coincidental fashion I may have been seeing exactly that last week or so at my work). So anyway, activateStream() flusing for the blade may be a good idea too.

resetStream() # clear buffers, sadly Soapy has no such function :(

I mean maybe yes, it seems like it might be useful. But in this this case you don't really want to clear the stream, you want the other calls to function by just queuing up the data that you are interested in

thoughts?

gasparka commented 6 years ago

So I was thinking a bit that maybe the next call to activateStream() should just clear the fifo, so the next readStream() has only good stuff in it.

That is exactly what i want and expect from this function. Ideally i would receive samples that just 'enter the sdr', this would require clearing the FPGA, USB, Soft buffers. In reality i could live with some dirty samples at the beginning, but this depends on the sample rate. Maybe we can start with clearing the software buffers? Also, i would expect the call to be fast i.e. i want samples 'now' not 'now + 10ms', again in reality there will be a small delay.

And maybe deactivate should be faster, its either just there to clear the old retested commands or switch something on. The intention was to keep those calls lightweight.

+1. Biggest problem for me is the occasional ~1000ms delay.

resetStream()

I dont see myself using this if we manage to fix the activate/deactivate behaviour.

IgnasJarusevicius commented 6 years ago

In LimeSuite stopping and starting a stream has never been intended to be fast. I that is a problem, the application should just keep the stream running and discard unnecessary samples.

Also, I think call to activateStream() should already

So I was thinking a bit that maybe the next call to activateStream() should just clear the fifo

I think call to activateStream() already clears software FIFO

gasparka commented 6 years ago

I that is a problem, the application should just keep the stream running and discard unnecessary samples.

This is not trivial, amount of samples to skip depends on buffer sizes, which may not be static over time. Also i think it is intuitive that these calls are fast, because they are used to get timed samples.

I think call to activateStream() already clears software FIFO

Yes, it clears all the buffers, including the FPGA. That is perfect!

Would it be possible to move most of the slow stuff to start/close stream?

IgnasJarusevicius commented 6 years ago

This is not trivial, amount of samples to skip depends on buffer sizes, which may not be static over time. Also i think it is intuitive that these calls are fast, because they are used to get timed samples.

For precise timing, an application should check timestamps.

Yes, it clears all the buffers, including the FPGA. That is perfect!

It clears all buffers if it is called when stream is stopped. But I think it should clear software FIFO if called without stopping stream.

Would it be possible to move most of the slow stuff to start/close stream?

We are not going to change the behaviour of LimeSuite itself. If deactivateStream() is meant to be fast in Soapy, then SoapyLMS could probably be changed to not actually stop (cancel transfers, stop threads, etc.) stream on deactivateStream()

gasparka commented 6 years ago

Ok keeping the stream running can work, tho not very convenient. But now i found that setFrequency and setGain are also very slow:

setFrequency : 46.1 ms
setGain : 8.5 ms (sum of setting 3 gains)
activateStream : 0.0 ms
readStream : 4.9 ms
deactivateStream: 0.0 ms
total: 60 ms to get 10ms of samples :(

Comparing to BladeRF:

setFrequency : 1.5 ms
setGain: 0.5 ms (sum of setting 3 gains)
activateStream: 0.0 ms
readStream: 9.7 ms
deactivateStream: 0.0 ms
total: 12 ms

I am trying to port an application from BladeRF into LimeSDR, which is basically jumping around and taking small slices of spectrum, i really need these calls to be reasonably fast...

gasparka commented 6 years ago

Got some improvements. First i enabled the calibration cache:

args = dict(driver='lime', cacheCalibrations='1')
self.sdr_device = SoapySDR.Device(args)

This helps by turning SPI read-modify-write commands into writes. BTW is this a bug? Should it only apply to calibration related registers?. Anyhow, after this:

setFrequency : 25 ms
setGain: 0.0 ms

Next i wrote a custom TuneVCO function (ill PR later), that caches the tune values:

setFrequency : 5 ms

So, now its kind of usable for me, could be better still...

For reference, here is how bladeRF gets 0.2 ms 'setFrequency' even from Host: http://www.nuand.com/libbladeRF-doc/v1.9.0/tuning.html

9600 commented 6 years ago

@gasparka there was talk of removing cacheCalibrations. IIRC the host-based calibration has not been maintained and possibly gives bad results. @IgnasJarusevicius / @guruofquality could advise.

gasparka commented 6 years ago

Indeed, calibration seems to be quite off. Otherwise Lime seems functional, but i have not tested this too deeply. Maybe there is a common ground to get reasonable calibration + performance.

Some registers, like gains, don't need the Read-Modify-Write behavior.

rjonaitis commented 6 years ago

CalibrationsCache has been removed some time ago 94d1b935455, so it appears that cacheCalibrations no longer has to do anything with calibrations, all it does now is just enables local copy of registers to reduce registers readback from the chip.

IIRC the host-based calibration has not been maintained and possibly gives bad results Indeed, calibration seems to be quite off

Host based DC/IQ and filter calibrations also has been removed 27982818b88, and now done using the internal MCU. Note that when changing frequency using Soapy it does not trigger DC/IQ recalibration. Recalibration would add ~100ms

setFrequency : 25 ms Next i wrote a custom TuneVCO function (ill PR later), that caches the tune values:

Changing frequency requires retuning VCO, so 25ms seems normal from PC host. If your goal is to use the same set of frequencies then tuning values can be reused. Don't know about your cache lifecycle, but note that values can differ between boards and if some specific internal controls are modified they could invalidate cache.

gasparka commented 6 years ago

@rjonaitis Are there potential problems when using cacheCalibrations i.e. the copy of local registers?

Don't know about your cache lifecycle, but note that values can differ between boards and if some specific internal controls are modified they could invalidate cache.

My code loads value from the cache and tests for lock, full tune is performed on failure.

rjonaitis commented 6 years ago

@gasparka

Are there potential problems when using cacheCalibrations i.e. the copy of local registers?

No, there should be no problems. You just need to make sure to synchronize the registers copy and the chip at the beginning. Either by writting all registers to chip, or reading all registers from chip.

gasparka commented 6 years ago

Using the PRs and local registers:

setFrequency : 1.4 ms
setGain : 0.0 ms
activateStream : 0.0 ms
readStream : 9.9 ms
total: 11 ms

Pretty good, but to avoid old samples i have to throw away first ~6ms (at 40M sample rate). BladeRF shares this issue.

API is still unusable, for my purpose, if not using the local registers:

setFrequency : 10.1 ms
setGain : 7.9 ms
activateStream : 0.0 ms
readStream : 9.9 ms
total: 28 ms

As @rjonaitis mentioned, there should be no downsides of using local registers. Maybe we could make this the default so stuff would work fast out of the box?

gasparka commented 6 years ago

Closing stale thread.

myriadrf / LimeSuite

activateStream and deactivateStream slow (up to 10ms) #200