Closed nuena closed 2 years ago
Just wanted to say thanks for your notes and potential fixes. Our DSP dev will be looking into this soon. We have also seen the same problems you describe when under high load. This problem was mostly handled thus far by setting the daq chain to run as a high priority task in Linux, but that is also certainly not a great solution.
Thank you for the report and the suggestion. I have included your checks and added one more warning message.
In the earlier version of the code, we have used mutexes to exclude the possibility of race condition, but it turned to that it unnecessarily slowed down the code, and it caused sometimes sample losses.
In case the circular buffer is corrupted, the delay synchronizer module will likely catch this error, as it continuously checks the sample synchrony of the channels.
We are trying to continuously verify the correct operation on Raspberry Pi, I would always recommend to run the unit tests for not verified architectures to reveal potential sample loss problems. As you highlighted the increase of the number of buffers may simply solve the problem.
Short description
If
rtl_daq.c
is not able to push data downstream fast enough, data reads and writes on the same memory sections can occur simultaneously, leading to undefined behaviour and incoherent data between the receiver channelsSteps to reproduce Unknown, as I observed this on my own fork of the software, running on a RPi 4 and with added sample export (which will have an impact on performance). However, this should always happen if the performance is not high enough for rtl_daq.c to process data in real time. While I cannot say if anybody else will encounter this issue, I still suggest implementing at least my workaround as to avoid silent undefined behaviour.
Explanation
rtl_daq.c
maintains the circular bufferrtl_rec
, which containsNUM_BUFF
(8) slots to which rtlsdrCallback can write, and from which the infinite while-loop inmain
reads. The index of the "slot" to which is written is determined by calculatingand the index of the slot from which is read is calculated by calculating
where rtl_rec->buff_ind and read_buff_ind are two counters, that count how many packets are read from the device and how many are sent downstream, respectively.
There are checks (ll. 600-605) which aim to ensure that data is not read from the same slot by enforcing
rtl_rec->buff_ind <= read_buff_ind
, however ifrtl_rec->buff_ind
is an integer multiple ofNUM_BUFF
larger thanread_buff_ind
, this check will not fail, even though the modulo function will refer to the same slot in the circular buffer. In this case,rtlsdrCallback
may write data whilemain
is reading.Proposed solution I believe there is no "clean" solution to this issue that does not cause data loss. Increasing
NUM_BUFF
(potentially dynamically to use most of the available resources?) will decrease the likelyhood of this occuring, however at some point data must be dropped if some part of the toolchain is not able to keep up.Instead, I propose to ensure that no simultaneous read and write occures and issue a warning in this case: Add a second check to the for-loop in lines 600-605:
This will wait until the buffer "slots" are unequal. It might also be preferable to "fast-forward"
read_buff_ind
(i.e. set it tortl_rec->buff_ind
?) to indicate a data loss downstream due to the jump. However, I have no clue if this might cause other issues.