Improved Detection using dft_detect

darksidelemm commented 5 years ago

Raising an issue to keep track of testing of this.

I've switched the testing branch to using dft_detect. It definitely uses more CPU, to the point where it appears to be running slower than realtime on a Pi 2. If you run auto_rx with the -v (verbose) option, there's a debug line which shows how long the detection command ran for. Ideally (in the case of no sonde present) it should only run for whatever the detect_dwell_time configuration option is set to.

On my Pi 2, the detection command takes ~12 seconds - less than 50% realtime! (On my Macbook, it runs in 5.1 seconds).

Due to the way the decode chain works (passing samples along via stdout), running slower than realtime means that samples are buffered up at the input to dft_detect. We kill rtl_fm after the configured dwell time, and it takes dft_detect another 7 seconds to chew through the remaining samples. The end result is that scan times increase a bit on slower platforms.
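To make the buffering effect concrete, here is a rough sketch of the arithmetic. It assumes the pipe can hold whatever the detector has not consumed yet, and it takes a 5 second dwell, which is what the 12 s total minus the 7 s drain implies rather than a value stated outright. The function names are made up for illustration and are not part of auto_rx.

```python
def detection_runtime(dwell_s, speed_factor):
    """Estimate the wall-clock run time of the detector.

    dwell_s:      seconds of samples produced before rtl_fm is killed
    speed_factor: detector throughput relative to realtime
                  (1.0 = keeps up, 0.5 = half realtime)
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    if speed_factor >= 1.0:
        # The detector keeps up, so it finishes when the input ends.
        return dwell_s
    # Every sample still has to be processed, only at a reduced rate;
    # the excess queues up at the detector's input in the meantime.
    return dwell_s / speed_factor


def drain_time(dwell_s, speed_factor):
    """Seconds spent working through buffered samples after rtl_fm exits."""
    return detection_runtime(dwell_s, speed_factor) - dwell_s


if __name__ == "__main__":
    # Assumed Pi 2 figures: 5 s dwell, detector at 5/12 of realtime.
    print(detection_runtime(5.0, 5.0 / 12.0), drain_time(5.0, 5.0 / 12.0))
```

With those assumed figures the model gives roughly 12 s of run time, of which roughly 7 s is spent draining the backlog, matching the Pi 2 observation.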

However - dft_detect has significant advantages, in that it has better detection performance (up to 4 dB in some cases), doesn't have issues with mis-detection of RS41s and M10s, and can detect more sonde types. Good job to @rs1729 on this, it's a really good bit of code!

I think we can live with the slower scan times, though it would be good to get some further information on detect speed on other platforms (Pi Zero W, Pi 3, etc...).

rs1729 commented 5 years ago

I expected that it would run slowly. As a first pass, I just put all the header detections from the dft-decoders together. For auto_rx, the sonde types that are not going to be decoded could be switched off or taken out. (Although C34/C50 alone is not so interesting right now, and iMet-1-AB is maybe not that important anymore either. But LMS6 (403 MHz) could easily be included in auto_rx, I guess.) I don't know if it is what you are asking for, but there is the -t option, e.g. -t 10, which makes it stop after 10 seconds. The sample rate shouldn't be less than 48k for good detection results; however, the higher the sample rate, the longer the correlation takes to process.

The complex FFT could also be rewritten with separate real and imaginary floats, because I think complex.h is rather slow, though I don't know why it should be so much slower. I also have another place in mind that could speed things up. I wrote an FFT spectrum view for ncurses (just to have a simple control view), and when reading samples to process the FFT in blocks, it is much better to read the incoming samples in fread blocks. These are smaller changes; I will try to test them soon. (On the other hand, in real time the FM samples do not arrive that fast if they come one by one. But csdr/sox output to stdout should have some buffering?)
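The block-read idea can be sketched in Python like this. It is only an illustration of reading a stream in fixed-size blocks instead of sample by sample, not the dft_detect code (which is C and uses fread). The signed 16-bit little-endian sample format and the name read_blocks are assumptions.

```python
import io
import struct


def read_blocks(stream, block_size):
    """Yield tuples of up to block_size signed 16-bit LE samples.

    One read() call per block replaces block_size single-sample reads.
    """
    bytes_per_sample = 2
    while True:
        chunk = stream.read(block_size * bytes_per_sample)
        n = len(chunk) // bytes_per_sample
        if n == 0:
            # End of stream (or only a stray odd byte left).
            return
        # Ignore a trailing odd byte, which can only occur at end of stream.
        yield struct.unpack("<%dh" % n, chunk[: n * bytes_per_sample])


if __name__ == "__main__":
    # Stand-in for a pipe such as sys.stdin.buffer.
    demo = io.BytesIO(struct.pack("<5h", 1, 2, 3, 4, 5))
    for block in read_blocks(demo, 2):
        print(block)
```

In a real pipeline the stream would be something like sys.stdin.buffer, and each yielded block would be handed to the FFT stage.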

rs1729 commented 5 years ago

A larger block size didn't change much. However, choosing a better FFT size for the correlation does speed up detection. If buffer size is not an issue, it can be increased even more. And if you don't need C34/C50, compile with -DNOC34C50.

rs1729 commented 5 years ago

dft_detect: In the last update, I ran into an interesting problem with floating-point precision (in my case on a 32-bit CPU). I have this comment line there, but a comment does not help when you are on a machine where this happens... Of course, it would be better to calculate LOG2N first, then N. It is a really interesting bug if one does not consider different floating-point implementations, and if the calculation is done in a way where this can happen.
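The "LOG2N first, then N" ordering can be illustrated with integer-only arithmetic, which sidesteps floating-point rounding altogether. This is a sketch of the general idea, not the actual dft_detect fix; the function name and the min_len parameter are hypothetical.

```python
def fft_size_for(min_len):
    """Return (log2n, n), where n is the smallest power of two >= min_len.

    log2n is derived first using integers only, and n is then built
    from it, so the two values can never disagree.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    log2n = (min_len - 1).bit_length()
    n = 1 << log2n
    return log2n, n


if __name__ == "__main__":
    for length in (1000, 1024, 1025):
        print(length, fft_size_for(length))
```

Going the other way (choosing N, then taking a floating-point logarithm and truncating it) is where a result just below the exact integer can be rounded down on some platforms.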

darksidelemm commented 5 years ago

Interesting bug, for sure!

I've now updated the testing branch to the latest version, and I'm seeing an almost 3x speedup, with no change in detection performance. Great job!

darksidelemm commented 5 years ago

An analysis of the performance improvements gained by switching to dft_detect is available here: https://github.com/projecthorus/radiosonde_auto_rx/blob/testing/auto_rx/test/notes/2019-03-01_detector_change.md

Given they are significant, I think we'll release this into the master branch fairly soon.

rs1729 commented 5 years ago

With lower threshold parameters, false detections increase, in particular M10 vs IMET1AB. IMET1AB has an even wider bandwidth, so when the same bandwidth is used for detection, this may be only a minor issue. Further, for M10 one could look at the noise (standard deviation) before the header. A 16-20 kHz filter is more suitable for M10 than for RS41/DFM, so perhaps the M10 threshold does not need to be lower. If you allow more bit errors for DFM, the polarity could be detected wrongly. RS92 can tolerate more bit errors without false detections.

For Manchester code, the frequency offset is actually not such a problem: you only need to compare the 2 symbols relative to each other. However, header detection is the problem (and filter width). With correlation/FFT, the DC offset is calculated in X[0]. But if the window is too big, it can extend from the header back into the noise/offset of M10. For Manchester codes a shorter running average could also be used. As far as I have tested, sometimes it helps, sometimes it makes detection worse... If the offset gets larger, a narrow filter becomes a problem. On the other hand, I think a narrow filter for FM demodulation before detection makes things much easier (as long as (external) FM demodulation is used). So maybe a bandwidth estimate from the frequency/peak scan is something to consider. Or some pre-identification/exclusion when processing the band FFT, before dft_detect.
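As a rough illustration of the threshold trade-off, the sketch below scores a window of samples against a header template with a mean-removed, normalized correlation and compares the magnitude to a threshold. This is a generic textbook formulation, not the dft_detect implementation, and the function names, template and threshold values are made up.

```python
import math


def normalized_correlation(window, template):
    """Mean-removed correlation coefficient in [-1, 1]."""
    if len(window) != len(template):
        raise ValueError("window and template must have equal length")
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    num = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, template))
    energy_w = sum((w - mean_w) ** 2 for w in window)
    energy_t = sum((t - mean_t) ** 2 for t in template)
    den = math.sqrt(energy_w * energy_t)
    # A flat window carries no header information.
    return num / den if den > 0 else 0.0


def header_detected(window, template, threshold):
    """Return (detected, score); the sign of score hints at polarity."""
    score = normalized_correlation(window, template)
    return abs(score) >= threshold, score


if __name__ == "__main__":
    template = [1, -1, 1, 1, -1, -1, 1, -1]      # hypothetical header
    window = [0.5 * t + 3.0 for t in template]   # scaled, with DC offset
    print(header_detected(window, template, 0.7))
```

Lowering the threshold accepts weaker matches, which is where a partial match against a different sonde type's header can start to pass. A score near -1 indicates the same header with inverted polarity.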
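The point about comparing the 2 symbols relative to each other can be sketched as follows. A constant offset on the demodulated signal (which is what a frequency offset becomes after FM demodulation) shifts both halves of a bit equally, so the comparison is unaffected. The polarity convention (high-then-low means 1) and the function name are assumptions for illustration.

```python
def manchester_decode(symbols):
    """Decode soft symbol values pairwise into bits.

    Each bit is carried by two symbols; only their relative order
    matters, so any constant offset cancels out.
    """
    if len(symbols) % 2:
        raise ValueError("need an even number of symbols")
    bits = []
    for i in range(0, len(symbols), 2):
        first, second = symbols[i], symbols[i + 1]
        # Assumed convention: high-then-low is 1, low-then-high is 0.
        bits.append(1 if first > second else 0)
    return bits


if __name__ == "__main__":
    clean = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
    shifted = [s + 5.0 for s in clean]  # large DC offset
    print(manchester_decode(clean), manchester_decode(shifted))
```

Both calls decode to the same bits, whereas a fixed zero threshold applied to the shifted symbols would read every symbol as high.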

Anyway, the more suitable correlation windows are also useful for the decoders.

rs1729 commented 5 years ago

baud = symbols per second (symbols == "raw bits")
RS92: 4800
RS41: 4800
DFM: 2500
M10: 9616
LMS6: 4800

RS92-SGP, DFM, M10 have 2 symbols per bit.

ecc-rate:
RS92-SGP/RS41: 231/255 Reed-Solomon (not considering shortened RS41-standard frames)
DFM: 1/2 Hamming
LMS6: 1/2 convolutional code, 223/255 Reed-Solomon

The convolutional code helps a lot more than the Manchester code, but only after demodulation, whereas the latter can assist demodulation. (It might be possible to predict convolutionally coded bits in the demodulator?) And symbols per second is the "speed", although RS41 and M10 frames are only a fraction of a second long.
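The figures above can be turned into bit rates and code rates with a few lines of arithmetic. In this sketch, the 1 symbol per bit entries for RS41 and LMS6 are an assumption (only the 2-symbol types are listed explicitly), and multiplying the two LMS6 code rates into one combined figure is an illustrative simplification.

```python
from fractions import Fraction

# Symbol rates as listed (symbols == "raw bits").
BAUD = {"RS92": 4800, "RS41": 4800, "DFM": 2500, "M10": 9616, "LMS6": 4800}

# 2 symbols per bit for the Manchester-coded types; 1 is assumed otherwise.
SYMBOLS_PER_BIT = {"RS92": 2, "RS41": 1, "DFM": 2, "M10": 2, "LMS6": 1}

# Error-correction code rates as listed (none is given for M10).
ECC_RATE = {
    "RS92": Fraction(231, 255),
    "RS41": Fraction(231, 255),
    "DFM": Fraction(1, 2),
    "LMS6": Fraction(1, 2) * Fraction(223, 255),  # convolutional, then RS
}


def line_bit_rate(sonde):
    """Bits per second after removing the line code (e.g. Manchester)."""
    return Fraction(BAUD[sonde], SYMBOLS_PER_BIT[sonde])


if __name__ == "__main__":
    for sonde in BAUD:
        print(sonde, line_bit_rate(sonde), ECC_RATE.get(sonde))
```

Under these assumptions M10 carries 4808 bit/s after Manchester decoding, and the combined LMS6 code rate comes to 223/510, a little under one half.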

darksidelemm commented 5 years ago

Resolved by https://github.com/projecthorus/radiosonde_auto_rx/pull/138

Improved Detection using dft_detect #122