kgoba / ft8_lib

FT8 library
MIT License

Is it possible to use a 4096-point FFT? #23

Closed howard0su closed 2 years ago

howard0su commented 2 years ago

On small chips, a 4096-point FFT is much faster than a non-power-of-two FFT. Could we switch to that? It would help port this library to more platforms.

kgoba commented 2 years ago

The provided decoder shell (decode_ft8.c) is only intended as a demonstration on desktop machines (or a Raspberry Pi). You are free to use the library with a different configuration. Also note that the kiss_fft library is only included for the demonstration. On STM32 I would suggest using the FFT routines that come with CMSIS DSP.

Take a look at the initialization of monitor_t, and you can work out the sampling rate that would give you a nice FFT size:

    float symbol_period = (cfg->protocol == PROTO_FT4) ? FT4_SYMBOL_PERIOD : FT8_SYMBOL_PERIOD;
    // Compute DSP parameters that depend on the sample rate
    me->block_size = (int)(cfg->sample_rate * symbol_period); // samples corresponding to one FSK symbol
    me->subblock_size = me->block_size / cfg->time_osr;
    me->nfft = me->block_size * cfg->freq_osr;

For FT8, a 6400 Hz waveform sampling rate and an FFT size of 2048 would give you perfect 3.125 Hz (6.25/2) bins and allow 2x frequency OSR and 2x time OSR. This is how I used the library on an STM32 Cortex board. You have to do some resampling to get to a 6400 Hz sampling rate from standard sampling rates, for example 8000 Hz * 4/5 = 6400 Hz, but that is also possible with CMSIS DSP routines (you have to be skilled with interpolation, decimation and FIR lowpass filters though).
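To check candidate sampling rates quickly, the three formulas from the monitor_t initialization can be wrapped in a small helper. This is only a sketch: `dsp_params_t`, `dsp_params` and `is_power_of_two` are hypothetical names that are not part of ft8_lib, and the block size is rounded instead of truncated so that float error in the symbol period cannot drop a sample.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical struct and helper names; only the three formulas
   come from the monitor_t initialization quoted above. */
typedef struct {
    int block_size;    /* samples in one FSK symbol */
    int subblock_size; /* samples between successive FFTs */
    int nfft;          /* FFT length */
    float bin_hz;      /* spacing between adjacent FFT bins */
} dsp_params_t;

dsp_params_t dsp_params(int sample_rate, float symbol_period,
                        int time_osr, int freq_osr)
{
    assert(sample_rate > 0 && symbol_period > 0.0f);
    assert(time_osr > 0 && freq_osr > 0);

    dsp_params_t p;
    /* Rounded rather than truncated (a deviation from the quoted code)
       so float error in the symbol period cannot drop a sample. */
    p.block_size = (int)((float)sample_rate * symbol_period + 0.5f);
    p.subblock_size = p.block_size / time_osr;
    p.nfft = p.block_size * freq_osr;
    p.bin_hz = (float)sample_rate / (float)p.nfft;
    return p;
}

/* True when n is a positive power of two. */
bool is_power_of_two(int n)
{
    return n > 0 && (n & (n - 1)) == 0;
}
```

With the FT8 symbol period of 0.160 s, `dsp_params(6400, 0.160f, 2, 2)` yields a block size of 1024 and an FFT size of 2048, while `dsp_params(12000, 0.160f, 2, 2)` yields an FFT size of 3840, which is not a power of two. Both cases keep the 3.125 Hz bin spacing.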

howard0su commented 2 years ago

You are right. It works well. https://github.com/howard0su/ft8_lib/commit/214ea4c629b822ee043623e7e2dfb4422b78655a

Thank you for your suggestions. I am experimenting to see whether I can convert the decode code to use q15 instead of float.

kgoba commented 2 years ago

I haven't tried it, but I think you could have good results with q15 as well, and it looks like you would pretty much only need to update the LDPC decoder.

howard0su commented 2 years ago

Do you want me to submit a PR with the changes I made? I am still working on understanding the whole logic inside the LDPC decoder.

howard0su commented 2 years ago

I got it right now.

howard0su commented 2 years ago

@kgoba I converted the magnitude calculation to q31 (q16 barely works because log10 gives too large an error). However, the recall rate drops by 20%, so this does not seem to be a good direction.

Check my repo for changes.
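One plausible contributor to the precision problem (the thread does not confirm the cause) is that squaring a small amplitude in a narrow fixed-point format underflows to zero before the logarithm is ever taken. The sketch below uses plain C with hypothetical helper names rather than the CMSIS DSP or ft8_lib API, and assumes the usual Q15/Q31 conventions (value = integer / 2^15 or 2^31).

```c
#include <stdint.h>

/* Convert a float in [-1, 1) to Q15 with rounding and saturation. */
int16_t float_to_q15(float x)
{
    if (x >= 1.0f) return INT16_MAX;
    if (x <= -1.0f) return INT16_MIN;
    float scaled = x * 32768.0f;
    int32_t v = (int32_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return (int16_t)v;
}

/* Q15 x Q15 -> Q15 multiply with saturation.
   Relies on arithmetic right shift of negative values,
   which common compilers provide. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = ((int32_t)a * (int32_t)b) >> 15;
    if (prod > INT16_MAX) prod = INT16_MAX;
    return (int16_t)prod;
}

/* Q31 x Q31 -> Q31 multiply with saturation. */
int32_t q31_mul(int32_t a, int32_t b)
{
    int64_t prod = ((int64_t)a * (int64_t)b) >> 31;
    if (prod > INT32_MAX) prod = INT32_MAX;
    return (int32_t)prod;
}
```

For an amplitude of about -50 dB relative to full scale (Q15 value 100), `q15_mul(100, 100)` returns 0, so the power is lost entirely. The same amplitude in Q31 (100 << 16) squares to 20000, which still leaves room for a meaningful log10.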

howard0su commented 2 years ago

An idea to further reduce memory usage:

  1. Keep a 7x768x2 (freq_osr) buffer for the sync signal search.
  2. Use the sync score to select about 200 candidates. Each candidate computes its log174 incrementally, so each candidate needs about 172 bytes.

So total memory drops to 7x768x2 + (172 + 16) * 200 = 48,352 bytes.
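The budget can be written out as compile-time constants so it is easy to re-evaluate when a parameter changes. This is a sketch with hypothetical names; it assumes one byte per sync-buffer element and treats the extra 16 bytes per candidate as unspecified bookkeeping, since the comment does not say what they hold.

```c
#include <stddef.h>

/* All names are hypothetical; the numbers are taken from the comment above. */
enum {
    SYNC_SYMBOLS = 7,        /* symbols kept for the sync search */
    FREQ_BINS = 768,         /* frequency bins per symbol */
    FREQ_OSR = 2,            /* frequency oversampling factor */
    NUM_CANDIDATES = 200,    /* candidates kept after sync scoring */
    LLR_BYTES = 172,         /* per-candidate log174 storage (approximate) */
    CANDIDATE_OVERHEAD = 16  /* unexplained extra bytes per candidate */
};

/* Sync search buffer, assuming one byte per element. */
size_t sync_buffer_bytes(void)
{
    return (size_t)SYNC_SYMBOLS * FREQ_BINS * FREQ_OSR;
}

/* Storage for all candidates. */
size_t candidate_bytes(void)
{
    return (size_t)(LLR_BYTES + CANDIDATE_OVERHEAD) * NUM_CANDIDATES;
}

size_t total_bytes(void)
{
    return sync_buffer_bytes() + candidate_bytes();
}
```

Evaluating the expression as written gives 10,752 bytes for the sync buffer and 37,600 bytes for the candidates, 48,352 bytes in total.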

kholia commented 2 years ago

I like the https://github.com/howard0su/ft8_lib/commit/214ea4c629b822ee043623e7e2dfb4422b78655a change.

Without this change:

    $ make; time ./decode_ft8 tests/191111_110630.wav
    Sample rate 12000 Hz, 180000 samples, 15.000 seconds
    Block size = 1920
    Subblock size = 960
    N_FFT = 3840
    Max magnitude: -16.9 dB
    000000  36 +1.44 1953 ~  JH1AJT RK6AH R+07
    000000  30 +2.00 2728 ~  CQ DX IK0YVV JN62
    000000  29 +1.52 1034 ~  CQ EA3UV JN01
    000000  29 +1.92  809 ~  UA9LL SQ8OHR -10
    000000  28 +1.52 1722 ~  CQ SM7HZK JO76
    000000  25 +1.68 1484 ~  SP8NFO PA3EPP +04
    000000  25 +1.76  519 ~  CQ PC2J JO22
    000000  24 +0.16  972 ~  JA2GQT SP7XIF JO91
    000000  24 +2.08 1406 ~  RK6AUV SV1GN -18
    000000  23 +1.60 1669 ~  CQ PB5DX JO22
    000000  21 +1.52 2031 ~  JL1TZQ R3BV R-18
    Decoded 11 messages
    ./decode_ft8 tests/191111_110630.wav  0.08s user 0.01s system 99% cpu 0.096 total

After this change:

    $ make; time ./decode_ft8 tests/191111_110630.wav
    Sample rate 12000 Hz, 180000 samples, 15.000 seconds
    Block size = 1920
    Subblock size = 960
    N_FFT = 4096
    Max magnitude: -16.7 dB
    000000  31 +1.92  866 ~  UA9LL SQ8OHR -10
    000000  30 +1.52 1106 ~  CQ EA3UV JN01
    000000  29 +2.00 2912 ~  CQ DX IK0YVV JN62
    000000  29 +1.44 2084 ~  JH1AJT RK6AH R+07
    000000  28 +2.08 1500 ~  RK6AUV SV1GN -18
    000000  25 +1.60 1784 ~  CQ PB5DX JO22
    000000  22 +1.52 1838 ~  CQ SM7HZK JO76
    000000  22 +1.68 1588 ~  SP8NFO PA3EPP +04
    000000  21 +0.16 1041 ~  JA2GQT SP7XIF JO91
    000000  21 +1.60 2166 ~  JL1TZQ R3BV R-18
    000000  20 +1.76  553 ~  CQ PC2J JO22
    000000  14 +1.04 1191 ~  CQ JR5MJS PM74
    Decoded 12 messages
    ./decode_ft8 tests/191111_110630.wav  0.09s user 0.00s system 99% cpu 0.096 total

We get one extra decode which isn't bad for a small change.

CC @kgoba for visibility.

kholia commented 2 years ago

> We get one extra decode which isn't bad for a small change.

In other tests (websdr_test9.wav), the results are actually negative (fewer decodes)!

kgoba commented 2 years ago

I believe I failed to communicate properly why the FFT size is important there. At a 12 kHz sampling rate, FFT size 3840 gives you bins that are spaced at 3.125 Hz. Since FT8 signals are FSK with a tone deviation of 6.25 Hz, you see why I would choose such bin spacing. With FFT size 4096, the bin spacing (about 2.93 Hz) no longer divides the tone spacing evenly, so it doesn't make sense. Of course you might get some sporadic better decodes, but the overall quality should go down. So what I suggested to @howard0su was to choose the sampling rate accordingly if a 2^N FFT size is necessary. There is no benefit in using an FFT size of 2^N in terms of processing quality, only that it's better supported for small build targets.
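The misalignment can be made concrete by measuring how far each FSK tone lands from the nearest FFT bin. This is a sketch with a hypothetical helper name; it assumes tone 0 sits exactly on a bin center, which a real signal at an arbitrary audio frequency will not do in general, so only the relative drift between tones is meaningful.

```c
/* Distance (in bins) between tone `tone` and the nearest FFT bin,
   assuming tone 0 sits exactly on a bin center.
   Hypothetical helper, not part of ft8_lib. */
double tone_bin_error(double sample_rate, int nfft,
                      double tone_spacing_hz, int tone)
{
    double bin_hz = sample_rate / nfft;
    double pos = tone * tone_spacing_hz / bin_hz; /* tone position in bins */
    long nearest = (long)(pos + 0.5);             /* round; pos is non-negative */
    return pos - (double)nearest;
}
```

At 12000 Hz with 3840 points every tone sits exactly two bins from the previous one, so the error is zero for all eight FT8 tones. With 4096 points the tones are about 2.13 bins apart, and tone 3 already lands 0.4 bins away from the nearest bin center.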

kgoba commented 2 years ago

Note that the library does not have any requirements for the sampling rate itself. It only expects the user to prepare FFT (STFT) data. Since that is a process that can be organized differently on desktop vs embedded platforms, I have not included it in the library itself, but left it only as an example/template in decode_ft8.c. To use the decoder on an STM32 board that I have, I had to make quite a few changes in the DSP code to use the CMSIS DSP routines, do resampling from 8000 Hz to 6400 Hz, filtering, etc., as well as live audio capture from a codec.
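For readers unfamiliar with rational resampling, the 8000 Hz to 6400 Hz step (a factor of 4/5) can be sketched in portable C as below. This is not the STM32 code referred to above, which is not shown in the thread. `resample_rational` is a hypothetical function, and the caller is assumed to supply lowpass FIR taps with unity DC gain and a cutoff below the new Nyquist frequency (3200 Hz), designed at the upsampled rate (32000 Hz here). On Cortex-M targets, CMSIS DSP offers `arm_fir_interpolate_f32` and `arm_fir_decimate_f32` as optimized building blocks.

```c
#include <stddef.h>

/* Resample `in` by the ratio up/down (4/5 for 8000 Hz -> 6400 Hz).
   Conceptually: insert up-1 zeros after each sample, lowpass filter with
   `taps`, keep every `down`-th result. The zero-stuffed stream is never
   materialized; only taps that line up with real samples are evaluated.
   Returns the number of samples written to `out`. */
size_t resample_rational(const float *in, size_t in_len,
                         float *out, size_t out_cap,
                         const float *taps, size_t num_taps,
                         unsigned up, unsigned down)
{
    size_t n_out = 0;
    if (up == 0 || down == 0) return 0;

    /* pos indexes the virtual zero-stuffed stream. */
    for (size_t pos = 0; pos / up < in_len && n_out < out_cap; pos += down) {
        float acc = 0.0f;
        /* Stuffed sample pos-k is nonzero only when k == pos (mod up). */
        for (size_t k = pos % up; k < num_taps && k <= pos; k += up) {
            acc += taps[k] * in[(pos - k) / up];
        }
        /* Compensate for the 1/up gain loss caused by zero stuffing. */
        out[n_out++] = acc * (float)up;
    }
    return n_out;
}
```

The filter state is not carried across calls, so this version suits whole-buffer processing. A streaming implementation would need to keep the last few input samples between blocks.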

The 12 kHz sample rate was chosen since it's the same as WSJT-X uses, so it's convenient to reuse the wave files that WSJT-X saves. There is nothing magic about it otherwise.