Calibrate FFT for the microspeech

kthakore commented 3 years ago

Re-train our existing models using data from the phone.

kthakore commented 3 years ago

Starting with microspeech and the FFT fix. @Michael-F-Bryan might have a better idea here. https://github.com/kthakore/json-eater

!!! If we can test proc blocks in Python, that is a HUGE deal.

kthakore commented 3 years ago

We need a Rust implementation for microspeech (porting over a Python function). More notes from @meelislootus.

meelislootus commented 3 years ago

Overall summary:

The spectrogram-computer function in Python (really implemented in C/C++) is quite complicated; we probably want to simplify it and retrain the model with the simpler version.
The Rust spectrogram library sonogram does not do exactly what we need (it lacks a mel spectrum) and exposes few of its parameters; we probably want to replace sonogram with (1) our own windowing function, written in Rust, and (2) an existing FFT crate.
OR we just call the individual C/C++ steps in the TF library from Rust.
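For option (1), a hand-written windowing function could look roughly like the sketch below. It is only an illustration: the names `hann_window` and `windowed_frames` are made up for this example, it works on `f32` samples rather than the fixed-point values the TF frontend uses, and the exact window formula would still need to be checked against window.c.

```rust
use std::f32::consts::PI;

/// Hann window coefficients for a frame of `len` samples.
/// Assumption: a plain periodic Hann window; window.c may differ in detail.
pub fn hann_window(len: usize) -> Vec<f32> {
    (0..len)
        .map(|n| 0.5 - 0.5 * (2.0 * PI * n as f32 / len as f32).cos())
        .collect()
}

/// Split `samples` into overlapping frames of `window_len` samples, advancing
/// by `step` samples each time, and multiply each frame by the window.
/// Trailing samples that do not fill a whole frame are dropped.
pub fn windowed_frames(samples: &[f32], window_len: usize, step: usize) -> Vec<Vec<f32>> {
    assert!(window_len > 0 && step > 0, "window_len and step must be non-zero");
    let window = hann_window(window_len);
    let mut frames: Vec<Vec<f32>> = Vec::new();
    let mut start = 0;
    while start + window_len <= samples.len() {
        let frame: Vec<f32> = samples[start..start + window_len]
            .iter()
            .zip(window.iter())
            .map(|(s, w)| s * w)
            .collect();
        frames.push(frame);
        start += step;
    }
    frames
}
```

With illustrative parameters of one second of 16 kHz audio, a 480-sample window and a 320-sample step, `windowed_frames` returns 49 frames. Each frame would then go to whichever FFT crate we pick.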

Here’s the TF Ops repo with the C/C++ implementation of every part of the TF spectrogram-computer: TF spectrogram-computer repo

The steps in the TF spectrogram-computer (they are all sequentially called from frontend.c) are, with links to the relevant code:

Step 1: A windowing function that chops the incoming audio sample into windows: window.c - this is currently handled by sonogram. It should not be too difficult to figure out / reverse engineer
Step 2: FFT - applied on each window: fft.cc - an FFT already exists in Rust
Step 3: Filterbank calculations - convert the real and imaginary parts of the FFT output into energy - filterbank.c (FilterbankConvertFftComplexToEnergy & FilterbankAccumulateChannels)
Step 4: Noise reduction - apply a low pass filter on each of the windows: noise_reduction.c (NoiseReductionApply)
Step 5: Auto gain control - this might be complicated to reimplement; the algorithm is explained in Wang et al. 2016: pcan_gain_control.c (PcanGainControlApply)
Step 6: Logarithmic scaling: log_scale.c (LogScaleApply)
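To make steps 3 and 6 concrete, here is a minimal floating-point sketch. It assumes the FFT output arrives as `(re, im)` pairs from some FFT crate. The function names and the `(first_bin, weights)` channel layout are invented for this example. The TF code works in fixed point with its own scaling, so the numbers will not match it bit for bit.

```rust
/// Step 3a: squared magnitude of each complex FFT bin, given as (re, im) pairs.
pub fn complex_to_energy(bins: &[(f32, f32)]) -> Vec<f32> {
    bins.iter().map(|&(re, im)| re * re + im * im).collect()
}

/// Step 3b: sum weighted bin energies into filterbank channels.
/// Each channel is described by the index of its first FFT bin and the weights
/// applied to consecutive bins from there (hypothetical layout for this sketch).
/// Bins past the end of `energy` are treated as zero.
pub fn accumulate_channels(energy: &[f32], channels: &[(usize, Vec<f32>)]) -> Vec<f32> {
    channels
        .iter()
        .map(|(first_bin, weights)| {
            weights
                .iter()
                .enumerate()
                .map(|(i, w)| *w * energy.get(*first_bin + i).copied().unwrap_or(0.0))
                .sum::<f32>()
        })
        .collect()
}

/// Step 6: natural log of each channel, clamped below by `floor` so that
/// zero-energy channels do not produce negative infinity.
pub fn log_scale(channels: &[f32], floor: f32) -> Vec<f32> {
    channels.iter().map(|&e| e.max(floor).ln()).collect()
}
```

For a mel filterbank the weights would be the usual overlapping triangles; how those are laid out per channel is left open here.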

I think a reasonable plan to match the model might be (given that especially step 5 might be quite complicated):

Stage 1: Retrain the TF model with noise reduction and gain control turned off, and match with a Rust proc block that does steps (1) windowing, (2) FFT, (3) filterbank, (6) log scaling - this should be doable with sonogram + some hacking
Stage 2: Match steps (4) noise reduction and (5) gain control in Rust (or call the C/C++ functions from Rust?)
Stage 3: Go back to using the original model (now that the FFT proc block is fully matched)
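For Stage 2, the noise reduction step could start from something like the sketch below: keep an exponentially smoothed (low-pass) noise estimate per channel and subtract it from each new frame, never dropping below a fraction of the input. This is a floating-point approximation of the general idea, not a port of NoiseReductionApply. The `NoiseReduction` type and its parameter names are assumptions for illustration, and details such as separate smoothing for even and odd channels are left out.

```rust
/// Per-channel running noise estimate (hypothetical type for this sketch).
pub struct NoiseReduction {
    estimate: Vec<f32>,
    /// Weight given to the newest frame when updating the estimate, in (0, 1].
    smoothing: f32,
    /// Fraction of the input that is always kept, in [0, 1].
    min_signal_remaining: f32,
}

impl NoiseReduction {
    pub fn new(num_channels: usize, smoothing: f32, min_signal_remaining: f32) -> Self {
        Self {
            estimate: vec![0.0; num_channels],
            smoothing,
            min_signal_remaining,
        }
    }

    /// Update the noise estimate with this frame, then subtract it in place.
    /// Channels beyond `num_channels` are left untouched.
    pub fn apply(&mut self, frame: &mut [f32]) {
        let smoothing = self.smoothing;
        let min_remaining = self.min_signal_remaining;
        for (value, estimate) in frame.iter_mut().zip(self.estimate.iter_mut()) {
            // Exponential smoothing acts as a low-pass filter over time.
            *estimate = smoothing * *value + (1.0 - smoothing) * *estimate;
            let floor = *value * min_remaining;
            *value = (*value - *estimate).max(floor);
        }
    }
}
```

Calling `apply` once per frame, in order, keeps the estimate up to date. For Stage 1 the proc block would simply skip this call, matching a model retrained with noise reduction turned off.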

kthakore commented 3 years ago

Should we make proc block libraries out of these? Users could then use them in their own proc blocks.

meelislootus commented 3 years ago

Michael-F-Bryan commented 3 years ago

It looks like microspeech is good so I'll close this and #113.

hotg-ai / rune