hotg-ai / rune

Rune provides containers to encapsulate and deploy edgeML pipelines and applications
Apache License 2.0

Calibrate FFT for the microspeech #96

Closed kthakore closed 3 years ago

kthakore commented 3 years ago

Re-train our existing models using data from the phone.

kthakore commented 3 years ago

Starting with microspeech and the FFT fix. @Michael-F-Bryan might have a better idea here. https://github.com/kthakore/json-eater

!!! If we can test proc blocks in Python, that is a HUGE deal

kthakore commented 3 years ago

Need to make an implementation in Rust (copying over a Python function) for microspeech. More notes from @meelislootus.

meelislootus commented 3 years ago

Notes on hotg drive: https://docs.google.com/document/d/1IeJjxcj8VIca_nFGxnNVsbxsuQvGmnI0-Lga0Wy5Tg8/edit#

Overall summary:

Here’s the TF Ops repo with all the parts of the TF spectrogram computer, implemented in C/C++: TF spectrogram-computer repo

The steps in the TF spectrogram-computer (they are all sequentially called from frontend.c) are, with links to the relevant code:

  1. Step 1: Windowing - a windowing function that chops the incoming audio sample into windows: window.c. This is currently part of sonogram and should not be too difficult to figure out or reverse engineer.
  2. Step 2: FFT - applied on each window: fft.cc. This exists in Rust already.
  3. Step 3: Filterbank calculations - convert the real and imaginary parts of the FFT output into energy: filterbank.c (FilterbankConvertFftComplexToEnergy & FilterbankAccumulateChannels)
  4. Step 4: Noise reduction - apply a low-pass filter on each of the windows: noise_reduction.c (NoiseReductionApply)
  5. Step 5: Auto gain control - this might be complicated to reimplement; the algorithm is explained in Wang et al. 2016: pcan_gain_control.c (PcanGainControlApply)
  6. Step 6: Logarithmic scaling: log_scale.c (LogScaleApply)
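
To make the order of the steps concrete, here is a rough Rust sketch of steps 1, 2, 3 and 6 chained together. It is not a port of the TF code: the function names, the Hann window, the naive DFT (standing in for a real FFT), the equal-width channels (standing in for the TF filterbank weights) and the `ln(1 + e)` scaling are all assumptions made for illustration, and everything is in floating point.

```rust
use std::f32::consts::PI;

/// Step 1 (sketch): split `samples` into overlapping frames and apply a Hann window.
/// `frame_len` and `hop` are illustrative parameters, not values from the TF code.
fn hann_windows(samples: &[f32], frame_len: usize, hop: usize) -> Vec<Vec<f32>> {
    assert!(frame_len > 0 && hop > 0);
    let mut frames = Vec::new();
    let mut start = 0;
    while start + frame_len <= samples.len() {
        let frame: Vec<f32> = samples[start..start + frame_len]
            .iter()
            .enumerate()
            .map(|(n, &x)| x * (0.5 - 0.5 * (2.0 * PI * n as f32 / frame_len as f32).cos()))
            .collect();
        frames.push(frame);
        start += hop;
    }
    frames
}

/// Steps 2 and 3a (sketch): naive O(n^2) DFT, returning re^2 + im^2 for bins 0..=n/2.
fn power_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    (0..=n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &x) in frame.iter().enumerate() {
                let angle = -2.0 * PI * ((k * t) % n) as f32 / n as f32;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            re * re + im * im
        })
        .collect()
}

/// Step 3b (sketch): sum the power bins into `channels` equal-width bands.
/// The last band absorbs any leftover bins.
fn accumulate_channels(power: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0 && channels <= power.len());
    let width = power.len() / channels;
    (0..channels)
        .map(|c| {
            let end = if c + 1 == channels { power.len() } else { (c + 1) * width };
            power[c * width..end].iter().sum::<f32>()
        })
        .collect()
}

/// Step 6 (sketch): compress the dynamic range with ln(1 + energy).
fn log_scale(energies: &[f32]) -> Vec<f32> {
    energies.iter().map(|&e| (e + 1.0).ln()).collect()
}

/// Chains the steps: one row of `channels` log energies per frame.
fn spectrogram(samples: &[f32], frame_len: usize, hop: usize, channels: usize) -> Vec<Vec<f32>> {
    hann_windows(samples, frame_len, hop)
        .iter()
        .map(|f| log_scale(&accumulate_channels(&power_spectrum(f), channels)))
        .collect()
}
```

Calling `spectrogram(&samples, 32, 16, 4)` on 64 samples yields three rows of four values each. Matching the TF output would additionally need the same window shape, filterbank weights and fixed-point scaling as the C code.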

I think a reasonable plan to match the model might be (given that especially step 5 might be quite complicated):

  1. Stage 1: Retrain the TF model with noise reduction and gain control turned off, and match it with a Rust proc block that does steps (1) windowing, (2) FFT, (3) filterbank, and (6) log scaling. This should be doable with sonogram plus some hacking.
  2. Stage 2: Match steps (4) noise reduction and (5) gain control in Rust (or do we call the C/C++ functions from Rust?)
  3. Stage 3: Go back to using the original model (now that the FFT proc block is fully matched)
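
For Stage 2, the noise reduction part could start from something like the sketch below. It is a simplified floating-point guess at the idea (keep a smoothed per-channel noise estimate, subtract it, never drop below a fraction of the signal), not a transcription of NoiseReductionApply. The struct, the field names and the single smoothing coefficient are all hypothetical.

```rust
/// Per-channel running noise estimate (hypothetical type, not the TF struct).
struct NoiseReduction {
    /// Weight given to the newest frame when updating the estimate.
    smoothing: f32,
    /// Fraction of the input signal that is always kept.
    min_signal_remaining: f32,
    /// Current noise estimate, one value per filterbank channel.
    estimate: Vec<f32>,
}

impl NoiseReduction {
    fn new(channels: usize, smoothing: f32, min_signal_remaining: f32) -> Self {
        NoiseReduction {
            smoothing,
            min_signal_remaining,
            estimate: vec![0.0; channels],
        }
    }

    /// Updates the noise estimate with one frame of channel energies,
    /// then subtracts the estimate from the frame in place.
    fn apply(&mut self, frame: &mut [f32]) {
        assert_eq!(frame.len(), self.estimate.len());
        for (signal, est) in frame.iter_mut().zip(self.estimate.iter_mut()) {
            // Exponential moving average acts as the low-pass filter.
            *est = self.smoothing * *signal + (1.0 - self.smoothing) * *est;
            // Keep at least a fraction of the original signal.
            let floor = *signal * self.min_signal_remaining;
            *signal = (*signal - *est).max(floor);
        }
    }
}
```

Feeding a constant frame repeatedly makes the estimate converge to the signal, so the output falls to the `min_signal_remaining` floor. The result should be compared against the C implementation frame by frame before relying on it.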

kthakore commented 3 years ago

meelislootus commented 3 years ago

https://github.com/hotg-ai/rune/compare/calibrate_models#diff-75a3acce5b7dd27594d4febcdc1d3562368ee3d2ab95027c049a91307fb6a389

Michael-F-Bryan commented 3 years ago

It looks like microspeech is good so I'll close this and #113.