Open jeremyandrews opened 4 years ago
I requested help/ideas here: https://www.reddit.com/r/rust/comments/etc5sf/library_for_sound_normalization/
I started trying to get the rnnoise denoising library working, but haven't managed to get a basic example running yet: https://github.com/RustAudio/rnnoise-c/pull/1
Here's a basic working example with rnnoise, although the output is raw audio, which I'm not yet sure how to use: https://github.com/RustAudio/rnnoise-c/pull/2
Audio received from the Kakaia client is often too quiet for deepspeech to process properly, so normalization could help a lot. However, there's often a loud sound at the very end of the clip (probably from the finger touching the screen to stop the recording). That sound would first have to be detected and trimmed, because otherwise it already sits at the peak level and normalization doesn't do anything.
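To make the trim-then-normalize idea concrete, here is a rough sketch in plain Rust with no external crates. It assumes the clip is already decoded to `f32` samples in the range -1.0 to 1.0. The function names (`peak`, `trim_loud_tail`, `normalize_peak`) and the `tail_len` and `ratio` parameters are made up for illustration and are not from any existing library; the tail heuristic is just one guess at how the click could be detected.

```rust
/// Largest absolute sample value in the slice (0.0 for an empty slice).
fn peak(samples: &[f32]) -> f32 {
    samples.iter().fold(0.0_f32, |m, s| m.max(s.abs()))
}

/// Hypothetical heuristic: if the last `tail_len` samples are more than
/// `ratio` times louder than everything before them, treat them as the
/// "finger on the screen" click and drop them.
fn trim_loud_tail(samples: &[f32], tail_len: usize, ratio: f32) -> &[f32] {
    if samples.len() <= tail_len {
        return samples;
    }
    let (body, tail) = samples.split_at(samples.len() - tail_len);
    if peak(tail) > ratio * peak(body) {
        body
    } else {
        samples
    }
}

/// Peak normalization: scale the clip so its loudest sample hits `target`.
/// A silent clip is returned unchanged to avoid dividing by zero.
fn normalize_peak(samples: &[f32], target: f32) -> Vec<f32> {
    let p = peak(samples);
    if p == 0.0 {
        return samples.to_vec();
    }
    let gain = target / p;
    samples.iter().map(|s| s * gain).collect()
}
```

Calling `normalize_peak(trim_loud_tail(&clip, tail_len, ratio), 0.9)` would then boost the speech itself. Without the trim step the click sets the peak, the gain stays close to 1.0, and the quiet speech stays quiet.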
We need to select one or more libraries, ideally supporting multiple audio formats (at least FLAC and WAV). The plan is to handle realtime streams, so perhaps we can do chunked normalization (or is real-time normalization a thing?), which should also solve the problem of the loud sound at the end.
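As a starting point for the chunked idea, here is a minimal sketch of per-chunk gain control in plain Rust. It is not taken from any library: `ChunkNormalizer` and its fields are hypothetical names, and it assumes decoded `f32` samples in the range -1.0 to 1.0 arriving in fixed-size chunks. The gain rises gradually across quiet chunks and drops immediately when a loud chunk arrives, so a click at the end only lowers the gain for its own chunk.

```rust
/// Hypothetical streaming normalizer that keeps one gain value across chunks.
struct ChunkNormalizer {
    target_peak: f32, // level the loudest sample of a chunk should reach
    max_gain: f32,    // cap, so near-silence is not amplified without limit
    smoothing: f32,   // 0.0..=1.0, how fast the gain rises towards the wanted value
    gain: f32,        // current gain, carried over between chunks
}

impl ChunkNormalizer {
    fn new(target_peak: f32, max_gain: f32, smoothing: f32) -> Self {
        ChunkNormalizer { target_peak, max_gain, smoothing, gain: 1.0 }
    }

    /// Normalize one chunk in place.
    fn process(&mut self, chunk: &mut [f32]) {
        let peak = chunk.iter().fold(0.0_f32, |m, s| m.max(s.abs()));
        if peak > 0.0 {
            let desired = (self.target_peak / peak).min(self.max_gain);
            self.gain = if desired < self.gain {
                // Loud chunk: drop the gain at once so it cannot clip.
                desired
            } else {
                // Quiet chunk: move only part of the way up to avoid pumping.
                self.gain + self.smoothing * (desired - self.gain)
            };
        }
        for s in chunk.iter_mut() {
            // The clamp is a safety net in case target_peak is set above 1.0.
            *s = (*s * self.gain).clamp(-1.0, 1.0);
        }
    }
}
```

With `ChunkNormalizer::new(0.9, 20.0, 0.5)`, a quiet chunk peaking at 0.1 moves the gain from 1.0 to about 5.0, and a following chunk peaking at 0.9 pulls it straight back to 1.0. Chunk size and the smoothing value would need tuning against real recordings.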