Open sberryman opened 4 years ago
I should clarify that I started with masking and later decided to go a different route. In all the provided examples, the output from the network is recombined with the source (noisy) phase before being passed to istft. Masking left much more of the background noise.
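For anyone curious what that recombination looks like, here is a minimal sketch using `scipy.signal`. The network itself isn't shown in this thread, so the "predicted" magnitude below is just a pass-through stand-in; in practice you'd substitute the denoised magnitude the model outputs, while keeping the phase of the noisy input:

```python
import numpy as np
from scipy.signal import stft, istft

# toy "noisy" signal: a tone plus white noise (illustrative only)
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)

# analysis STFT of the noisy input
_, _, spec = stft(noisy, fs=sr, nperseg=512)

# stand-in for the network output: a real model would predict a
# denoised magnitude spectrogram here
pred_mag = np.abs(spec)

# recombine the predicted magnitude with the *source* phase, then invert
recombined = pred_mag * np.exp(1j * np.angle(spec))
_, out = istft(recombined, fs=sr, nperseg=512)
```

With an identity magnitude this round-trips the input, which is a handy sanity check that the phase handling is correct before plugging in the model.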
Excellent article on VentureBeat today: https://venturebeat.com/2020/04/09/microsoft-teams-ai-machine-learning-real-time-noise-suppression-typing/
Funnily enough, I've used this dataset (which I'm assuming you are referring to in the article) to train noise suppression as well. I didn't have a real-time/streaming requirement, so I used a bidirectional LSTM recurrent layer. I also trained against LibriSpeech (technically LibriTTS, as I wanted 24 kHz audio).
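Since there was no streaming constraint, a bidirectional LSTM can condition each frame on the whole utterance (future context as well as past). A minimal numpy sketch of the forward pass over spectrogram frames, purely illustrative with random weights (the actual architecture and training code aren't shown in this thread):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # one LSTM step: input/forget/output gates and candidate cell
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def bilstm(frames, W, U, b, Wr, Ur, br):
    # run one LSTM forward and one backward over the frames,
    # then concatenate the two hidden states per time step
    n_h = b.size // 4
    fwd, bwd = [], []
    h = c = np.zeros(n_h)
    for x in frames:
        h, c = lstm_step(x, h, c, W, U, b)
        fwd.append(h)
    h = c = np.zeros(n_h)
    for x in frames[::-1]:
        h, c = lstm_step(x, h, c, Wr, Ur, br)
        bwd.append(h)
    return np.concatenate([np.array(fwd), np.array(bwd)[::-1]], axis=1)

# demo on random "spectrogram" frames: 6 frames of 10 bins, 4 hidden units
T, n_in, n_h = 6, 10, 4
frames = rng.standard_normal((T, n_in))

def init(n_in, n_h):
    return (0.1 * rng.standard_normal((4 * n_h, n_in)),
            0.1 * rng.standard_normal((4 * n_h, n_h)),
            np.zeros(4 * n_h))

W, U, b = init(n_in, n_h)
Wr, Ur, br = init(n_in, n_h)
states = bilstm(frames, W, U, b, Wr, Ur, br)  # shape (T, 2 * n_h)
```

In a framework this is just `LSTM(..., bidirectional=True)`, with a final dense layer mapping the concatenated states back to magnitude bins; the bidirectionality is what rules out streaming use.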
Examples
Sourced from national news broadcasts to show performance against data the network was NOT trained on. The audio files are zipped because GitHub doesn't allow raw waveform uploads. The source files from the broadcast carry the `_noisy.wav` suffix and the network's predicted output carries the `_clean.wav` suffix.

Example 1
Example 2
Example 3
Example 4
Example 5
Example 6
Not the best result, but the network still did a decent job suppressing a noise sample it was never trained against.
trump_helicopter.zip