microsoft / MS-SNSD

The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
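To show what mixing at a chosen SNR level means, here is a minimal NumPy sketch. It scales a noise clip so that the clean-to-noise power ratio hits a target value in dB. It assumes equal-length float arrays at the same sample rate. The helper names mix_at_snr and measured_snr_db are hypothetical, and this is not the repository's own synthesis script.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Hypothetical helper: add `noise` to `clean` so that the
    clean-to-noise power ratio equals `snr_db` decibels."""
    if clean.shape != noise.shape:
        raise ValueError("clean and noise must have the same shape")
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 10 * log10(P_clean / P_scaled_noise); solve for the noise gain
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    gain = np.sqrt(target_noise_power / noise_power)
    return clean + gain * noise

def measured_snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    """Recover the SNR of a mixture, given the clean reference."""
    residual = noisy - clean
    return float(10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2)))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0            # one second at an assumed 16 kHz
clean = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for a speech clip
noise = rng.standard_normal(16000)         # stand-in for a noise clip
mixtures = {level: mix_at_snr(clean, noise, level) for level in (0, 10, 20)}
for level, noisy in mixtures.items():
    print(level, measured_snr_db(clean, noisy))
```

Scaling only the noise keeps the speech level fixed across SNR conditions. That is one common convention; normalizing the mixture afterwards is another.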
MIT License
456 stars, 141 forks

Real time noise suppression #12

Open. sberryman opened this issue 4 years ago.

sberryman commented 4 years ago

Excellent article on VentureBeat today: https://venturebeat.com/2020/04/09/microsoft-teams-ai-machine-learning-real-time-noise-suppression-typing/

Funnily enough, I've also used this dataset (which I assume is the one the article refers to) to train noise suppression. I didn't need real-time/streaming operation, so I used a bidirectional LSTM recurrent layer. I also trained against LibriSpeech (technically LibriTTS, since I wanted 24 kHz audio).
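A bidirectional layer rules out streaming because its output at frame t depends on frames that arrive after t. The NumPy sketch below demonstrates this. To stay short, it substitutes a plain tanh RNN for the LSTM cell; that substitution is an assumption for illustration, not the model described above. It then checks that perturbing a future frame changes an earlier output only in the bidirectional case. Weights are random, and all names are hypothetical.

```python
import numpy as np

def rnn_pass(x, w_in, w_rec):
    """Run a simple tanh RNN over frames x (shape [T, F]); returns [T, H]."""
    h = np.zeros(w_rec.shape[0])
    out = []
    for frame in x:
        h = np.tanh(w_in @ frame + w_rec @ h)
        out.append(h)
    return np.stack(out)

def bidirectional(x, fwd, bwd):
    """Concatenate a forward pass with a backward pass over the reversed sequence."""
    h_fwd = rnn_pass(x, *fwd)
    h_bwd = rnn_pass(x[::-1], *bwd)[::-1]  # re-reverse so row t aligns with frame t
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
T, F, H = 50, 8, 16  # frames, input features, hidden units (arbitrary)
fwd = (rng.normal(scale=0.3, size=(H, F)), rng.normal(scale=0.3, size=(H, H)))
bwd = (rng.normal(scale=0.3, size=(H, F)), rng.normal(scale=0.3, size=(H, H)))

x = rng.normal(size=(T, F))
x_future = x.copy()
x_future[12] += 1.0  # perturb frame 12, which lies in the "future" of frame 10

uni_changed = not np.allclose(rnn_pass(x, *fwd)[10], rnn_pass(x_future, *fwd)[10])
bi_out = bidirectional(x, fwd, bwd)
bi_changed = not np.allclose(bi_out[10], bidirectional(x_future, fwd, bwd)[10])
print("forward-only output at t=10 changed:", uni_changed)
print("bidirectional output at t=10 changed:", bi_changed)
```

The forward-only pass can emit frame 10 as soon as frame 10 arrives. The backward half has to wait for the end of the sequence (or a lookahead window), which adds latency.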

Examples

These clips are sourced from national news broadcasts to show performance on data the network was NOT trained on. The audio files are compressed because GitHub doesn't allow raw waveform uploads. The source files from each broadcast have the _noisy.wav suffix, and the network's predicted output has the _clean.wav suffix.

Example 1

Attachments: sequence 1585584_clean, sequence.1585584_.zip

Example 2

Attachments: sequence 1597540_clean, sequence.1597540_.zip

Example 3

Attachments: sequence 1046182_clean, sequence.1046182_.zip

Example 4

Attachments: sequence 1597377_clean, sequence.1597377_.zip

Example 5

Attachments: sequence 231_clean, sequence.231_.zip

Example 6

Not the best, but it still did a decent job of suppressing a type of noise it was never trained against.

Attachments: 00049 unknown and_despite_that_and_despite_40_million_18_trump_haters_including_people_that_worked_for_hillary_clinton_and_some_of_the_worst_human_beings_on_earth_they_got_nothing_clean, trump_helicopter.zip

sberryman commented 4 years ago

I should clarify that I started with masking and later went a different route. In all the provided examples, the network's output is recombined with the phase of the source (noisy) audio before being passed to the ISTFT. Masking left much more of the background noise in place.
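The two output paths can be sketched in NumPy. In the masking route, a predicted mask scales the noisy magnitude. In the direct route, a predicted magnitude replaces the noisy one. Both are recombined with the noisy (source) phase and inverted with an ISTFT. The STFT/ISTFT here is a hand-rolled Hann-window weighted overlap-add, the frame and hop sizes are arbitrary, and the "network outputs" are identity placeholders. None of this is the author's actual code.

```python
import numpy as np

N_FFT = 512   # assumed frame length
HOP = 128     # assumed hop size (75% overlap)
WINDOW = np.hanning(N_FFT)

def stft(x):
    """Window each frame with Hann and return complex spectra, shape [frames, N_FFT//2 + 1]."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length):
    """Weighted overlap-add inverse of stft(); samples with ~zero window coverage become 0."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * HOP
        out[start : start + N_FFT] += frame * WINDOW
        norm[start : start + N_FFT] += WINDOW ** 2
    return np.where(norm > 1e-8, out / np.maximum(norm, 1e-8), 0.0)

def enhance_with_mask(noisy, mask):
    """Masking route: scale the noisy magnitude by a mask, keep the noisy phase."""
    spec = stft(noisy)
    magnitude, phase = np.abs(spec), np.angle(spec)
    return istft(mask * magnitude * np.exp(1j * phase), len(noisy))

def enhance_with_magnitude(noisy, predicted_magnitude):
    """Direct route: replace the noisy magnitude with a predicted one, keep the noisy phase."""
    phase = np.angle(stft(noisy))
    return istft(predicted_magnitude * np.exp(1j * phase), len(noisy))

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
noisy = 0.5 * np.sin(2 * np.pi * 300 * t) + 0.1 * rng.standard_normal(sr)

# Placeholder "network outputs": an all-pass mask and the noisy magnitude itself.
# A trained model would supply real masks or magnitudes; with these identity
# placeholders, both routes should reproduce the input away from the edges.
noisy_spec = stft(noisy)
identity_mask = np.ones(noisy_spec.shape)
out_mask = enhance_with_mask(noisy, identity_mask)
out_mag = enhance_with_magnitude(noisy, np.abs(noisy_spec))
print(np.max(np.abs(out_mask[N_FFT:-N_FFT] - noisy[N_FFT:-N_FFT])))
```

A bounded mask can only attenuate what is already present in the noisy magnitude. A directly predicted magnitude is free to differ from it. In both routes, the noisy phase is reused unchanged.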