mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Mozilla Public License 2.0

Create cocktail party data set #1536

Open kdavis-mozilla opened 6 years ago

kdavis-mozilla commented 6 years ago

Create a cocktail party data set in which the background noise for training comes only from the training data sets, the background noise for validation comes only from the validation data sets, and the background noise for test comes only from the test data sets.

SalvorinFex commented 6 years ago

Howdy, I was just linked to this: https://voices18.github.io/ Could it be used to help in this regard?

kdavis-mozilla commented 6 years ago

@SalvorinFex Thanks!

Currently we're planning on using the CC0 audio from freesound, along with many random samples from the training set played at a lower volume, to create a cocktail party data set. But the VOiCES Corpus might also be a nice addition.
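For readers following along, here is a rough sketch of the idea described above: overlay a clip with quieter background taken only from its own split. It is not the project's implementation. The function names (`scale_noise`, `make_cocktail_split`), the 10 dB default, and the assumption that the extra noise files (for example the freesound audio) have already been partitioned per split are all hypothetical, and clips are plain lists of float samples to keep the example dependency-free.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def scale_noise(speech, noise, snr_db):
    """Return `noise` looped to the length of `speech` and scaled so that
    the speech is `snr_db` dB louder than the noise (RMS-based)."""
    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0.0 or noise_rms == 0.0:
        # Nothing sensible to scale against: contribute silence.
        return [0.0] * len(speech)
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [gain * noise[i % len(noise)] for i in range(len(speech))]


def make_cocktail_split(clips, extra_noise, snr_db=10.0, n_background=2, seed=0):
    """Build noisy versions of `clips` using background drawn ONLY from
    the same split: the split's other clips plus its own noise files.

    `extra_noise` is assumed to be already partitioned per split."""
    rng = random.Random(seed)
    mixed = []
    for idx, clip in enumerate(clips):
        # Candidate backgrounds never include the clip itself.
        pool = [c for j, c in enumerate(clips) if j != idx] + list(extra_noise)
        chosen = rng.sample(pool, min(n_background, len(pool)))
        # Each layer is scaled against the clean clip, not the running mix.
        layers = [scale_noise(clip, bg, snr_db) for bg in chosen]
        mixed.append([s + sum(layer[i] for layer in layers)
                      for i, s in enumerate(clip)])
    return mixed
```

Calling `make_cocktail_split` once per split, for example `make_cocktail_split(train_clips, train_noise)` and then again with the validation and test lists, keeps the background sources from leaking across splits, since each call only ever sees the data handed to it.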

zaptrem commented 5 years ago

Another Mozilla-associated noise reduction project created a large noise dataset that might be useful to you. It’s downloadable at the bottom of this page: https://people.xiph.org/~jm/demo/rnnoise/

zaptrem commented 4 years ago

Are there plans to run reverb filters on the datasets as well? STT might struggle in this area.

tilmankamp commented 4 years ago

https://deepspeech.readthedocs.io/en/v0.7.4/TRAINING.html#augmentation

zaptrem commented 4 years ago

@tilmankamp Thanks! Are the pretrained models trained with reverb and these other augmentations enabled? Also, is the reverb added before or after the audio is mixed with the noise samples?

kdavis-mozilla commented 4 years ago

Models using such augmentations are currently being trained.

zaptrem commented 4 years ago

@kdavis-mozilla Are the just-released 0.8 models trained with these augmentations? Or are those coming in 0.9/1.0?

kdavis-mozilla commented 4 years ago

@zaptrem They are coming later.

zaptrem commented 4 years ago

@kdavis-mozilla Would this noise dataset be useful for this? https://iqtlabs.github.io/voices/