mdangschat / ctc-asr

End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
MIT License

Common Voice Dataset #16

Closed by ramrahu 4 years ago

ramrahu commented 4 years ago

Hi,

I just wanted to know whether all the datasets you used contain clean speech. I am wondering about the Common Voice dataset in particular: have you analyzed it, by any chance? Since recordings can be submitted through both a mobile app and a browser platform, I feel there is a chance that they are noisy.

Thank you

mdangschat commented 4 years ago

Both TED-LIUM and Common Voice contain varying amounts of background noise. The other datasets are of fairly consistent quality.

Common Voice FAQ:

We want the Common Voice dataset to reflect the audio quality a speech-to-text engine will hear in the wild, so we’re looking for variety. In addition to a diverse community of speakers, a dataset with varying audio quality will teach the speech-to-text engine to handle various real-world situations, from background talking to car noise. As long as your voice clip is intelligible, it should be good enough for the dataset.