Closed ramrahu closed 4 years ago
Both TED-LIUM and Common Voice contain varying amounts of background noise. The other datasets are of fairly consistent quality.
We want the Common Voice dataset to reflect the audio quality a speech-to-text engine will hear in the wild, so we’re looking for variety. In addition to a diverse community of speakers, a dataset with varying audio quality will teach the speech-to-text engine to handle various real-world situations, from background talking to car noise. As long as your voice clip is intelligible, it should be good enough for the dataset.
Hi,
I just wanted to know whether all the datasets you used contain clean speech. Specifically, I'm wondering about the Common Voice dataset: have you by any chance analyzed it? Since recordings are collected through a mobile app as well as a browser platform, I feel there is a chance that they are noisy.
Thank you