Common Voice Dataset - Githubissues

mdangschat / ctc-asr

End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.

MIT License

121 stars, 36 forks

Both TED-Lium and Common Voice contain varying amounts of background noise. The other datasets are of fairly consistent quality.

Common Voice FAQ:

We want the Common Voice dataset to reflect the audio quality a speech-to-text engine will hear in the wild, so we’re looking for variety. In addition to a diverse community of speakers, a dataset with varying audio quality will teach the speech-to-text engine to handle various real-world situations, from background talking to car noise. As long as your voice clip is intelligible, it should be good enough for the dataset.

Common Voice Dataset #16