yh1008 / speech-to-text

mixlingual speech recognition system; hybrid (GMM+NNet) model; Kaldi + Keras


Closed: yh1008 closed this 7 years ago

yh1008 commented 7 years ago

wendywangwwt commented 7 years ago

Chinese speech data from CSLT at Tsinghua: http://cslt.riit.tsinghua.edu.cn/resources.php?Public%20data
There are 2 databases we may need:
(a) SUD-12 database for short utterances: http://data.cslt.org/susr/SUB12/index.html
(b) THCHS-30 database for Chinese: http://data.cslt.org/thchs30/README.html <-- this database comes with a call for a competition, which is interesting.

wendywangwwt commented 7 years ago

Mandarin-English Code-Switching in South-East Asia

yh1008 commented 7 years ago

Columbia University is a member of the LDC (Linguistic Data Consortium), so we get the data from Julia and Brenda for free.