srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

script for TEDLIUM release2 #27

Closed by jinserk 8 years ago

jinserk commented 8 years ago

Hello. First of all, thank you very much for the great work! I'm interested in training on TEDLIUM, especially release 2. As far as I can tell, no suitable language model is provided, so I would need to build my own from the corpus and dictionary using IRSTLM or SRILM. Could you tell me how to do this, or is there any plan to add it? Thanks!

yajiemiao commented 8 years ago

The best way is to use or adapt the language model training script from the WSJ recipe: https://github.com/srvk/eesen/blob/master/asr_egs/wsj/local/wsj_train_lms.sh

Alternatively, here is a script I used to create my customized TED language model. You may need to modify the directory names (e.g., lang_bd, lang_test_bd_tgpr, etc.): train_lms.sh.zip
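
For readers who want to try the SRILM route mentioned in the question, here is a minimal sketch of the two preparation steps involved: collecting a word list from the dictionary and assembling an `ngram-count` call. It is not taken from either script above. The function names, the file names, and the assumption that pronunciation variants appear as `word(2)` in the lexicon are all hypothetical, and the command is only built, not run, so SRILM does not need to be installed to try it.

```python
import re
import shlex


def read_vocab(lexicon_lines):
    """Collect the word list from lexicon lines of the form 'WORD PHONE PHONE ...'."""
    vocab = set()
    for line in lexicon_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: pronunciation variants are written as 'word(2)';
        # strip the marker so each word is listed once.
        vocab.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return sorted(vocab)


def ngram_count_cmd(text_path, vocab_path, lm_path, order=3):
    """Build (but do not run) an SRILM ngram-count command line."""
    return [
        "ngram-count",
        "-order", str(order),            # n-gram order, e.g. 3 for a trigram LM
        "-text", text_path,              # one normalized sentence per line
        "-vocab", vocab_path,            # word list written from read_vocab()
        "-unk",                          # map out-of-vocabulary words to <unk>
        "-kndiscount", "-interpolate",   # interpolated Kneser-Ney smoothing
        "-lm", lm_path,                  # output LM in ARPA format
    ]


if __name__ == "__main__":
    # Hypothetical lexicon lines, for illustration only.
    lexicon = ["hello HH AH L OW", "hello(2) HH EH L OW", "world W ER L D"]
    print(read_vocab(lexicon))
    print(shlex.join(ngram_count_cmd("train_text.txt", "vocab.txt", "lm.3gram.arpa")))
```

The resulting ARPA file would still have to be compiled into the decoding graph, which is where directory names such as lang_test_bd_tgpr come in. That step depends on the recipe and is not shown here.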

jinserk commented 8 years ago

Sorry, I didn't notice that you had answered. Thank you so much!