srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

What is the difference between tokens.txt and units.txt? #97

Closed zhangjiulong closed 7 years ago

zhangjiulong commented 7 years ago

Which file should I use when decoding to get the phone list?

fmetze commented 7 years ago

Do you want the sequence of phones that the acoustic model would compute without a language model? The same that is computed during validation in the training step?

tokens.txt contains the tokens of the FST, with blank and disambiguation tokens, while units.txt contains the list of phones.
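To make the distinction concrete, here is a minimal sketch of how the two symbol tables relate. The symbol names `<eps>`, `<blk>`, and `#0`, `#1`, the ID ordering, and the helper names `build_units` and `build_tokens` are assumptions for illustration; check the files in your own lang directory for the actual contents.

```python
def build_units(phones):
    """Units table: only the phones, numbered from 1 (assumed convention)."""
    return {sym: idx for idx, sym in enumerate(phones, start=1)}


def build_tokens(phones, num_disambig):
    """Tokens table: epsilon and blank first, then the phones, then
    disambiguation symbols (assumed names and ordering)."""
    symbols = ["<eps>", "<blk>"] + list(phones)
    symbols += [f"#{i}" for i in range(num_disambig)]
    return {sym: idx for idx, sym in enumerate(symbols)}


# Hypothetical three-phone inventory, for illustration only.
phones = ["AH", "B", "K"]
units_table = build_units(phones)
tokens_table = build_tokens(phones, num_disambig=2)

# Symbols that exist only on the FST side.
extra = [sym for sym in tokens_table if sym not in units_table]
print(extra)
```

Under these assumptions, the symbols that appear only in the tokens table are exactly the FST-specific ones: epsilon, blank, and the disambiguation symbols.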

zhangjiulong commented 7 years ago

As in issue #98: why does units.txt not contain a blank ID?

fmetze commented 7 years ago

The blank is implicit. You don't specify it; it appears between all phones, but it is not a unit that gets used in the computation of the objective function.
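As an illustration of why the blank needs no entry in units.txt, here is a rough sketch of greedy CTC decoding: repeated frame labels are merged, then blanks are dropped, so only real units remain. It assumes the blank is output index 0 and that unit IDs start at 1; the function names and the phone inventory are hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumption: the network reserves output index 0 for the blank


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def ids_to_units(ids, units):
    """Map unit IDs back to their symbols."""
    return [units[i] for i in ids]


# Hypothetical units table; note that it has no entry for the blank.
units = {1: "AH", 2: "B", 3: "K"}

# Per-frame best labels, e.g. the argmax over the network outputs.
frames = [0, 2, 2, 0, 0, 1, 1, 1, 0, 3, 0, 3]
phones = ids_to_units(ctc_collapse(frames), units)
print(phones)
```

The blank between the two final `3` labels is what keeps them from being merged, which is why the same phone can appear twice in a row in the output.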

zhangjiulong commented 7 years ago

Thanks for your reply; I will try to understand.