yajiemiao / eesen

The official repository of the Eesen project
Apache License 2.0
202 stars 72 forks

the output of LSTM #13

Open wqn628 opened 8 years ago

wqn628 commented 8 years ago

First, thanks for all your help. I have been confused about the modeled units for a while. For instance, see unit.txt: [image]. I wonder why we should model the first phone and the second phone. Actually, neither of them exists in my training labels. Can I delete them and not model them? Any help would be appreciated.

wqn628 commented 8 years ago

@yajiemiao

wqn628 commented 8 years ago

Why should we add noise to the lexicon (noise phonemes to the units)?

yajiemiao commented 8 years ago

If they truly don't exist in your training data, you can safely delete them. But be aware that by default, Eesen maps OOV words in your training transcripts to
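One way to confirm that a unit really is unused before deleting it is to compare the unit table against the final label sequences (after any OOV mapping has been applied, not the raw transcripts). This is only a minimal sketch: it assumes a units file with one `<unit> <index>` pair per line and label lines of the form `<utt-id> <index> <index> ...`. The function name `unused_units` and the unit names in the sample data are hypothetical.

```python
def unused_units(units_lines, label_lines):
    """Return the names of units whose index never occurs in the labels."""
    # Assumed units format: "<unit> <index>" per line.
    index_to_unit = {}
    for line in units_lines:
        parts = line.split()
        if len(parts) == 2:
            index_to_unit[int(parts[1])] = parts[0]

    # Assumed label format: "<utt-id> <index> <index> ..." per line.
    seen = set()
    for line in label_lines:
        seen.update(int(tok) for tok in line.split()[1:])

    return [unit for idx, unit in sorted(index_to_unit.items()) if idx not in seen]


# Hypothetical sample data, for illustration only.
units = ["<NOISE> 1", "<SPOKEN_NOISE> 2", "AA 3", "B 4"]
labels = ["utt1 3 4 3", "utt2 4 4 2"]
print(unused_units(units, labels))  # ['<NOISE>']
```

Here index 2 does occur in the labels even though it might not be visible in the raw text, which is the kind of case the OOV caution above is about.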

wqn628 commented 8 years ago

Thanks a lot. In addition, Eesen models mono-phones directly. Can tri-phones be used as the modeling units in Eesen? @yajiemiao In previous acoustic models (GMM-HMM, DNN/LSTM-HMM), tri-phones outperformed mono-phones by a wide margin.

chenzhehuai commented 8 years ago

Using tri-phones as the modeling units in Eesen is possible. You might additionally generate context labels (fstcomposecontext) as in an HMM system, and replace the labels in T.fst with tri-phone labels. The final WFST then becomes T ∘ C ∘ LG.
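To make the notion of context labels concrete, here is a toy sketch of what the context transducer C expresses: each mono-phone is relabeled by its left and right neighbours. This is not Eesen or Kaldi code. The function `to_triphones`, the `left-center+right` naming, and the `sil` boundary symbol are assumptions chosen for illustration.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a mono-phone sequence into left-center+right tri-phone labels."""
    # Pad both ends so the first and last phones also get a context.
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


print(to_triphones(["k", "ae", "t"]))
# ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```

In the WFST setting this expansion is not done on label lists but is encoded in C, so that T can emit tri-phone labels and LG can stay mono-phone based.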

wqn628 commented 8 years ago

Sorry, I don't get it. Do you mean that I should generate the tri-phones with the hybrid pipeline, or with the command fstcomposecontext? The first or the second? @chenzhehuai

chenzhehuai commented 8 years ago

Clustered tri-phones should be generated from a hybrid system through clustering, while the context in the WFST can be generated by fstcomposecontext, with an extra mapping from tri-phones to clustered tri-phones.
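The extra mapping mentioned above can be pictured as a lookup table from logical tri-phones to tied (clustered) labels. The following is a minimal sketch under the assumption that the clustering result from the hybrid system has already been exported as a dictionary. The name `tie_triphones` and the table contents are hypothetical.

```python
def tie_triphones(triphones, tying_table):
    """Map each logical tri-phone to its clustered (tied) label."""
    # Fail loudly rather than guess a cluster for an unseen context.
    missing = [t for t in triphones if t not in tying_table]
    if missing:
        raise KeyError(f"no cluster for: {missing}")
    return [tying_table[t] for t in triphones]


# Hypothetical tying table: two contexts of "ae" share cluster 1.
tying_table = {"sil-k+ae": 0, "k-ae+t": 1, "k-ae+p": 1, "ae-t+sil": 2}
print(tie_triphones(["sil-k+ae", "k-ae+t", "ae-t+sil"], tying_table))  # [0, 1, 2]
print(tie_triphones(["k-ae+p"], tying_table))  # [1]
```

Because several tri-phones share one cluster, the number of output labels the network has to model stays much smaller than the number of possible contexts.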

yajiemiao commented 8 years ago

An even simpler way is to generate forced alignments with the GMM-HMM and take the CD states as CI CTC labels. With this, there is no need to consider context dependency in decoding. I haven't run such an experiment, so I'm not sure how well this would work in practice.
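As a rough illustration of this idea, a frame-level alignment can be turned into a CTC label sequence by collapsing runs of identical states. This sketch assumes the alignment is available as one CD state id per frame; `alignment_to_ctc_labels` is a hypothetical helper, and, like the suggestion itself, the approach is untested here.

```python
from itertools import groupby


def alignment_to_ctc_labels(frame_states):
    """Collapse consecutive repeats of a state id into a single label."""
    # groupby yields one group per run of equal adjacent values.
    return [state for state, _ in groupby(frame_states)]


# Hypothetical per-frame CD state ids from a forced alignment.
frames = [7, 7, 7, 12, 12, 3, 3, 3, 3, 12]
print(alignment_to_ctc_labels(frames))  # [7, 12, 3, 12]
```

One caveat of this simple collapsing: if two adjacent phones happen to share the same tied state, their runs are merged into a single label, so an alignment format that also marks phone boundaries would be needed to keep them apart.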

wqn628 commented 8 years ago

Hello, in the decoding stage, the following problem occurs:

[image]

Can you tell what happened and how I can solve it? Thanks a lot. @yajiemiao