How can I decode based only on the acoustic model, without any effect of the language model, in EESEN?

srvk / eesen

The official repository of the Eesen project

http://arxiv.org/abs/1507.08240

Apache License 2.0

How can I decode based only on the acoustic model, without any effect of the language model, in EESEN? #118

Closed: niucheney closed this issue 7 years ago

niucheney commented 7 years ago

Hi, I just want to know which frames the CTC toolkit (EESEN) chooses when decoding. Can EESEN output them? I am looking forward to your reply. Best wishes!

fmetze commented 7 years ago

You can simply look at the output of net-output-extract, which is a matrix of size (number of frames) by (number of tokens). In decode_ctc_lat.sh (and the other decoding scripts) you can see it being passed through a pipe to the decoder, which is where the language model is applied. At that point you can store the matrix or process it in some other way (for example, convert it to text form using copy-feats or similar).
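Once the matrix has been dumped in text form, it can be loaded for inspection. The following is a minimal sketch, assuming the output was written as a Kaldi-style text archive (for example with a `ark,t:` write specifier in copy-feats), where each utterance looks like `utt_id  [` followed by one row per frame and a closing `]`. The function name `read_text_matrices` is hypothetical and not part of EESEN.

```python
from typing import Dict, Iterable, List

Matrix = List[List[float]]


def read_text_matrices(lines: Iterable[str]) -> Dict[str, Matrix]:
    """Parse text-archive matrices of the form 'utt_id  [ rows... ]'.

    Assumption: one row per frame, one column per token, and the closing
    ']' sits at the end of the last row (or on a line of its own).
    """
    matrices: Dict[str, Matrix] = {}
    utt_id = None
    rows: Matrix = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if utt_id is None:
            # Header line: "<utt_id>  [", possibly followed by values.
            head, _, rest = line.partition("[")
            utt_id = head.strip()
            rows = []
            line = rest.strip()
            if not line:
                continue
        closes = line.endswith("]")
        if closes:
            line = line[:-1].strip()
        if line:
            rows.append([float(x) for x in line.split()])
        if closes:
            # End of this utterance's matrix.
            matrices[utt_id] = rows
            utt_id = None
    return matrices


if __name__ == "__main__":
    sample = [
        "utt1  [",
        "  -0.1 -2.3 -4.0",
        "  -3.0 -0.2 -2.5 ]",
    ]
    for name, mat in read_text_matrices(sample).items():
        print(name, len(mat), "frames x", len(mat[0]), "tokens")
```

Each entry maps an utterance ID to a list of per-frame score rows, which is enough to look at what the network emits before any language model is applied.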

fmetze commented 7 years ago

Look at ErrorRateMSeq in https://github.com/srvk/eesen/blob/blank_scale_and_parallel_models/src/net/ctc-loss.cc. You can call train-ctc-parallel in validation mode with the "--sequence-out-file" option, which will give you a file that contains the IDs, temporal locations, and confidences of the peaks. This is what you want, right?

We'll integrate this into the main branch soon (hopefully).
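For anyone on a branch without that option, a similar result can be approximated offline from the frames-by-tokens matrix. This is only a sketch of plain greedy (best-path) CTC decoding, not the code in ErrorRateMSeq: it takes the argmax token per frame, skips blanks, merges consecutive repeats, and reports the token ID, the frame index, and the score at the strongest frame of each run. The function name `ctc_peaks` is hypothetical, and the blank being token index 0 is an assumption that should be checked against your setup.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[int, int, float]  # (token_id, frame_index, score)


def ctc_peaks(frames: Sequence[Sequence[float]], blank: int = 0) -> List[Peak]:
    """Greedy CTC decoding that also reports where each token peaks.

    `frames` is a (num_frames x num_tokens) matrix of posteriors or
    log-posteriors; only the per-frame ordering of scores matters.
    Assumption: `blank` is the index of the CTC blank token.
    """
    peaks: List[Peak] = []
    prev = blank
    for t, row in enumerate(frames):
        best = max(range(len(row)), key=row.__getitem__)
        if best != blank:
            if best != prev:
                # A new token starts at this frame.
                peaks.append((best, t, row[best]))
            elif row[best] > peaks[-1][2]:
                # Same token continues; keep its most confident frame.
                peaks[-1] = (best, t, row[best])
        prev = best
    return peaks


if __name__ == "__main__":
    posteriors = [
        [0.90, 0.05, 0.05],  # blank
        [0.20, 0.70, 0.10],  # token 1
        [0.10, 0.80, 0.10],  # token 1 again, stronger
        [0.90, 0.05, 0.05],  # blank separates repeated tokens
        [0.10, 0.60, 0.30],  # token 1, new occurrence
        [0.10, 0.20, 0.70],  # token 2
    ]
    for token_id, frame, score in ctc_peaks(posteriors):
        print(token_id, frame, score)
```

The token IDs in the result form the language-model-free hypothesis, and they can be mapped back to symbols with whatever token table the model was trained with.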

niucheney commented 7 years ago

@fmetze Thank you very much for the prompt reply. That option is very cool!