stefbraun opened this issue 6 years ago
Hi, in most cases lattice-based decoding will improve results: it gives better time alignments and lets you specify a word insertion penalty. It also gives you word confidences.
Does your PyTorch model follow some of the other recipes, e.g. the TensorFlow ones? We’d be interested to see a comparison between the frameworks.
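Kaldi-style scoring scripts usually decode once and then pick the best path under a grid of language-model weights and word insertion penalties, keeping whichever setting gives the lowest WER. The sketch below is a toy n-best stand-in for that idea, not the Eesen scoring script. The hypotheses and costs are invented for illustration. The cost formula (acoustic cost + LM weight × LM cost + penalty × word count) is an assumption about how the terms combine. All function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

# A hypothesis is (words, acoustic_cost, lm_cost). Costs are negative
# log-probabilities, so lower is better (the usual WFST-decoder convention).
Hypothesis = Tuple[List[str], float, float]


def total_cost(hyp: Hypothesis, lm_weight: float, word_ins_penalty: float) -> float:
    words, am_cost, lm_cost = hyp
    # A positive penalty makes every extra word more expensive, discouraging insertions.
    return am_cost + lm_weight * lm_cost + word_ins_penalty * len(words)


def best_path(nbest: Sequence[Hypothesis], lm_weight: float, word_ins_penalty: float) -> List[str]:
    return min(nbest, key=lambda h: total_cost(h, lm_weight, word_ins_penalty))[0]


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    # Word-level Levenshtein distance (substitutions + insertions + deletions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def sweep(nbest, ref, lm_weights, penalties):
    # Re-score the same stored hypotheses for every setting; nothing is re-decoded.
    results = []
    for lmw, wip in product(lm_weights, penalties):
        hyp = best_path(nbest, lmw, wip)
        results.append((word_errors(ref, hyp), lmw, wip, hyp))
    return min(results, key=lambda r: r[0])


# Toy, made-up n-best list (not real decoder output).
nbest = [
    (["the", "cat", "sat"], 100.0, 12.0),
    (["the", "cat", "sat", "a"], 98.0, 15.0),
    (["a", "cat", "sat"], 101.0, 13.0),
]
ref = ["the", "cat", "sat"]
best = sweep(nbest, ref, [0.5, 1.0], [0.0, 1.0])
print(best)  # (0, 0.5, 1.0, ['the', 'cat', 'sat'])
```

With no penalty and a low LM weight, the hypothesis with an extra word wins. A penalty of 1.0 removes that insertion. A lattice simply stores far more alternatives than this three-entry list, so the same sweep can be run over it without decoding again.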
Florian
On Dec 4, 2017, at 7:23 AM, stefbraun notifications@github.com wrote:
Hi,
I am trying to decode the output of a CTC acoustic model built in PyTorch with the Eesen framework. On WSJ, I get good results with decode-faster, as used in decode_ctc.sh https://github.com/srvk/eesen/blob/4038ad3330b3178e2446cf7a8dc3afe7533fc0ec/asr_egs/wsj/steps/decode_ctc.sh. However, there is a newer decode_ctc_lat.sh https://github.com/srvk/eesen/blob/4038ad3330b3178e2446cf7a8dc3afe7533fc0ec/asr_egs/wsj/steps/decode_ctc_lat.sh that uses latgen-faster and a scoring script.
What is the difference between these methods? Will lattice-based decoding improve the results? If yes, do you have any numbers for orientation?
Thanks a lot for sharing the eesen framework, this is extremely helpful.
@stefbraun Could you explain how you used your PyTorch-trained model with decode-faster? I have trained a model with CTC loss in TensorFlow and I pipe the logits to decode-faster, but my WFST only outputs 'wow'. Is there some transformation, or a particular way to output the logits, that makes this work?
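Two things are worth checking. First, is the decoder receiving raw logits where it expects per-frame log-posteriors (or prior-normalized log scores)? Second, do the network's output columns line up with the token IDs in the decoding graph? The plain-Python sketch below applies a log-softmax, optionally subtracts scaled log-priors, runs a greedy CTC best-path sanity check, and writes a Kaldi text-format matrix archive. These are assumptions to verify against your own setup, not confirmed Eesen behaviour: that blank is output column 0, that column k maps to the token ID your graph expects (compare with its tokens.txt), and whether your recipe subtracts priors at all. The helper names are hypothetical. If the greedy output is already nonsense, the WFST output will be too.

```python
import io
import math
from typing import List, Optional, Sequence, TextIO


def log_softmax(row: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def to_log_posteriors(logits: Sequence[Sequence[float]],
                      log_priors: Optional[Sequence[float]] = None,
                      prior_scale: float = 1.0) -> List[List[float]]:
    out = []
    for row in logits:
        lp = log_softmax(row)
        if log_priors is not None:
            # Optional (assumed) prior normalization: log p(k|x) - scale * log p(k).
            lp = [v - prior_scale * p for v, p in zip(lp, log_priors)]
        out.append(lp)
    return out


def greedy_ctc(log_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    # CTC best path: argmax per frame, collapse repeats, then drop blanks.
    out: List[int] = []
    prev = None
    for row in log_probs:
        k = max(range(len(row)), key=row.__getitem__)
        if k != blank and k != prev:
            out.append(k)
        prev = k
    return out


def write_text_matrix(f: TextIO, utt_id: str, matrix: Sequence[Sequence[float]]) -> None:
    # Kaldi text matrix: "utt  [", one row per line, closed by " ]".
    f.write(f"{utt_id}  [")
    for row in matrix:
        f.write("\n  " + " ".join(f"{v:.6f}" for v in row))
    f.write(" ]\n")


# Made-up logits: 5 frames x 3 outputs (column 0 assumed to be blank).
logits = [[2.0, 0.1, 0.3], [0.2, 3.0, 0.1], [0.1, 2.5, 0.2],
          [1.5, 0.0, 0.1], [0.0, 0.3, 2.0]]
log_post = to_log_posteriors(logits)
print(greedy_ctc(log_post))  # [1, 2]
buf = io.StringIO()
write_text_matrix(buf, "utt1", log_post)
```

Kaldi-style table readers detect text versus binary archives, so a text archive is convenient for eyeballing values. For large sets a binary archive is more compact.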
@fmetze sorry for the super-late answer. I cannot share my PyTorch ASR pipeline at the moment, but I wrote up some LSTM benchmarks between PyTorch, TensorFlow, Lasagne and Keras that might be helpful:
Hey @stefbraun, any plans to open-source the PyTorch decode-faster pipeline? CTC decoding code other than prefix beam search, integrated with PyTorch, isn't available anywhere people can contribute to right now. I'm working on it and will put it up when I figure it out.