wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

Timestamps. #604

Closed · Freddy-pp closed 3 years ago

Freddy-pp commented 3 years ago

Hi.

First, I want to thank you for the great work.

I am using the GigaSpeech pretrained model. I successfully created an LM and a TLG.fst decoding graph, and I ran ./tools/decode.sh with wfst_decode_opts and the correct fst_path. I get correct decoding/rescoring partial and final results in the log file and in the result file, but no timestamp information appears in either of them. I even tried running decoder_main with the unit_path option (passing words.txt from the pretrained model as input), but nothing changed.

Could you please tell me how I can get timestamps for the words?

And is there any way to get a lattice, as in Kaldi, instead of plain text when decoding with the TLG graph?

Thanks in advance.

pengzhendong commented 3 years ago
1. The timestamp output currently only works in the websocket runtime.
2. You could save the lattice or compact lattice after this statement (see the sketch below): https://github.com/wenet-e2e/wenet/blob/d3f690a889e8b5aea3eabb5dcf39bb48d754d2ab/runtime/core/decoder/ctc_wfst_beam_search.cc#L135
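For illustration, a minimal sketch of what saving the lattice there might look like, assuming a kaldi::Lattice named lat is available at that point and that the Kaldi lattice headers (lat/kaldi-lattice.h, fstext/lattice-utils.h) are part of the runtime build; the SaveLattice helper and its path argument are hypothetical:

```cpp
// Sketch only: save the WFST beam-search lattice to disk.
// Assumes a kaldi::Lattice `lat` obtained from the decoder at the
// linked statement, and that the Kaldi lattice headers are available
// in the wenet runtime build. Names below are illustrative.
#include <fstream>
#include <string>

#include "lat/kaldi-lattice.h"
#include "fstext/lattice-utils.h"

void SaveLattice(const kaldi::Lattice& lat, const std::string& path) {
  // Convert to the compact form used by most Kaldi lattice tools.
  kaldi::CompactLattice clat;
  fst::ConvertLattice(lat, &clat);

  // Write in Kaldi's binary lattice format. To consume it with Kaldi
  // command-line tools you would normally wrap it in a keyed archive
  // (e.g. via CompactLatticeWriter) rather than a bare file.
  std::ofstream os(path, std::ios::binary);
  kaldi::WriteCompactLattice(os, /*binary=*/true, clat);
}
```
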
Freddy-pp commented 3 years ago

Thank you. Do I understand correctly that I can output timestamps by adding something like this link to decoder_main.cc?

pengzhendong commented 3 years ago

Yes.
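
For reference, a minimal sketch of what that could look like in decoder_main.cc, assuming the WordPiece fields (word, start, end, in milliseconds) used by the websocket handler are also populated in the decode result; the PrintTimestamps helper and the include path are illustrative and may need adapting to your wenet version:

```cpp
// Sketch only: print word-level timestamps after decoding finishes.
// Assumes DecodeResult::word_pieces with fields word/start/end (ms),
// as used by the websocket handler; adapt names to your version.
#include <iostream>

#include "decoder/torch_asr_decoder.h"  // include path may differ

void PrintTimestamps(const wenet::TorchAsrDecoder& decoder) {
  const auto& results = decoder.result();
  if (results.empty()) return;
  // Print each recognized word with its start/end time in milliseconds.
  for (const auto& word_piece : results[0].word_pieces) {
    std::cout << word_piece.word << " "
              << word_piece.start << "ms - " << word_piece.end << "ms"
              << std::endl;
  }
}
```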

Freddy-pp commented 3 years ago

Thank you very much!