wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

tlg runtime decode question #1087

Closed Yymax-max closed 2 years ago

Yymax-max commented 2 years ago

    for (int i = 0; i < nbest; i++) {
      kaldi::LatticeWeight weight;
      std::vector<int> alignment;
      fst::GetLinearSymbolSequence(nbest_lats[i], &alignment, &outputs_[i], &weight);
      ConvertToInputs(alignment, &inputs_[i], &times_[i]);
      RemoveContinuousTags(&outputs_[i]);
      likelihood_[i] = -weight.Value2();
    }

In ctc_wfst_beam_search.cc, why is it likelihood_[i] = -weight.Value2() and not likelihood_[i] = -(weight.Value1() + weight.Value2())?

robin1001 commented 2 years ago

It should be likelihood_[i] = -(weight.Value1()+weight.Value2()) if we want the sum of acoustic score and lm score.

Yymax-max commented 2 years ago

> It should be likelihood_[i] = -(weight.Value1()+weight.Value2()) if we want the sum of acoustic score and lm score.

Thank you for your answer; I agree with it. But why does ctc_wfst_beam_search.cc use likelihood_[i] = -weight.Value2()? As far as I can tell, -weight.Value2() is only the acoustic score. If that is all that gets used, what is the point of the TLG graph?

robin1001 commented 2 years ago

https://github.com/wenet-e2e/wenet/pull/1116

Please see the PR; we now use both the acoustic cost and the graph cost.
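
To illustrate the two formulas discussed in this thread, here is a minimal sketch of how they can rank the same n-best list differently. It is not the code from the PR. `WeightSketch`, `AcousticOnlyScore`, `CombinedScore`, and `BestIndex` are hypothetical names introduced only for this example; `WeightSketch` stands in for `kaldi::LatticeWeight`, where `Value1()` holds the graph cost and `Value2()` holds the acoustic cost, both as costs (lower is better).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for kaldi::LatticeWeight so the sketch compiles
// without Kaldi. Value1() mirrors the graph cost, Value2() the acoustic cost.
struct WeightSketch {
  float graph_cost;     // plays the role of weight.Value1()
  float acoustic_cost;  // plays the role of weight.Value2()
  float Value1() const { return graph_cost; }
  float Value2() const { return acoustic_cost; }
};

// Acoustic-only score, as in the snippet quoted in the question.
inline float AcousticOnlyScore(const WeightSketch& w) {
  return -w.Value2();
}

// Combined score: negated sum of the graph cost and the acoustic cost.
inline float CombinedScore(const WeightSketch& w) {
  return -(w.Value1() + w.Value2());
}

// Returns the index of the highest-scoring hypothesis under the given
// scoring function. Assumes nbest is non-empty.
template <typename Score>
std::size_t BestIndex(const std::vector<WeightSketch>& nbest, Score score) {
  auto it = std::max_element(
      nbest.begin(), nbest.end(),
      [&](const WeightSketch& a, const WeightSketch& b) {
        return score(a) < score(b);
      });
  return static_cast<std::size_t>(it - nbest.begin());
}
```

With made-up costs such as {graph 8, acoustic 10} and {graph 2, acoustic 11}, the acoustic-only score prefers the first hypothesis, while the combined score prefers the second, because the graph cost penalizes the first one more heavily.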