wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0
4.14k stars, 1.07k forks

Random output with WFST decoding #977

Closed: Teenuu closed this issue 1 year ago

Teenuu commented 2 years ago

Describe the bug
Decoding with a WFST LM works fine with the pre-trained models, but with my own model, LM decoding produces random output. Sample output is given below. What might the possible causes be?

uttid      Transcription
010010007  yd y yeh y yew yea y yeo
010010041  yi yea yeh yt zoe
010010064  yd y yeh yi qr yea yea yi qgh yi
010010093  yt yea zoe zoggs yields yn y halts y
010010100  yi yi yeh ypu yi yeh yea zog qr y
010010103  yeh yea yeh y
010010144  y yea zoe zoggs yea yd yi
010010231  y yeh yea yeh yeh y y yea
010010281  qr yeo yea yt yi zp yea yl yea yeo
To Reproduce
Steps to reproduce the behavior:

  1. Used the LibriSpeech s0 recipe.
  2. Created my own language model, lexicon, and acoustic model.

robin1001 commented 2 years ago

Did you get the right result when decoding without the LM?

Teenuu commented 2 years ago

Yes, I got the correct result without the LM. I only see the issue when decoding my own trained model with the LM.

robin1001 commented 2 years ago

Are the random symbols in this output your modeling units?

Teenuu commented 2 years ago

No, they are not modeling units. These words are in my lexicon.
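The thread does not pin down the root cause itself (the maintainers point to #1673). Because robin1001's question turns on whether the lexicon and the modeling units agree, one hedged diagnostic is to check that every token used in the lexicon exists in the acoustic model's unit list, and that the unit table used to build the WFST assigns the same ids as the model's. The sketch below assumes WeNet-style plain-text files (`units.txt` with `<token> <id>` per line, `lexicon.txt` with `<word> <token1> <token2> ...` per line); the function names are hypothetical and not part of WeNet.

```python
# Hypothetical sanity check for a custom WFST (TLG) setup; not part of WeNet.
# Assumed file formats (WeNet-style, plain text):
#   units.txt   -> "<token> <id>" per line
#   lexicon.txt -> "<word> <token1> <token2> ..." per line
from typing import Dict, Iterable, List, Tuple


def parse_units(lines: Iterable[str]) -> Dict[str, int]:
    """Map each modeling unit (token) to its integer id."""
    units: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            units[parts[0]] = int(parts[1])
    return units


def find_oov_tokens(lexicon_lines: Iterable[str],
                    units: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (word, token) pairs whose token is not a modeling unit."""
    missing: List[Tuple[str, str]] = []
    for line in lexicon_lines:
        parts = line.split()
        if not parts:
            continue
        word, tokens = parts[0], parts[1:]
        for tok in tokens:
            if tok not in units:
                missing.append((word, tok))
    return missing


def diff_unit_tables(a: Dict[str, int], b: Dict[str, int]) -> List[str]:
    """Return tokens that are absent from one table or map to different ids."""
    return sorted(t for t in set(a) | set(b) if a.get(t) != b.get(t))


if __name__ == "__main__":
    # Toy data: the WFST was (hypothetically) built with a reordered unit table.
    model_units = parse_units(["<blank> 0", "<unk> 1", "HE 2", "LLO 3"])
    wfst_units = parse_units(["<blank> 0", "<unk> 1", "LLO 2", "HE 3"])
    lexicon = ["HELLO HE LLO", "WORLD WOR LD"]
    print(find_oov_tokens(lexicon, model_units))  # [('WORLD', 'WOR'), ('WORLD', 'LD')]
    print(diff_unit_tables(model_units, wfst_units))  # ['HE', 'LLO']
```

With real files you would pass open file handles, e.g. `parse_units(open("units.txt", encoding="utf-8"))`. A non-empty result from either check would indicate a mismatch between the lexicon or WFST and the model; that would be consistent with, but would not by itself prove, the kind of output reported in this issue.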

NathanJHLee commented 1 year ago

I think this is the same problem as in https://github.com/wenet-e2e/wenet/issues/1673

xingchensong commented 1 year ago

Please follow https://github.com/wenet-e2e/wenet/issues/1673