wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

[transducer] decoding strategy #1269

Closed Mddct closed 9 months ago

Mddct commented 2 years ago

Opening this issue to track different decoding strategies for the transducer model. I will post materials and code snippets here. Suggestions are welcome, and I hope we can collaborate.

Mddct commented 2 years ago
b-flo commented 2 years ago

Hi @Mddct ,

For alignment-length synchronous decoding (ALSD), time-synchronous decoding (TSD), and modified Adaptive Expansion Search (mAES) in ESPnet, please refer to this. The version you linked is missing some fixes and optimizations!

Btw, you can drop 1-step/N-step constrained beam search from your list as it was superseded by AES/mAES!

Mddct commented 2 years ago

Thank you very much for your advice. I'll run some experiments later.

Mddct commented 2 years ago

If we want to use one-step decoding at inference time, could we try the implementation of this RNN-T loss later? (paper, implementation)
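
For reference, here is a minimal sketch of what a one-step constraint means for greedy transducer decoding: each encoder frame may emit at most one non-blank token before the search moves on to the next frame. Everything in it is an assumption made for illustration: `BLANK`, `greedy_decode`, `toy_joint`, and the toy frame format are hypothetical names, not WeNet or ESPnet APIs, and the toy joint stands in for a real predictor plus joint network.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # assumed blank token id (hypothetical, not taken from any toolkit)

Frame = Tuple[int, ...]
JointFn = Callable[[Frame, List[int]], List[float]]


def greedy_decode(
    enc_frames: Sequence[Frame],
    joint: JointFn,
    max_symbols_per_frame: int = 1,
) -> List[int]:
    """Greedy transducer search with a cap on non-blank tokens per frame.

    max_symbols_per_frame=1 corresponds to the one-step constraint.
    """
    hyp: List[int] = []
    for frame in enc_frames:
        emitted = 0
        while emitted < max_symbols_per_frame:
            scores = joint(frame, hyp)
            # Pick the highest-scoring token for this (frame, history) pair.
            token = max(range(len(scores)), key=scores.__getitem__)
            if token == BLANK:
                break  # blank means: advance to the next frame
            hyp.append(token)
            emitted += 1
    return hyp


def toy_joint(frame: Frame, hyp: List[int]) -> List[float]:
    """Toy stand-in for a joint network (hypothetical).

    A frame lists the tokens it "contains"; the toy prefers the first one
    that is not in the hypothesis yet, otherwise it prefers blank.
    """
    vocab_size = 4
    scores = [0.0] * vocab_size
    pending = [t for t in frame if t not in hyp]
    scores[pending[0] if pending else BLANK] = 1.0
    return scores


frames = [(1, 2), (), (3,)]
print(greedy_decode(frames, toy_joint, max_symbols_per_frame=1))  # [1, 3]
print(greedy_decode(frames, toy_joint, max_symbols_per_frame=2))  # [1, 2, 3]
```

In this toy case the one-step run drops token 2, because the first frame wants to emit two tokens. That mismatch is the reason to ask whether the training loss should match the decoding constraint.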

github-actions[bot] commented 10 months ago

This issue has been automatically closed due to inactivity.