wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

More details about RNN-T results. #1649

Closed · lomizandtyd closed 1 year ago

lomizandtyd commented 1 year ago

Hi wenet team,

I'd like to know more details about the RNN-T results.

Just listing the last table here:

| rnnt | ctc | att | greedy | beam | rescoring | fusion |
|:----:|:---:|:---:|:------:|:----:|:---------:|:------:|
|      |     |     | 4.88   | 4.67 | 4.45      | 4.49   |
|      |     |     | 5.56   | 5.46 | /         | 5.40   |
|      |     |     | 5.03   | 4.94 | 4.87      | /      |
|      |     |     | 5.64   | 5.59 | /         | /      |
|      |     |     | 4.94   | 4.94 | 4.61      | /      |

My question is:

  1. What CTC beam search is used here? I noticed there's a CTC head inside the transducer module. Are you using only the transducer's encoder plus the CTC head for decoding?
  2. Compared with the CTC-only result on Conformer, CTC with attention rescoring is better here, right?
  3. Does "fusion" mean decoding with an LM? If so, is that LM a TLG graph or just a BPE-level LM?

Mddct commented 1 year ago

1. There are three losses in the wenet RNN-T training setup, so you can also use CTC + attention rescoring for decoding.

2.

3. "Fusion" here means the CTC probabilities are fused with the transducer output during decoding, not with an LM.
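
To make 1 and 3 concrete, here is a minimal sketch of the two ideas. The function names, loss weights, and fusion weight below are illustrative assumptions, not wenet's actual API; see the transducer module in the repo for the real implementation.

```python
import torch

# Training: the transducer model is optimized with three losses jointly, so the
# CTC head and the attention decoder are both available at decode time.
# (The weights here are placeholders, not the recipe's values.)
def joint_loss(loss_rnnt: torch.Tensor,
               loss_ctc: torch.Tensor,
               loss_att: torch.Tensor,
               transducer_weight: float = 0.75,
               ctc_weight: float = 0.1,
               attention_weight: float = 0.15) -> torch.Tensor:
    """Weighted sum of the three training losses."""
    return (transducer_weight * loss_rnnt
            + ctc_weight * loss_ctc
            + attention_weight * loss_att)


# Decoding: "fusion" = mixing the CTC posteriors with the transducer's output
# distribution while searching, not shallow fusion with an external LM.
def fused_log_probs(transducer_log_probs: torch.Tensor,
                    ctc_log_probs: torch.Tensor,
                    ctc_fusion_weight: float = 0.5) -> torch.Tensor:
    """Log-linear interpolation of the two per-step distributions."""
    return ((1.0 - ctc_fusion_weight) * transducer_log_probs
            + ctc_fusion_weight * ctc_log_probs)


if __name__ == "__main__":
    # Toy tensors just to show the shapes the helpers expect.
    vocab_size = 10
    total = joint_loss(torch.tensor(2.0), torch.tensor(3.0), torch.tensor(1.5))
    fused = fused_log_probs(torch.randn(vocab_size).log_softmax(-1),
                            torch.randn(vocab_size).log_softmax(-1))
    print(total.item(), fused.shape)
```

Since the CTC head and attention decoder are trained jointly with the transducer loss, CTC prefix beam search plus attention rescoring can also be run on this model without touching the transducer branch, which is what answer 1 refers to.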

ruomingp commented 1 year ago

Do you have Conformer-RNNT results on Librispeech? Do they match the numbers from the Conformer paper? Thanks.

robin1001 commented 1 year ago

Please see https://github.com/wenet-e2e/wenet/tree/main/examples/librispeech/rnnt; the results are still far behind the Conformer paper. Maybe @yuekaizhang could share more details.

yuekaizhang commented 1 year ago

> Please see https://github.com/wenet-e2e/wenet/tree/main/examples/librispeech/rnnt; the results are still far behind the Conformer paper. Maybe @yuekaizhang could share more details.

@ruomingp The model size at this link is about 34Mb. For the mid-size model in the Conformer paper, they got 2.3% / 5.0% on the LibriSpeech test sets. I have no idea why we can't reach the paper's numbers; maybe they trained for more epochs or used other techniques. I was wondering if you have any suggestions.

ruomingp commented 1 year ago

Thanks for the info! Have you tried Conformer-L?

yuekaizhang commented 1 year ago

> Thanks for the info! Have you tried Conformer-L?

Not yet. FYI, https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md#large is similar in size to Conformer-L.

lomizandtyd commented 1 year ago

Thank you guys for the information.