k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

Increase tail padding #472

Closed shaynemei closed 1 year ago

shaynemei commented 1 year ago

The original tail padding is not long enough, which causes worse performance compared to icefall decoding.

For example, in the tedlium_dev evaluation the ends of utterances are truncated:

...THE ANNUAL ICE IN WINTER AND IT CONTRACTS IN (SUMMER->SU)
...TO TWENTY FEET OF SEA LEVEL AS IS (GREENLAND->GREENLA)
...TO CARBON BASED FUELS LIKE DIRTY COAL (AND->ON) FOREIGN (OIL->O)
...TO WORK MAKE US MORE SECURE AND HELP STOP (GLOBAL WARMING->LOBAL WAR)
...IS BOTH THAT FREEDOM IS IN AND OF ITSELF (GOOD->*)

Increasing the padding from 0.3 to 1.0 seconds seems to bring the WER back in line with icefall:

[Screenshot: snap 2023-08-31 at 12:52:24 PM]
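For readers unfamiliar with the term, tail padding amounts to appending silence to the end of the waveform so that a streaming model has enough trailing frames to emit its final tokens. The following is only a minimal sketch, assuming 16 kHz audio held as a list of floats; `add_tail_padding` and its parameters are hypothetical names for illustration and are not sherpa's actual API:

```python
def add_tail_padding(samples, sample_rate=16000, padding_seconds=1.0):
    """Return a copy of `samples` with `padding_seconds` of silence appended.

    Hypothetical helper for illustration only; not taken from sherpa.
    """
    if padding_seconds < 0:
        raise ValueError("padding_seconds must be non-negative")
    # Number of zero-valued samples that make up the requested duration.
    num_padding = int(round(sample_rate * padding_seconds))
    return list(samples) + [0.0] * num_padding


# One second of dummy audio at the assumed 16 kHz sample rate.
audio = [0.1] * 16000
padded_short = add_tail_padding(audio, padding_seconds=0.3)
padded_long = add_tail_padding(audio, padding_seconds=1.0)
```

With these assumptions, 0.3 seconds adds 4800 zero samples and 1.0 second adds 16000, so the longer setting gives the decoder more than three times as many trailing frames to work with.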

csukuangfj commented 1 year ago

Could you make it a command line argument?

shaynemei commented 1 year ago

> Could you make it a command line argument?

Done. The default is now set to 1.0.
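As a rough illustration of what exposing the padding as a command line argument could look like, here is a sketch using Python's standard `argparse`. The flag name `--tail-padding-seconds` and the `build_parser` helper are assumptions made for this example; the actual name used in the PR is not shown in this thread:

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical tail-padding option."""
    parser = argparse.ArgumentParser(
        description="Sketch of a decoding client with configurable tail padding."
    )
    parser.add_argument(
        "--tail-padding-seconds",  # hypothetical flag name
        type=float,
        default=1.0,  # default mentioned in the comment above
        help="Seconds of silence to append to the end of each utterance.",
    )
    return parser


# Parse with no arguments to get the default, then with an explicit override.
default_args = build_parser().parse_args([])
override_args = build_parser().parse_args(["--tail-padding-seconds", "0.5"])
```

Keeping the value configurable makes it easy to try intermediate settings such as 0.5 without rebuilding or editing the code.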

csukuangfj commented 1 year ago

Thanks!

danpovey commented 1 year ago

I am surprised 0.3 secs is not enough. Is there anything about Sherpa that explains this?

On Fri, Sep 1, 2023, 8:48 AM Fangjun Kuang wrote:

Merged #472 https://github.com/k2-fsa/sherpa/pull/472 into master.


csukuangfj commented 1 year ago

@shaynemei

Which streaming model are you using and have you tried other values between [0.3, 1.0]?

shaynemei commented 1 year ago

This is the model I used to run the evaluation: https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17

I've also tried 0.5 on some of the problematic utterances (I didn't run a full evaluation for WER). It produced a few more BPE units at the end, but one or two BPE units were still missing.