Open uni-manjunath-ke opened 1 year ago
@uni-manjunath-ke
Could you try https://github.com/k2-fsa/sherpa/pull/464 ?
It fixes an issue in sherpa for decoding.
Sure, will try and update you.
I verified it. The fix has definitely improved the WERs (by around 0.5%). However, they are still around 3 to 4% worse than what we get using icefall's streaming_decode.py on a given dataset. I also still see empty hypotheses for some utterances, though fewer than before. Please suggest if there is anything else we can try. Thank you.
Could it be a difference in volume normalization? E.g. maybe we do volume normalization only in non-streaming mode?
Yes, maybe. Is it possible to fix that? Thanks
In addition, even with the Icefall streaming variant, we have seen a discrepancy between the WERs of decode.py and streaming_decode.py.
Initially, our WERs were even worse. But after debugging and comparing the feature extraction of Sherpa and Icefall, we found a discrepancy in the feature extraction settings. After setting the "fbank_opts" parameters in sherpa/cpp_api/feature-config.cc to match the Icefall ones, our WERs improved significantly. We can create a pull request with these changes if required. Please let us know. Thanks @uni-sagar-raikar
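For anyone repeating this comparison, a small helper that diffs two sets of feature options can make mismatches easy to spot. This is only a sketch: the option names below follow common Kaldi-style fbank options, but the values are made up for illustration and are not the actual Icefall or Sherpa settings, and `diff_options` is a hypothetical helper, not part of either project.

```python
def diff_options(reference, candidate):
    """Return {name: (reference_value, candidate_value)} for options that differ.

    Options present on only one side are reported with "<unset>" on the other.
    """
    unset = "<unset>"
    diffs = {}
    for name in sorted(set(reference) | set(candidate)):
        ref_value = reference.get(name, unset)
        cand_value = candidate.get(name, unset)
        if ref_value != cand_value:
            diffs[name] = (ref_value, cand_value)
    return diffs


# Illustrative values only (assumptions, NOT the real settings of either toolkit).
training_side = {"sample_frequency": 8000, "num_mel_bins": 80, "dither": 0.0}
serving_side = {"sample_frequency": 8000, "num_mel_bins": 80, "dither": 1.0}

for name, (train_value, serve_value) in diff_options(training_side, serving_side).items():
    print(f"{name}: training={train_value} serving={serve_value}")
```

Filling both dictionaries from the real configurations (by hand or by parsing logs) and running the diff gives a quick checklist of settings to align.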
Please do!
Sure, I have created a pull request at https://github.com/k2-fsa/sherpa/pull/465 Thanks
Please let us know if there is any fix for volume normalization in sherpa. If we find anything, we will share it with you. Thanks
Is your test data sampled at 16000 Hz, i.e., the same sampling rate as the training data?
Actually, we use 8000 Hz for both training and testing. We modified the sherpa code to handle 8000 Hz and use that.
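A quick way to rule out a sampling-rate mismatch is to read the rate from each WAV header and compare it with the rate the model was trained on. The sketch below uses only Python's standard `wave` module; `check_sample_rate` and `make_silent_wav` are hypothetical helper names, and the 8000 Hz value is taken from the comment above.

```python
import io
import wave


def wav_sample_rate(source):
    """Return the sample rate in Hz from a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def check_sample_rate(source, expected_hz):
    """Return (matches, actual_hz) for the given WAV source."""
    actual_hz = wav_sample_rate(source)
    return actual_hz == expected_hz, actual_hz


def make_silent_wav(rate_hz, num_frames=80):
    """Build a tiny mono 16-bit silent WAV in memory (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * num_frames)
    buffer.seek(0)
    return buffer


matches, actual_hz = check_sample_rate(make_silent_wav(8000), expected_hz=8000)
print(matches, actual_hz)
```

In practice `source` would be the path of each test file, and any file reporting a rate other than the training rate would need resampling before decoding.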
Hi @csukuangfj, we are getting empty hypotheses for some wav files when decoding with the latest sherpa and zipformer-2. However, when we decode the same audio files with Icefall, we get correct hypotheses.
Further, we amplified the wav files that returned empty transcriptions (we had noticed that some of them had very low volume) and decoded them again with Sherpa. This time we got non-empty hypotheses instead of empty ones.
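To check whether the empty results line up with low-volume files, one option is to measure each file's RMS level in dB relative to full scale (dBFS) and compare it against the files that decode correctly. This is a minimal sketch that assumes the samples have already been loaded as floats in [-1.0, 1.0]; `rms_dbfs` is a hypothetical helper, not something provided by Sherpa or Icefall.

```python
import math


def rms_dbfs(samples):
    """RMS level in dBFS for float samples in [-1.0, 1.0].

    Returns -inf for empty or all-zero input.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10*log10 of the mean square equals 20*log10 of the RMS amplitude.
    return 10.0 * math.log10(mean_square)


quiet = [0.001, -0.001] * 100  # illustrative low-volume signal
louder = [0.1, -0.1] * 100     # illustrative signal at a higher level
print(rms_dbfs(quiet), rms_dbfs(louder))
```

Sorting the test set by this value and looking at where the empty hypotheses fall would show whether volume is really the common factor.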
Could you please let us know whether there is any difference in the pre-processing or feature extraction done by Icefall and Sherpa? Is it possible to make both use the same pre-processing/feature extraction? Or is there a workaround to preprocess these wav files before passing them to sherpa? Thanks
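As a stopgap on the client side, the manual amplification described above could be automated with simple peak normalization before the samples are passed to the recognizer. This is a sketch of a generic workaround, not what Icefall or Sherpa does internally; `peak_normalize` and the 0.9 target are assumptions chosen for illustration, and the samples are assumed to be floats in [-1.0, 1.0].

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the largest absolute value equals `target_peak`.

    Silent (all-zero) or empty input is returned unchanged as a new list.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


low_volume = [0.01, -0.02, 0.005]  # illustrative quiet samples
normalized = peak_normalize(low_volume)
print(max(abs(s) for s in normalized))
```

A single gain per file keeps the waveform shape intact, but it does not explain why the two toolkits behave differently on the same audio, so it is best treated as a temporary measure while the feature extraction settings are being compared.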