jasonppy / PromptingWhisper

Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation

Can you provide me with the details of the decoding process? #2

Closed: yuhangear closed this issue 10 months ago

yuhangear commented 1 year ago

Hi,

I tried decoding SEAME with the Whisper-Large model using Hugging Face's code. Compared with the Whisper-Default results in Table 4 of the paper, my numbers differ, possibly because of differences in language identification. Your results are devman: MER 51.55 and devsg: MER 61.36, while mine are devman: MER 38.2 and devsg: MER 65.0. Were your results also obtained using greedy search for decoding?

jasonppy commented 12 months ago

Sorry for the late reply. The results were obtained using beam search with a beam size of 5.
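For readers comparing the two setups, here is a rough sketch of how greedy and beam-size-5 decoding might be selected with the Hugging Face `transformers` Whisper classes. It is not the code used for the paper. The checkpoint name `openai/whisper-large`, the helper `transcribe`, and the two settings dictionaries are assumptions made for illustration. Language identification and prompting, which may also affect MER, are not handled here.

```python
# Decoding settings passed to `generate`. `num_beams=1` without sampling is
# greedy search; `num_beams=5` is beam search with a beam size of 5.
GREEDY_KWARGS = {"num_beams": 1, "do_sample": False}
BEAM5_KWARGS = {"num_beams": 5, "do_sample": False}


def transcribe(audio, sampling_rate=16000, gen_kwargs=None,
               model_name="openai/whisper-large"):
    """Hypothetical helper: transcribe one waveform (1-D float array).

    `model_name` is an assumed checkpoint, not necessarily the one
    evaluated in the paper.
    """
    # Imported lazily so the settings above can be inspected without
    # having `transformers` installed.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    if gen_kwargs is None:
        gen_kwargs = BEAM5_KWARGS

    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)

    # Convert the raw waveform to log-Mel input features.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")

    # Run decoding with the chosen search strategy.
    predicted_ids = model.generate(inputs.input_features, **gen_kwargs)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Calling `transcribe(waveform, gen_kwargs=GREEDY_KWARGS)` and `transcribe(waveform, gen_kwargs=BEAM5_KWARGS)` on the same utterances would show how much of the gap comes from the search strategy alone.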