Atticus1806 opened this issue 11 months ago
@sanchit-gandhi Yes, I think the code base needs to be updated. Even with the latest transformers I still hit the problem above. Ideally it should work at higher batch sizes too, since the chunking is done internally.
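For reference, this is roughly the batched call I would expect to work out of the box (a minimal sketch; the checkpoint, batch size, and audio path are placeholders, not the leaderboard's actual settings):

```python
from transformers import pipeline

# Sketch of the batched long-form transcription call that should "just work":
# chunk_length_s enables the pipeline's internal chunking, so inputs longer
# than 30 s are split automatically before being batched through the model.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",  # placeholder checkpoint
    chunk_length_s=30,
    batch_size=16,  # placeholder batch size
)

result = pipe("audio.wav")  # placeholder audio path
print(result["text"])
```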
When running evaluations, I also got this error:

```
ValueError: Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing language='...' or make sure all input audio is of the same language.
```

I had to specify the language manually, which isn't necessary with the original Whisper models. I also don't understand why the language detection was inaccurate enough to raise this error in the first place.
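In case it helps anyone, this is the workaround I used to force the language (a minimal sketch; the checkpoint and audio path are placeholders):

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",  # placeholder checkpoint
)

# Passing language via generate_kwargs skips the per-batch language
# detection step that raises the ValueError above.
result = pipe("audio.wav", generate_kwargs={"language": "english"})
print(result["text"])
```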
I am also getting frequent disconnects during evaluations:

```
".../open_asr_eval/lib/python3.11/site-packages/datasets/download/streaming_download_manager.py", line 351, in read_with_retries
```
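Since the failure is inside `read_with_retries`, raising the retry settings that function reads from `datasets.config` made the disconnects less fatal for me (a sketch; the values are arbitrary, and these module-level settings are what recent `datasets` versions use, so check your installed version):

```python
import datasets

# read_with_retries consults these module-level settings, so raising them
# makes streaming reads more tolerant of transient connection drops.
datasets.config.STREAMING_READ_MAX_RETRIES = 20  # retries per failed read
datasets.config.STREAMING_READ_RETRY_INTERVAL = 5  # seconds between retries
```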
I am also getting the same error on 4 of the datasets. This is when running the open_asr_leaderboard/transformers/run_whisper.sh script.
I am trying to reproduce the Whisper results for TEDLIUM with the provided run_whisper.sh script. What I noticed is that when I run e.g. Whisper tiny on TEDLIUM, the model crashes at some point with:
```
RuntimeError: The expanded size of the tensor (3000) must match the existing size (3254) at non-singleton dimension 1. Target sizes: [80, 3000]. Tensor sizes: [80, 3254]
```
I assume this is because the sequence is longer than 30 seconds, but for some reason this can't be handled with batching. While this is probably a problem on the Whisper side, I am wondering how the evaluation was done here. Was it reduced to batch_size=1? With that setting the evaluation works, but the WER I get is off by 0.03. How can this happen, given that the other evaluations I have run so far for the other datasets matched?
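For what it's worth, the sizes in the error are consistent with a roughly 32.5 s segment slipping past the 30 s truncation (a quick check, assuming the standard Whisper feature extractor settings of 16 kHz audio and a 160-sample hop; the segment duration is inferred from the error, not measured):

```python
# Whisper's log-mel features use a 160-sample hop at 16 kHz, i.e. 100 frames/s.
sampling_rate = 16_000
hop_length = 160
frames_per_second = sampling_rate / hop_length  # 100.0

# The encoder expects exactly 30 s of features:
expected_frames = round(30 * frames_per_second)  # 3000 -> the "target size"

# A ~32.54 s segment would instead produce:
actual_frames = round(32.54 * frames_per_second)  # 3254 -> the "tensor size"

print(expected_frames, actual_frames)  # 3000 3254
```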