shashikg / WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
MIT License

Reproducing benchmarks #11

Open sanchit-gandhi opened 6 months ago

sanchit-gandhi commented 6 months ago

Hey @shashikg! Thanks for your awesome work on this repo - it's a very cool compilation of the various Whisper implementations 🙌

I'm working on the Hugging Face implementation, and keen to understand better how we can reproduce the numbers from your benchmark. In particular, I'm looking at reproducing the numbers from this table.

The benchmark scripts currently use a local version of the Kincaid dataset: https://github.com/shashikg/WhisperS2T/blob/8e0b338078a37625ec6a5912c3702f30009e0ece/scripts/benchmark_huggingface.py#L79

Would it be possible to share this dataset, in order to re-run the numbers locally? You could push it as a Hugging Face Audio dataset to the Hugging Face Hub, which should be quite straightforward by following this guide: https://huggingface.co/docs/datasets/audio_dataset

Once we can reproduce the runs, we'd love to work with you on tuning the Transformers benchmark to squeeze out extra performance that might be available.

Many thanks!

shashikg commented 6 months ago

Hey @sanchit-gandhi !

You can prepare the benchmark env using this script: https://github.com/shashikg/WhisperS2T/blob/main/prepare_benchmark_env.sh. This will download the required datasets.

Please also check these numbers for distil-whisper: https://github.com/shashikg/WhisperS2T/releases/tag/v1.1.0

> Once we can reproduce the runs, we'd love to work with you on tuning the Transformers benchmark to squeeze out extra performance that might be available

Sure, would love to!

BBC-Esq commented 6 months ago

Hey @sanchit-gandhi, despite our prior correspondence regarding "insanely" fast whisper, I've tested this awesome library and it's accurate. It's actually faster than anything I've ever tested, including the "insanely" insane faster whisper. I say this with all humility considering our prior correspondence, but this is apparently what batch processing + CTranslate2 can do. It has its kinks (timestamps, for example), but if you find different results, please feel free to share. Finally, an "apples to apples" comparison, as I was lamenting in our previous correspondence.

As a matter of integrity, please let us know if/when you CONFIRM the results from this repository as well, as I'm assuming that you have an interest in the truth as opposed to a "whose is better" kind of mentality. Thanks.

BBC-Esq commented 6 months ago

@sanchit-gandhi two weeks and no confirmation about the results? ...Hmm... Is your interest really in "verifying" the accuracy of the results, or in promoting Hugging Face's ~thousands of stars for the "insanely" insane, absolutely insane, pure-insanity... that is... totally insane Whisper? C'mon man. Admit the metrics and let's move forward.

I also like how you tried to recruit this fellow to come work with Hugging Face. Sheesh. Will the egos never stop...