xenova / whisper-web

ML-powered speech recognition directly in your browser
https://hf.co/spaces/Xenova/whisper-web
MIT License
1.29k stars, 152 forks

Large model #18

Open mootje2 opened 10 months ago

mootje2 commented 10 months ago

Thanks for the nice effort with this app. I was wondering if I could use it with the large model, because for multilingual transcription the large model gives much better results than the one you are using. I have the large model on my Ubuntu server, and when I test it with Gradio it gives a much better transcription. How can I adjust the script to use the large model from my local server? Also, your demo on Hugging Face has a microphone option, which I am missing here. Thanks

xenova commented 10 months ago

The purpose of this project is to run Whisper directly in your browser instead of on a local server, so I won't be modifying it to support an external API. However, feel free to clone the repo yourself and separate the frontend from the backend if you wish to reuse the user interface.

yavuzKomecoglu commented 4 months ago

Hi @xenova, we added it to the models list as 'Xenova/whisper-large': [1550]. The model downloads, but I get the error "RangeError: offset is out of bounds" during the transcription phase. I get the same error on devices with different amounts of RAM. How can I run the large model?

midpoint commented 1 week ago

whisper-web\src\components\AudioManager.tsx

    const models = {
        // Original checkpoints
        'Xenova/whisper-tiny': [41, 152],
        'Xenova/whisper-base': [77, 291],
        'Xenova/whisper-small': [249],
        'Xenova/whisper-medium': [776],
        'Xenova/whisper-large-v2': [23776],
        'Xenova/whisper-large-v3': [17776],

        // Distil Whisper (English-only)
        'distil-whisper/distil-medium.en': [402],
        'distil-whisper/distil-large-v2': [767],
    };
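
For anyone adding entries to this table, here is a minimal sketch of how such a map might be consumed. It assumes each array holds approximate download sizes in MB, with the first value for the quantized weights and an optional second value for the unquantized weights; the thread does not confirm this. `selectableModels` and `modelLabel` are hypothetical helpers written for illustration, not functions from the repository, and only a subset of the entries is repeated.

```typescript
// Assumption (not confirmed in this thread): each entry is
// [quantizedSizeMB] or [quantizedSizeMB, unquantizedSizeMB].
type ModelSizes = readonly number[];

const models: Record<string, ModelSizes> = {
    'Xenova/whisper-tiny': [41, 152],
    'Xenova/whisper-base': [77, 291],
    'Xenova/whisper-small': [249],
    'Xenova/whisper-medium': [776],
    'distil-whisper/distil-medium.en': [402],
    'distil-whisper/distil-large-v2': [767],
};

// Hypothetical helper: when quantization is off, only models that list
// a second (unquantized) size are offered.
function selectableModels(quantized: boolean): string[] {
    return Object.keys(models).filter(
        (id) => quantized || models[id].length === 2,
    );
}

// Hypothetical helper: build a dropdown label such as
// "Xenova/whisper-tiny (41MB)". Falls back to the bare id when no size
// is listed for the requested variant.
function modelLabel(id: string, quantized: boolean): string {
    const sizes = models[id];
    if (sizes === undefined) {
        throw new Error(`Unknown model: ${id}`);
    }
    const size = sizes[quantized ? 0 : 1];
    return size === undefined ? id : `${id} (${size}MB)`;
}
```

Under that assumption, an entry with a single number would only show up when the quantized option is enabled, so a new large-model entry with one size would be offered as a quantized download only.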