xenova / whisper-web

ML-powered speech recognition directly in your browser
https://hf.co/spaces/Xenova/whisper-web
MIT License
1.29k stars, 152 forks

Word-level-timestamps #17

Open christopher-kapic opened 10 months ago

christopher-kapic commented 10 months ago

Hi Xenova, great work with this repo. Do you know if it's possible to get word-level-timestamps with this? I know it's possible if I'm running whisper in the terminal, but I'm not sure if that functionality extends to this browser/huggingface version, and I don't know how to find out. If you're not sure, feel free to let me know and close the issue.

xenova commented 10 months ago

It's definitely possible; I just haven't gotten around to updating the user interface 😅

Here's example code for it (see the Transformers.js docs):

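import { pipeline } from '@xenova/transformers';
// Sample audio clip hosted on the Hugging Face Hub
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
// The 'output_attentions' revision is needed for word-level timestamps
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', {
    revision: 'output_attentions',
});
// return_timestamps: 'word' yields one chunk per word
const output = await transcriber(url, { return_timestamps: 'word' });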
// {
//   "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
//   "chunks": [
//     { "text": " And", "timestamp": [0, 0.78] },
//     { "text": " so", "timestamp": [0.78, 1.06] },
//     { "text": " my", "timestamp": [1.06, 1.46] },
//     ...
//     { "text": " for", "timestamp": [9.72, 9.92] },
//     { "text": " your", "timestamp": [9.92, 10.22] },
//     { "text": " country.", "timestamp": [10.22, 13.5] }
//   ]
// }

But if anyone else wants to update the UI to add this option, then go for it! :)
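
For anyone picking up the UI work, here is a minimal sketch of one way to turn the word-level `chunks` array into short caption lines. The `groupWords` helper and its `maxWords` parameter are hypothetical names made up for illustration; they are not part of whisper-web or Transformers.js. The sketch only assumes the chunk shape shown in the output above (`text` plus a `[start, end]` `timestamp` pair).

```javascript
// Hypothetical helper (not part of whisper-web or Transformers.js):
// groups word-level chunks into caption lines of at most `maxWords` words.
function groupWords(chunks, maxWords = 5) {
  const lines = [];
  for (let i = 0; i < chunks.length; i += maxWords) {
    const group = chunks.slice(i, i + maxWords);
    lines.push({
      // Each word already carries its leading space, so join without a separator.
      text: group.map((c) => c.text).join('').trim(),
      start: group[0].timestamp[0],
      end: group[group.length - 1].timestamp[1],
    });
  }
  return lines;
}

// Sample data copied from the first three chunks of the output above.
const sample = [
  { text: ' And', timestamp: [0, 0.78] },
  { text: ' so', timestamp: [0.78, 1.06] },
  { text: ' my', timestamp: [1.06, 1.46] },
];

console.log(groupWords(sample, 2));
// [
//   { text: 'And so', start: 0, end: 1.06 },
//   { text: 'my', start: 1.06, end: 1.46 }
// ]
```

In a real page you would pass `output.chunks` from the transcriber call instead of `sample`. Grouping by a fixed word count is just the simplest choice; splitting on punctuation or on pauses between timestamps would likely read better.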

christopher-kapic commented 10 months ago

You're awesome, thank you!