Closed: martjay closed this 1 month ago
Thanks for your interest in improving Vibe! Distilled models should already be supported and listed on the models page. I'm not sure why the reports suggest they're 6x faster; in my tests the difference isn't even 2x. The speed of the medium model compared to ggml-medium appears to be about the same.
I don't know; I extracted subtitles from a 1-hour video, and the time went from 12 minutes with the Whisper V2 ggml model down to 2 minutes. I use this GUI version: https://github.com/CheshireCC/faster-whisper-GUI
While it's running, VRAM usage is only a little over 3 GB.
Interesting. What was the size of the model? Medium?
Can you try this with Vibe?
https://huggingface.co/distil-whisper/distil-large-v3-ggml/resolve/main/ggml-distil-large-v3.bin
It doesn't support picking up models in subdirectories.
I was wrong. I've actually been using models--Systran--faster-distil-whisper-large-v3.
You can copy the model file to Vibe's models directory.
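As a quick sketch of that step (the models folder path below is a stand-in, not Vibe's actual location; check the app's settings for the real directory):

```shell
# VIBE_MODELS_DIR is hypothetical; Vibe shows its real models folder in the app settings.
VIBE_MODELS_DIR="$HOME/vibe-models"
mkdir -p "$VIBE_MODELS_DIR"
cp ~/Downloads/ggml-distil-large-v3.bin "$VIBE_MODELS_DIR/"
```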
I just downloaded and used ggml-distil-large-v3.bin. Recognizing a 16-minute video takes less than one minute.
ggml-large-v3 takes 8-9 minutes.
Thanks for sharing, I'm really surprised. And how does it compare to the default model that comes with Vibe, ggml-medium.bin?
If I can use the large model, why would I use the medium one?
There is another issue with Vibe: the subtitle file type should be selected before starting; otherwise single subtitle lines end up badly segmented.
No support for .safetensors? Just .bin?
You mean the length of each line, right? Maybe we can add 'preset' options in the More Options section. That's a good idea.
Only ggml/gguf/bin are supported. You can easily convert safetensors models by following https://github.com/thewh1teagle/vibe/blob/main/docs/MODELS.md#prepare-your-own-models
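As a rough sketch of one way to do the conversion, using whisper.cpp's conversion script (the exact invocation may differ; the linked MODELS.md is authoritative):

```shell
# Sketch only; see the linked MODELS.md for the exact, supported steps.
git clone https://github.com/ggerganov/whisper.cpp
git clone https://github.com/openai/whisper              # provides the mel filter assets the script reads
git clone https://huggingface.co/distil-whisper/distil-large-v3
python whisper.cpp/models/convert-h5-to-ggml.py ./distil-large-v3 ./whisper .
# The script writes ggml-model.bin to the output directory;
# rename it and copy it into Vibe's models folder.
```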
What I mean is, you should choose a file type, such as SRT/VTT/TXT, before recognizing speech. Do you understand?
Yes, I understand. I know what issue you're talking about: when creating an SRT, the lines come out too long. We may provide preset options under 'More options'. Can you open a new issue about it? Thanks :)
Hello! Actually, I have many thoughts; is it really necessary to start a new thread for them? Look at this: I set the length of each subtitle, but a very long paragraph still appears. That's what I mean.
In fact, distil-whisper is very well suited to real-time speech recognition because it's really fast. If you could add that feature, plus translated subtitles, then there would be no language you couldn't understand, and everything would be in real time. If you're interested in this feature, though, each subtitle segment still needs to be kept relatively short, split on punctuation marks or by character count. It's just that right now there still seem to be some problems.
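The punctuation/character-count splitting described above can be sketched roughly like this (a toy illustration, not Vibe's actual code; the character limit is just a common subtitle convention):

```python
import re

def segment_text(text, max_chars=42):
    """Split a transcript into subtitle-sized lines.

    Breaks preferentially after sentence/clause punctuation, and
    falls back to the last word boundary before max_chars.
    """
    # First split after punctuation, discarding the trailing whitespace.
    parts = re.split(r'(?<=[.!?,;])\s+', text.strip())
    lines = []
    for part in parts:
        while len(part) > max_chars:
            # Break at the last space before the limit.
            cut = part.rfind(' ', 0, max_chars)
            if cut == -1:
                cut = max_chars  # no space found: hard break
            lines.append(part[:cut].strip())
            part = part[cut:].strip()
        if part:
            lines.append(part)
    return lines
```

A real implementation would work from the model's word timestamps rather than plain text, but the same two rules apply: prefer punctuation boundaries, cap line length.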
Describe the feature
This model is 6x faster than ggml.