sandrohanea / whisper.net

Whisper.net. Speech to text made simple using Whisper Models
MIT License

How to convert whisper model to GGML #81

Closed yakovw closed 1 year ago

yakovw commented 1 year ago

Is there a way to do this in C#?

sandrohanea commented 1 year ago

No, there is no way to do it in C# nor a plan to implement it in the future. Do you see any reason why someone would need to do it in C#?

However, Whisper.net has a downloader that uses Hugging Face to make it easy to get a GGML model (either quantized or not): https://github.com/sandrohanea/whisper.net/blob/main/Whisper.net/Ggml/WhisperGgmlDownloader.cs

Of course, one can just create a whisper GGML model using the instructions here: https://github.com/ggerganov/whisper.cpp/blob/master/models/README.md

But even those instructions use Python and are not implemented in C++.

yakovw commented 1 year ago

You're right. But I'm trying to convert existing models and can't. There are newer models beyond the ones available at the path above, and I wanted to convert this one (https://huggingface.co/Shiry/whisper-large-v2-he/tree/main), but the script failed because it doesn't know how to convert Transformers models.

sandrohanea commented 1 year ago

I tried that fine-tuned model following https://github.com/ggerganov/whisper.cpp/blob/master/models/README.md#fine-tuned-models and the GGML model was generated successfully.

Check these commands:

git clone https://github.com/openai/whisper
git clone https://github.com/ggerganov/whisper.cpp
git clone https://huggingface.co/Shiry/whisper-large-v2-he

python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-large-v2-he/ ./whisper .

Please ensure you have git-lfs installed and enabled in order to clone that Hugging Face repo. Also, make sure you have python3 with torchvision and transformers.
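One way to check the git-lfs point above: when a repo is cloned without git-lfs, the large weight files are left as small text pointer files instead of the real data. The sketch below scans a cloned model directory for such pointers. The helper names `is_lfs_pointer` and `find_lfs_pointers` are made up for this illustration and are not part of whisper.cpp or Whisper.net. The only format detail it relies on is that a Git LFS pointer file begins with the line "version https://git-lfs.github.com/spec/v1".

```python
from pathlib import Path

# First line of a Git LFS pointer file (the only format detail relied on here).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like an un-fetched Git LFS pointer file."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(model_dir: str) -> list[Path]:
    """List files under `model_dir` that are still LFS pointers (hypothetical helper)."""
    root = Path(model_dir)
    pointers = []
    for p in root.rglob("*"):
        # Skip directories and anything inside the repo's own .git folder.
        if not p.is_file() or ".git" in p.relative_to(root).parts:
            continue
        if is_lfs_pointer(p):
            pointers.append(p)
    return sorted(pointers)
```

Calling `find_lfs_pointers("./whisper-large-v2-he")` after the clone should return an empty list if git-lfs fetched everything. Any path it returns is a file whose real contents were not downloaded, in which case running `git lfs pull` inside the cloned repo is the usual fix.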

yakovw commented 1 year ago

Wait, if you managed to convert it, it would be best for me to use your result. Is there a way to get it?

sandrohanea commented 1 year ago

It's a pretty big file (~3 GB), so I cannot upload it here. Maybe I'll add multiple fine-tuned versions to https://huggingface.co/sandrohanea/whisper.net/tree/main but for now, it's probably faster for you to execute the commands above.

If I add them to Hugging Face (plus a readme on each one with attribution to its owner), I can modify the downloader in Whisper.net to help download these fine-tunes as well.

yakovw commented 1 year ago

Well I'll try, if I don't succeed I'll wait for you to upload it... Thanks for bothering to try at all. Let's hope for the best

yakovw commented 1 year ago

I've been trying for a long time, but all I manage to produce is the following line:

Usage: convert-h5-to-ggml.py dir_model path-to-whisper-repo dir-output [use-f32]

It refuses to convert the model for me.
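The usage line quoted here lists three required positional arguments (dir_model, path-to-whisper-repo, dir-output), so the script most likely prints it when one of them is missing. A minimal sketch of a pre-flight check follows. The function `build_convert_command` is a hypothetical helper, not part of whisper.cpp, and it assumes all three paths, including the output directory, must already exist as directories.

```python
import sys
from pathlib import Path


def build_convert_command(script: str, dir_model: str,
                          whisper_repo: str, dir_output: str) -> list[str]:
    """Validate the three directories and return the argv list for the converter.

    Hypothetical helper: the argument order mirrors the usage line
    `convert-h5-to-ggml.py dir_model path-to-whisper-repo dir-output`.
    """
    checks = (
        ("dir_model", dir_model),
        ("path-to-whisper-repo", whisper_repo),
        ("dir-output", dir_output),
    )
    for label, directory in checks:
        # Assumption: every argument, including the output, is an existing directory.
        if not Path(directory).is_dir():
            raise FileNotFoundError(
                f"{label} is not an existing directory: {directory}"
            )
    # Run the script with the same interpreter that is running this helper.
    return [sys.executable, script, dir_model, whisper_repo, dir_output]
```

With the three clones from earlier in the thread in the current directory, `subprocess.run(build_convert_command("./whisper.cpp/models/convert-h5-to-ggml.py", "./whisper-large-v2-he/", "./whisper", "."), check=True)` would correspond to the python3 command given above. If a directory is missing, the error names which argument is wrong, which is more specific than the bare usage line.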

sandrohanea commented 1 year ago

I thought about it and won't add fine-tunes to the downloader or to Hugging Face: there are too many, and we cannot be sure that everything works as designed. Of course, anyone who wants them can build them. I think you missed some of the directories in the arguments of convert-h5-to-ggml.py. However, I uploaded the file temporarily to my OneDrive: https://1drv.ms/u/s!AnblcWiFT8S5gas2R15cv_PIZxEk6w?e=qd8Dbg

Please, react to this message so that I can remove it afterwards. (Otherwise, it will be deleted on 2023-06-25)

yakovw commented 1 year ago

Thank you very much, that is very nice of you. I downloaded it, so you can delete it. I am now checking whether it actually works well.
