Closed cociweb closed 10 months ago
```shell
ct2-transformers-converter --model openai/whisper-large --output_dir whisper-large-v3-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization int8
```
You probably also need to provide the path of the model export directory, and the vocabulary.json has to be converted to txt manually, for example here: https://onlinejsontools.com/convert-json-to-text
Only the base large model works for me. The addon chokes on v2/v3: it seems to start up fine, but as soon as I put it to work it throws all kinds of errors. I'm slowly figuring out what the problem might be, since whisper has supposedly supported large-v3 for two months already...
By the way, I found a good small model on Hugging Face; so far it's the best one: https://huggingface.co/domebacsi/faster-whisper-small-hu
This fine-tuned model actually performs better for me than the base large model. I'd like to fine-tune a medium model with CV15, but I don't know yet whether my machine will be enough for it. I'll check out the guide you linked.
Thank you, @sarpba, the ct2 command is the correct answer to the 2nd question!
The linked Hugging Face repo was also trained on an almost one-year-old training set (CV13, 10 months old by now), if I see it right. CV16 is out nowadays. Anyway, for Hungarian the training set still grew roughly 3x from CV13 to CV15. (There was no such improvement from CV15 to CV16 in Hungarian, but maybe other languages had one...)
The above-linked Colab notebook won't run on a free subscription with the medium model. You have two options: A) buy a GPU + additional storage from Google, or B) execute it on a local machine with a proper Python (3.10-3.11) configuration plus VS Code with the Jupyter extension.
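For option B, a quick sanity check that your local interpreter falls in the range mentioned above (a sketch; `python_ok` is a hypothetical helper, and the 3.10-3.11 bounds are just the versions quoted in this thread):

```python
import sys

def python_ok(version_info=sys.version_info):
    """True if the interpreter is Python 3.10 or 3.11."""
    return version_info[0] == 3 and version_info[1] in (10, 11)
```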
If you choose the B-B-B ( :blush: ) option, you can also do the training without an (NVIDIA) GPU, on CPU only. But be prepared: it can take days (weeks?) with just an i5/i7 CPU at 100% usage.
In the meantime another ticket has been opened for the custom models (1st question): https://github.com/rhasspy/wyoming-faster-whisper/issues/10 Let's continue it over there and close this thread.
Hello, I've found no information about plans for model updates. Currently the initial multilingual faster-whisper models are practically useless in my language: the WER is around 30-35%, and even basic commands such as turn on/off (in my language) are not recognized. Unfortunately, the recognized text is far below the minimum viable quality, so it doesn't match the intents and won't execute any task. I've found a guide on how to fine-tune the original whisper model, but I've never tried it. As of now, Mozilla has released Common Voice V15, and I hope the next dataset version will be available soon (around the end of 2023)...
As far as I know, the original whisper model is more than one year old (V2-large turns 1 at the end of 2023). I've found a fine-tuned model in my language (it has better accuracy but is still useless); it was fine-tuned on Mozilla's CV V13 dataset and can be found on the web. The time between the two releases is almost half a year, but if we check the last half year, the improvement is incredible: in the last year the dataset has grown 6x bigger in my language.
Additionally, since NabuCasa owns some super new TTS voices (Hungarian piper-Berta is much more natural than piper-Imre, piper-Anna, or Mycroft-Diana, anyway), I'm wondering whether some additional basic intents could be generated too - or am I thinking wrong?
But I really hope other non-English languages improve as well...
Is there any plan to update these models officially? Or any possibility to make the models customizable by ourselves?
+1) For testing/playing purposes, unfortunately download.py always tries to download the official models/configs/vocabs at Docker startup. There is no official option to keep/start my custom model. Is there any plan to add support for optional custom models?
+2) Additionally, I did not find any guide on how to compress the models into *-int8.bin. Can anybody make a suggestion?
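On the +2 question: CTranslate2's converter can quantize during export via its `--quantization` flag, which is what produces the int8 model files. A minimal sketch that composes such a command line (`build_ct2_command` is a hypothetical helper written for illustration, not part of any library; the flag names match the `ct2-transformers-converter` CLI):

```python
def build_ct2_command(model, output_dir, quantization="int8", copy_files=None):
    """Assemble an argv list for ct2-transformers-converter with quantization."""
    cmd = ["ct2-transformers-converter",
           "--model", model,
           "--output_dir", output_dir,
           "--quantization", quantization]
    if copy_files:
        # extra tokenizer/preprocessor files to copy into the output dir
        cmd += ["--copy_files", *copy_files]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, or simply joined with spaces and run in a shell.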