msqr1 / Vosklet

A speech recognizer that can run in the browser, inspired by vosk-browser
MIT License
Default model within library? #12

Closed YuryKonvpalto closed 2 months ago

YuryKonvpalto commented 2 months ago

If I specify the Russian model in my script (see below), it still loads the English model.

let module = await loadVosklet()
let model = await module.createModel("./vosk-model-small-ru-0.22.zip", "model", "ID")
let recognizer = await module.createRecognizer(model, 16000)

Even if I leave out the model path entirely, it still loads the English model: let model = await module.createModel("./", "model", "ID")

It looks like the English model is bundled inside the Vosklet.min.js script and is always used by default. Any suggestions, please?

msqr1 commented 2 months ago

First, please use the latest version, 1.1.1. Second, createModel decides whether to replace a stored model based on its ID or storage path, not its URL. If you want fast startup after the first load, with both the Russian and English models kept on the client and a simple switch between folders, you can do await module.createModel(URL, language == "en" ? "en-model" : "ru-model", "ID"). Or, if you want to re-fetch every time the user selects a different language to save storage space, you would do await module.createModel(URL, "model", language == "en" ? "en" : "ru"). Lastly, it seems that you're using a .zip model. Vosklet only works with .tar.gz; a .zip will cause errors.
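
The two strategies above can be sketched as a small helper. This is only an illustration, not part of Vosklet: modelArgs, loadModel, and the urls map are hypothetical names, and the only Vosklet call used is createModel(url, storepath, id) as it appears in this thread.

```javascript
// Hypothetical helper (not part of Vosklet): builds the (url, storepath, id)
// arguments for createModel, following the two strategies described above.
function modelArgs(urls, language, keepBoth) {
  const url = urls[language];
  if (url === undefined) {
    throw new Error(`No model URL configured for "${language}"`);
  }
  return keepBoth
    ? [url, `${language}-model`, "ID"] // one folder per language, fixed ID
    : [url, "model", language];        // one shared folder, ID changes with language
}

// `module` is the object returned by loadVosklet(); `urls` maps a language
// code to the .tar.gz URL of its model (you supply these yourself).
async function loadModel(module, urls, language, keepBoth = true) {
  return module.createModel(...modelArgs(urls, language, keepBoth));
}
```

With keepBoth set to true, each language lives in its own folder, so switching back does not need a new download. With false, the changed ID is what tells createModel that the shared folder no longer holds the requested model.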

YuryKonvpalto commented 2 months ago

Thanks a lot for the prompt feedback. Could you please provide a link to the Russian .tar file on https://ccoreilly.github.io ? I'll try to follow your example at https://github.com/msqr1/Vosklet/blob/main/Examples/fromWav.html with the Russian .tar.

msqr1 commented 2 months ago

Look here to find the model you want: https://github.com/ccoreilly/vosk-browser/tree/master/examples%2Freact%2Fpublic%2Fmodels. Choose one, and append its filename to https://ccoreilly.github.io/vosk-browser/models/ to get the URL. Does this solve the problem for you? @YuryKonvpalto
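
As a rough sketch of that step: the base URL below is the one given above, while modelUrl is a hypothetical helper name. It also rejects filenames that do not end in .tar.gz, since that is the only format Vosklet accepts according to this thread.

```javascript
// Base URL taken from this thread; the helper name is hypothetical.
const MODEL_BASE = "https://ccoreilly.github.io/vosk-browser/models/";

// Appends a model filename to the base URL, refusing anything that is
// not a .tar.gz archive.
function modelUrl(filename) {
  if (!filename.endsWith(".tar.gz")) {
    throw new Error(`Expected a .tar.gz model archive, got "${filename}"`);
  }
  return new URL(filename, MODEL_BASE).href;
}
```

For example, modelUrl("vosk-model-small-ru-0.4.tar.gz") yields the full URL for the Russian model named in this thread, while a mangled name such as "vosk_model_small_ru_0_4_tar_gz" throws before any download is attempted.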

YuryKonvpalto commented 2 months ago

Thanks for the link. But it just doesn't work. :) Now it says "Untar: Incorrect tar format, must be USTAR". If I switch back to the English version (which worked just a minute ago), it keeps throwing the same error: "Untar: Incorrect tar format, must be USTAR". I open my HTML file with the script in VS Code using Live Server.

This is my tiny and simple index.html. Try running it with the English model and then with the Russian model. In the best case you get the English transcription either way; otherwise you get an error:

///// I have downloaded your latest version from GitHub and placed it in my folder

msqr1 commented 2 months ago

Again, changing the URL doesn't update the stored model. You have to change the ID or the storepath. All of this is documented in https://github.com/msqr1/Vosklet/blob/main/API.md. To help you manage this, use the OPFS Explorer extension to inspect the loaded models. Also, the Russian model URL looks wrong: shouldn't it be vosk-model-small-ru-0.4.tar.gz rather than vosk_model_small_ru_0_4_tar_gz? Let me know if it works for you.
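
If installing an extension is not an option, the same inspection can be done from the browser console with the standard File System API. This is a minimal sketch: listEntries is a hypothetical helper, and it assumes that Vosklet keeps each storepath as a folder under the OPFS root, which this thread suggests but does not spell out.

```javascript
// Recursively lists every file and folder under a directory handle,
// returning paths such as "/model/am/final.mdl".
async function listEntries(dir, prefix = "") {
  const paths = [];
  for await (const [name, handle] of dir.entries()) {
    const path = `${prefix}/${name}`;
    paths.push(path);
    if (handle.kind === "directory") {
      paths.push(...(await listEntries(handle, path)));
    }
  }
  return paths;
}

// In the browser console (same origin as the page that loaded the model):
//   const root = await navigator.storage.getDirectory();
//   console.log(await listEntries(root));
//
// Assumption: the storepath "model" is a folder under the OPFS root.
// Removing it should force the next createModel call to fetch again:
//   await root.removeEntry("model", { recursive: true });
```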

YuryKonvpalto commented 2 months ago

Again, change in URL doesn't update the stored model. You have to change ID or storepath. To assist you in managing this you should use the OPFS explorer extension. Also, the russian model URL looks kinda wrong?

Yes, the Russian model link is not correct, but I don't think that's the cause. I'll try OPFS Explorer, as you suggested. I'll keep you informed if you're interested. :) Thanks again for the tip.

YuryKonvpalto commented 2 months ago

Yes, Vosklet works! Thanks for pointing me to OPFS, a very useful thing. The models are a bit outdated, though. Vosk has updated all the models on their website, but they are all in zip format. On the other hand, your models are smaller. Could you please explain in a few words the difference between a partial result and a result? Does a partial result cover a chunk of audio? And does Vosklet transcribe MP3?

msqr1 commented 2 months ago

For the zip format, you have to download the model, repackage it as .tar.gz, and host it yourself, because I use the browser's DecompressionStream, which handles gzip but not zip archives.
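
To check a repackaged model before handing it to Vosklet, something like the following could be used. It is a sketch with hypothetical helper names, not Vosklet's own validation: it looks at the standard magic bytes of gzip and zip files, decompresses with DecompressionStream, and tests for the "ustar" marker that the "must be USTAR" error earlier in this thread refers to.

```javascript
// Identifies an archive by its leading magic bytes.
function sniffArchive(bytes) {
  if (bytes[0] === 0x1f && bytes[1] === 0x8b) return "gzip";
  if (bytes[0] === 0x50 && bytes[1] === 0x4b) return "zip"; // ASCII "PK"
  return "unknown";
}

// Decompresses gzip data with the built-in DecompressionStream.
async function gunzip(bytes) {
  const stream = new Blob([bytes])
    .stream()
    .pipeThrough(new DecompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// A USTAR tar archive carries the ASCII magic "ustar" at byte offset 257
// of its first 512-byte header.
function isUstar(tarBytes) {
  if (tarBytes.length < 262) return false;
  return String.fromCharCode(...tarBytes.subarray(257, 262)) === "ustar";
}
```

A typical check would fetch the archive, confirm that sniffArchive reports "gzip", and then call isUstar on the output of gunzip. Whether Vosklet applies exactly these tests internally is not stated in this thread.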

When silence is detected, a result is fired. That result won't change again. If silence is not detected (you're still talking), you will get a partial result that might change, especially the last word of the partial result.
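
The distinction above can be sketched as a small transcript buffer. Several things here are assumptions rather than confirmed Vosklet behavior: that the recognizer is an EventTarget firing "partialResult" and "result" events, and that event.detail holds a Vosk-style JSON string ({"partial": "..."} or {"text": "..."}). Transcript and attach are hypothetical names.

```javascript
// Hypothetical helper: keeps finalized text plus the current partial,
// which may still change until a final result arrives.
class Transcript {
  constructor() {
    this.finals = [];
    this.partial = "";
  }
  // A partial result replaces the previous partial; it is never appended.
  onPartial(json) {
    this.partial = JSON.parse(json).partial ?? "";
  }
  // A final result is fixed once silence is detected, so it is stored
  // and the partial is cleared.
  onResult(json) {
    const text = JSON.parse(json).text ?? "";
    if (text) this.finals.push(text);
    this.partial = "";
  }
  toString() {
    return [...this.finals, this.partial].filter(Boolean).join(" ");
  }
}

// Assumption: event names and the JSON string in event.detail.
function attach(recognizer, transcript) {
  recognizer.addEventListener("partialResult", (ev) => transcript.onPartial(ev.detail));
  recognizer.addEventListener("result", (ev) => transcript.onResult(ev.detail));
}
```

Replacing the partial on every event, instead of appending it, is what keeps a revised last word from showing up twice.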

Since this is just a Vosk wrapper that mimics Vosk's API, a lot of information about its behavior is in Vosk's repository; you might want to check it out.