msqr1 / Vosklet

A speech recognizer that can run in the browser, inspired by vosk-browser
MIT License

Evaluation gives worse metrics than server-based Vosk. Is there a way to run it synchronously for testing? #3

Closed korabelnikov closed 1 month ago

korabelnikov commented 4 months ago

I have a dataset of syllables on which I perform the evaluation. Each WAV file contains one syllable (0.2 sec). I read the files one by one, recognize each, and check the result against the ground truth.

Vosk on the host gives ~80% accuracy.

Vosklet gives ~65%. Possibly this is because of the async nature of the JS wrapper. I use a 2-second timeout to separate the recognition of different WAV files, but 11% of the recordings produce no recognized result, and 8% contain 2 syllables.

Everything else is the same between hosted Vosk and browser Vosklet. Maybe running in the main thread synchronously will help?
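One way to take the fixed 2-second timeout out of the picture is to wait for each file's own result before feeding the next file. The sketch below is a minimal illustration, not Vosklet's documented usage: it assumes the recognizer is an `EventTarget` that fires a `result` event (the thread mentions `result` and `partialResult` events and an `acceptWaveform` method), and it assumes the recognized text is carried in `event.detail`. The helper names `nextResult` and `evaluate`, and the `{ samples, label }` clip shape, are hypothetical.

```javascript
// Hypothetical helper: resolves with the payload of the next "result" event,
// or rejects if no result arrives within timeoutMs.
// Assumption: the recognized text is available as event.detail.
function nextResult(recognizer, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      recognizer.removeEventListener("result", onResult);
      reject(new Error("no result within " + timeoutMs + " ms"));
    }, timeoutMs);

    function onResult(event) {
      clearTimeout(timer);
      recognizer.removeEventListener("result", onResult);
      resolve(event.detail);
    }

    recognizer.addEventListener("result", onResult);
  });
}

// Evaluate clips strictly one at a time, so a result can only belong
// to the clip that was just fed in.
async function evaluate(recognizer, clips) {
  let correct = 0;
  for (const { samples, label } of clips) {
    const pending = nextResult(recognizer); // subscribe before feeding audio
    recognizer.acceptWaveform(samples);
    const text = await pending; // a timeout rejects and aborts the run
    if (text === label) correct += 1;
  }
  return correct / clips.length;
}
```

With this structure a missing result shows up as an explicit error for a specific file instead of silently shifting results onto the next file.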

msqr1 commented 4 months ago

Running on the browser main thread will block it, resulting in a browser freeze. Are you sure everything is the same, like double precision? I can still make you a synchronous wrapper if you want.

msqr1 commented 4 months ago

What browser are you running it on?

korabelnikov commented 4 months ago

@msqr1 Yes, the same model and the same WAV files were used. I didn't change double/float precision explicitly. I can't attribute the quality drop to anything other than the asynchronous execution, but it may be something else.

I rechecked: accuracy dropped from 82% to 66%, with some words missed and some false detections.

Do you have any ideas? I would like to check with a main-thread version, just to rule out concurrency as the cause.
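To compare the two setups beyond a single accuracy figure, it can help to split the errors into the categories described in this thread: no result, more than one token, and a wrong single token. The following is a rough sketch of such a tally; the function names and the `{ text, label }` pair shape are hypothetical, and it assumes each recording has exactly one expected token.

```javascript
// Classify one recognized string against its expected single-token label.
function classify(text, label) {
  const tokens = text.trim().split(/\s+/).filter(Boolean);
  if (tokens.length === 0) return "empty"; // nothing was recognized
  if (tokens.length > 1) return "extra";   // e.g. two syllables for one clip
  return tokens[0] === label ? "correct" : "wrong";
}

// Count how many results fall into each category.
function tally(pairs) {
  const counts = { correct: 0, wrong: 0, empty: 0, extra: 0 };
  for (const { text, label } of pairs) {
    counts[classify(text, label)] += 1;
  }
  return counts;
}
```

Running the same tally on the host Vosk output and the Vosklet output would show whether the gap comes mostly from empty and split results or from genuinely different single-token hypotheses.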

msqr1 commented 4 months ago

@korabelnikov I made a sync version, check the synchronous branch.

tdcook commented 4 months ago

I've noticed similar inaccuracies. I tried the synchronous version but I'm not seeing any result or partialResult events being fired. (I see the results are being returned by acceptWaveform)

I don't know what exactly @korabelnikov is seeing but the synchronous version didn't make much of a difference for the inaccuracies I'm seeing. For example, I have my grammar set to a list of the letters of the alphabet, and when saying a letter like "h" or "q" I'll get results like "a h" or "q u", as if it's triggering a new result halfway through the letter. Also, sometimes the recognition won't return a full result when it should. Previously I was using vosk-browser and I didn't notice this issue.
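For reference, a setup like the one described above can be sketched as follows. This is an assumption-laden illustration: it assumes the grammar is passed as a JSON array of allowed phrases, as Vosk's recognizers accept, and it does not show how the grammar is handed to Vosklet, since that is not stated in this thread. The helper `isSplitResult` is hypothetical.

```javascript
// Assumption: the grammar is a JSON array of allowed phrases, Vosk-style.
// "[unk]" lets the recognizer map out-of-grammar speech to an unknown token.
const letters = Array.from({ length: 26 }, (_, i) => String.fromCharCode(97 + i));
const grammar = JSON.stringify([...letters, "[unk]"]);

// Hypothetical check: true when an utterance that should be a single letter
// came back as several tokens, such as "a h" or "q u".
function isSplitResult(text) {
  return text.trim().split(/\s+/).filter(Boolean).length > 1;
}
```

Logging the results for which `isSplitResult` returns true, together with the audio that produced them, would give a small reproducible set to compare against vosk-browser.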

msqr1 commented 4 months ago

Hmm... I literally have no idea what is wrong? I need some help lol...

tdcook commented 3 months ago

I'd like to help out where I can, but the build process seems complex and I can't figure out how to build Kaldi and Vosk in ways that will make the test script run. Is there some documentation you could point me towards?

msqr1 commented 3 months ago

For the test script, it just builds the finished JS in whatever mode I use to test. It is the same as the make script, but the output filename is test.js, with manual decompression to test the tgz models locally (no HTTP header for browser decompression).

korabelnikov commented 3 months ago

Sorry guys, we moved to backend Vosk recognition for a while (we will check the sync version). @tdcook Have you found anything related?

msqr1 commented 1 month ago

Could you guys check if this is fixed? I had been messing up the return result of acceptWaveform the whole time... @korabelnikov, @tdcook