jitsi / gsoc-ideas

Google Summer of Code ideas

Speech-to-text GSOC project ! discussion #23

Closed Apoorvgarg-creator closed 5 months ago

Apoorvgarg-creator commented 2 years ago

@nikvaessen, I was studying the backend implementation, Jigasi, and came across a heading called "Vosk Configuration". I read up on it and found that Vosk is an open-source speech recognition system. So in this project, will we have to find a different open-source model than Vosk? If yes, in what respects is Vosk lacking? Knowing that would help in choosing the right open-source model, since my search turned up many others, such as DeepSpeech by Mozilla and OpenSeq2Seq by NVIDIA.

nikvaessen commented 2 years ago

One possible direction for this project is to 1) add documentation about setting up Vosk, and 2) potentially build on top of or improve the models Vosk provides. Currently, I think Vosk is built on top of the Kaldi framework. It would be nice if we could leverage more state-of-the-art solutions for our speech-to-text feature.

Apoorvgarg-creator commented 2 years ago

@nikvaessen I studied Vosk, tried it out to understand how it works, and looked at some of the examples in vosk-api; the Kaldi framework is used, as you stated. I also ran Kaldi, DeepSpeech, and Vosk on the same audio file to compare the results.

nikvaessen commented 2 years ago

Great! I hope you had fun - and have a way of including that in your proposal. Have you also had a look at Jigasi?

Apoorvgarg-creator commented 2 years ago

> Great! I hope you had fun - and have a way of including that in your proposal. Have you also had a look at Jigasi?

It was a fun activity. I read many articles to understand which parameters are considered when comparing speech-to-text models. Most articles pointed to WER (word error rate), so I considered that along with the computation time each model took to process the same audio file. Yes, I have taken a look at Jigasi as well: for example, it creates a service with the Google Speech-to-Text API by default. I have also taken a look at the gateway service handler.
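For context, WER is usually defined as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of words in the reference. Below is a minimal sketch of how such a comparison could be scored. It is not the exact script used for the comparison above: the names `word_error_rate` and `timed_transcribe` are hypothetical, and `transcribe` stands in for whatever callable wraps a given engine (Vosk, DeepSpeech, Kaldi).

```python
import time
from typing import Callable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, with N the number of reference words."""
    # Assumption: lowercasing and whitespace splitting is enough normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def timed_transcribe(
    transcribe: Callable[[str], str], audio_path: str
) -> Tuple[str, float]:
    """Run a (hypothetical) engine wrapper and return (text, seconds elapsed)."""
    start = time.perf_counter()
    text = transcribe(audio_path)
    return text, time.perf_counter() - start
```

With this, each engine can be scored on the same file by passing its transcript and the ground-truth text to `word_error_rate`, alongside the elapsed time from `timed_transcribe`. Note that WER can exceed 1.0 when the hypothesis contains many inserted words, and results depend on how punctuation and casing are normalized.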