Questions about language support for the speech-to-text project implementation

jitsi / gsoc-ideas

Google Summer of Code ideas

Questions about language support for the speech-to-text project implementation #24

Closed akashsivanandan99 closed 6 months ago

akashsivanandan99 commented 2 years ago

Hi @nikvaessen, what languages is Jitsi looking to add support for? Is the focus going to be on English or on multilingual support? On a side note: I was browsing through the Wav2Vec2 documentation on HuggingFace and saw that the model sometimes predicts words or sentences that are acoustically accurate but misspelled or grammatically incorrect. What is the expectation with regard to handling those cases?

Some examples of what I'm referring to taken from HuggingFace's blog post

tayyab-tariq commented 2 years ago

Hi @nikvaessen, my question is the same as @akashsivanandan99's: should we cater specifically to English or focus on a multilingual model?

nikvaessen commented 2 years ago

Handling the errors shown in the picture you shared can be done with a language model. I expect that using a language model should be out of scope for this project. We should initially focus on English, but it would be nice if our solution could be easily adapted to another language, if desired.
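To illustrate the idea of correcting such errors with a language model, here is a toy sketch in Python. It is not Jitsi's or Wav2Vec2's implementation: the corpus, the candidate transcripts, the acoustic scores, and the `lm_weight` parameter are all made up for the example, and an add-one smoothed bigram model stands in for a real language model. It only shows how an LM score can tip the choice between two acoustically similar transcripts.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for real language-model training text.
CORPUS = [
    "i hear you loud and clear",
    "can you hear me now",
    "we are here for the meeting",
    "the meeting is here today",
]


def train_bigrams(sentences):
    """Count bigrams, their left contexts, and the vocabulary."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        contexts.update(tokens[:-1])          # words that have a successor
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])
    return contexts, bigrams, vocab


def lm_log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = contexts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


def rescore(candidates, model, lm_weight=1.0):
    """Pick the candidate with the best acoustic + weighted LM score."""
    def combined(item):
        text, acoustic_log_prob = item
        return acoustic_log_prob + lm_weight * lm_log_prob(text, model)
    return max(candidates, key=combined)[0]


MODEL = train_bigrams(CORPUS)

# Made-up acoustic log-scores: the homophone error scores slightly higher.
CANDIDATES = [
    ("can you here me now", -4.0),
    ("can you hear me now", -4.2),
]

best_with_lm = rescore(CANDIDATES, MODEL, lm_weight=1.0)
best_without_lm = rescore(CANDIDATES, MODEL, lm_weight=0.0)
print(best_with_lm)
print(best_without_lm)
```

With `lm_weight=0.0` the acoustic score alone decides and the homophone error wins; with the LM term included, the grammatical candidate is preferred. In a real system the extra scoring pass adds latency, which matters for live transcription.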

akashsivanandan99 commented 2 years ago

Good point. I was thinking of maybe using a KenLM model with Wav2Vec2. Would that increase the accuracy to a meaningful level, though? Also, is there any specific format Jitsi requires for a GSoC proposal?

akashsivanandan99 commented 2 years ago

@nikvaessen Hi, I am writing up a proposal for this project and had a question. What would be the estimated project size of this idea? I'd like to clarify this, since a project size has to be selected in order to submit the proposal.

nikvaessen commented 2 years ago

We don't have a specific format; see https://google.github.io/gsocguides/ for generic tips. I think you should aim for 12 weeks as a project length. I'm not sure whether language modeling is feasible given the real-time constraints. Maybe the project should focus on providing only a transcript at first.