misyaguziya / VRCT

VRCT(VRChat Chatbox Translator & Transcription)
https://misyaguziya.booth.pm/items/
MIT License
86 stars 8 forks

Improvements #2

Closed: FerLuisxd closed this issue 1 year ago

FerLuisxd commented 1 year ago

Hello! This is a really cool project! I have some questions and suggestions. Have you looked into OpenAI's Whisper model? It can run locally and transcribe (and translate) voice! The benefit is that it is free (and probably faster). Another option for translating text could be the Google Actions API, which is free for up to 5,000 requests a day.

misyaguziya commented 1 year ago

Hello. Yes, I know about Whisper. I actually implemented it in VRCT and tried it, but the Japanese transcription accuracy is poor and it is very slow, since the processing is done on the CPU... (You can try it yourself by changing recognize_google to recognize_whisper in audio_transcriber.py.) A GPU is required for real-time transcription, but I am holding off because it seems difficult to bundle libraries such as CUDA.
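For illustration, the engine swap described above can be sketched as a small dispatcher. The real speech_recognition package exposes both `Recognizer.recognize_google` and `Recognizer.recognize_whisper`, so switching engines is a matter of calling a different `recognize_*` method; the `StubRecognizer` below is a hypothetical stand-in for `sr.Recognizer()` so the dispatch logic can run without audio hardware or a Whisper model.

```python
# Sketch of swapping recognize_google for recognize_whisper, as the
# comment above suggests. Engine names map to recognize_* method names
# that exist on speech_recognition's Recognizer class.
ENGINES = {
    "google": "recognize_google",    # web API, no GPU needed
    "whisper": "recognize_whisper",  # local model; slow on CPU
}

class StubRecognizer:
    """Hypothetical stand-in for speech_recognition.Recognizer."""

    def recognize_google(self, audio, language="ja-JP"):
        return f"google:{language}"

    def recognize_whisper(self, audio, language="japanese"):
        return f"whisper:{language}"

def transcribe(recognizer, audio, engine="google", **kwargs):
    """Look up and call the recognize_* method named by `engine`."""
    return getattr(recognizer, ENGINES[engine])(audio, **kwargs)

r = StubRecognizer()
print(transcribe(r, None, engine="whisper"))  # whisper:japanese
```

With the real library, the same `transcribe` call would work on an `sr.Recognizer()` instance once the `whisper` package is installed; the accuracy and CPU-speed caveats the author mentions still apply.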

Are the translation options [DeepL, Google, Bing] not enough?

Thanks!

misyaguziya commented 1 year ago

I closed it by mistake...

FerLuisxd commented 1 year ago

Hello! Thanks for the response. Currently it uses DeepL for speech-to-text + translation, right?

I may have it wrong, but what are the limitations of DeepL (is it free?)

Edit: Apologies, I read the README a little more and saw that it uses https://github.com/misyaguziya/custom_speech_recognition. Currently, what does it use for speech-to-text?

misyaguziya commented 1 year ago

I think there is some misunderstanding. DeepL is used only for translation. We use the Google Speech Recognition API for transcription. You can use DeepL for translation, but you can also choose Google or Bing. Translation is accessed via the web, so there are no restrictions.
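The setup described above (one transcription backend, several interchangeable translation backends) can be sketched as a simple registry. The three translator functions here are hypothetical placeholders for the real web-based calls, just to show how DeepL, Google, and Bing can sit behind one interface:

```python
# Hedged sketch: interchangeable translation backends behind one
# interface. Each stub stands in for a real web-based translator call.
def deepl_translate(text, target):
    return f"[deepl/{target}] {text}"    # placeholder for DeepL

def google_translate(text, target):
    return f"[google/{target}] {text}"   # placeholder for Google

def bing_translate(text, target):
    return f"[bing/{target}] {text}"     # placeholder for Bing

TRANSLATORS = {
    "DeepL": deepl_translate,
    "Google": google_translate,
    "Bing": bing_translate,
}

def translate(text, target="en", engine="DeepL"):
    """Route the text through the chosen translation backend."""
    return TRANSLATORS[engine](text, target)

print(translate("こんにちは", "en", "Google"))  # [google/en] こんにちは
```

This registry pattern is why the engines are user-selectable: adding another backend only means registering one more function, without touching the transcription side.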