savbell / whisper-writer

💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
GNU General Public License v3.0
275 stars, 46 forks

Multilingual support #2

Closed (YeapGuy closed this issue 1 year ago)

YeapGuy commented 1 year ago

Hi! Thanks for creating this! I'm lucky I found it, as I've been looking for a program that does exactly this for a while now.

How does the language code in the options affect the output? I know Whisper's multilingual models can transcribe input without specifying its language. So if I say one sentence in one language and another sentence in a second language, Whisper can transcribe both in their respective languages. And this is what I need as well. I speak multiple languages and use them all on my PC. The language code option seems to restrict this use case though. Or does it? I don't really know.

Could you please elaborate on how this works and if/how I can achieve what I want?

Thanks!

savbell commented 1 year ago

Hi, thanks for your comment and for bringing multilingual use cases to my attention!

I did some testing today to see how the Whisper API handles different languages used in the same audio file, and it does seem to support this. It recognized both sentences alternating between English and French and French phrases in the middle of English sentences. This happened even when I had the language configuration set to English only. So, to answer your question: no code changes are needed to achieve the functionality you're after!

However, I did just make some changes to remove the default language setting in case it was negatively impacting use cases like this. It can still be set but will be null by default.
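As a rough illustration of what "null by default" can mean in practice, here is a minimal sketch. The helper name `build_transcribe_options` is hypothetical and is not taken from WhisperWriter's code; it only shows the idea of leaving the language out of the request when none is configured, so the model is free to detect it.

```python
from typing import Any, Dict, Optional


def build_transcribe_options(language: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for a transcription call.

    Hypothetical helper for illustration only. When no language is
    configured (None or blank), the "language" key is omitted entirely
    so the model can detect the spoken language on its own.
    """
    options: Dict[str, Any] = {}
    code = (language or "").strip().lower()
    if code:
        # Only pin the language when the user explicitly set one.
        options["language"] = code
    return options


# Assumed usage with the open-source `whisper` package ("audio.wav" is a
# placeholder file name):
#   model = whisper.load_model("base")
#   result = model.transcribe("audio.wav", **build_transcribe_options(None))
```

With the setting left empty, the call carries no language hint at all; setting it to something like "fr" pins the transcription to that language.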

Just for reference, according to the API documentation, the supported languages are: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

I hope this helps! If you do end up having any multilingual issues, please let me know and we can do some more investigation! 😊

YeapGuy commented 1 year ago

Thank you so much for the response, and of course, for creating WhisperWriter! I just set it up. It took me a moment, but only because I wanted to get my GPU set up to run the local model, and even that was super simple and well worth it: transcribing always takes just a second or two, even when I speak for a veeery long time.

This is amazing! Exactly what I've been looking for. Now I just need to figure out how to get it to always run in the background. 😄

Thanks again ❤️, this will help me so much in daily life.

savbell commented 1 year ago

I'm so glad that you were able to get it working! Your comment absolutely made my day. I made this app for my own personal use because I have a disability that impacts my ability to type on a keyboard so I need a desktop speech-to-text tool, but the built-in Windows ones were buggy and didn't work so I just made my own. I wasn't expecting others to discover and benefit from it too – but I'm very happy that you did! 🥰

If you don't mind, could I ask which settings you changed for the local model? My computer isn't very powerful, so I've mostly been using the API since it's a lot faster (I actually only added the local model at someone else's request, haha). So if there are changes you think I could make that would make it easier for someone to set up the local model, I'd love to make them! I assume you changed the device settings to get it to run on your GPU, so I just added that into the config files too: https://github.com/savbell/whisper-writer/commit/3d3a8daf8a598a39f4b1c383fcf7afd8955d00c6

Also, if you're using Windows: since I run it frequently, I created a small batch script that starts the Python program and then added a shortcut to it with a name and icon!

Thanks so much! 😊

YeapGuy commented 1 year ago

The local model worked straight away, but on my CPU. To get it to run on CUDA (GPU), I had to uninstall the regular torch (pip uninstall torch and pip cache purge) and install a version that supports CUDA (I did that with pip install torch --index-url https://download.pytorch.org/whl/cu118, a command I found in the official Whisper repo's discussions). And then it just worked, as Whisper automatically uses the best compatible device available, so no config changes were needed (although the option you added might be useful for anyone who wants to use their CPU even if they have a compatible GPU).
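To check whether the reinstall actually took effect, something like the following sketch can help. The function name `pick_device` is hypothetical; it relies only on `torch.cuda.is_available()`, and it falls back to the CPU if PyTorch is missing or is a CPU-only build.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" if a CUDA-enabled PyTorch build can see a GPU, else "cpu".

    Hypothetical helper for illustration; pass prefer_gpu=False to force
    the CPU even when a compatible GPU is present.
    """
    if not prefer_gpu:
        return "cpu"
    try:
        # Imported lazily so the check also works where torch is not installed.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # Prints the device that would be selected on this machine.
    print(pick_device())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed torch is most likely still the CPU-only build. The result can also be passed explicitly, for example `whisper.load_model("base", device=pick_device())`, although the `whisper` package makes the same choice on its own when no device is given.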

In the discussion I linked, people mentioned installing drivers and CUDA Toolkit 11. I didn't need to do that, probably because I had already installed everything earlier. I don't remember for sure, but I probably did.