romanokeser closed this issue 10 months ago
This project is a .NET wrapper around functionality developed by OpenAI. The developers of Whisper.NET have no control over the quality of the speech recognition. I suggest asking the participants of the following discussion for help: https://github.com/openai/whisper/discussions/16
Hello @romanokeser ,
Indeed, @gorokhovskiy is right: this is just a wrapper around whisper.cpp, which is a C++ port of OpenAI Whisper, and the models come from OpenAI.
Besides the issue with the Serbian-Croatian languages, I can offer some additional ideas on how to improve the quality of transcripts (for any language):
Fine-tune your own model. You can also fine-tune your own model for a specific language, but that's a little harder:
To answer your questions about WithLanguage("auto") vs WithLanguage("Croatian"):
auto first runs language identification to detect the language of your audio. After that phase, transcription (or translation) proceeds exactly as it would with an explicitly specified language.
In short: auto just makes it a little slower to get the first results, but the quality is the same.
Note: the results can be worse if auto detects the wrong language by mistake, e.g. another Slavic language.
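For reference, here is a minimal sketch of pinning the language instead of relying on auto, so that a wrong detection cannot affect the output. It follows the usage pattern from the Whisper.net README. The file names ggml-base.bin and audio.wav are placeholders, and "hr" is the language code Whisper uses for Croatian; whether the spelled-out "Croatian" is accepted as well is not something this thread confirms, so the short code is the safer assumption.

```csharp
using System;
using System.IO;
using Whisper.net;

// Placeholder path: point this at a ggml model you have downloaded.
using var factory = WhisperFactory.FromPath("ggml-base.bin");

// Pin the language with its short code ("hr" = Croatian) so the
// language identification step that "auto" performs is skipped.
using var processor = factory.CreateBuilder()
    .WithLanguage("hr")
    .Build();

// Placeholder path: the audio to transcribe (assumed here to be a
// 16 kHz WAV file, as in the README examples).
using var fileStream = File.OpenRead("audio.wav");

// Segments are streamed back as they are decoded.
await foreach (var segment in processor.ProcessAsync(fileStream))
{
    Console.WriteLine($"{segment.Start}->{segment.End}: {segment.Text}");
}
```

Swapping "hr" for "auto" in the same snippet makes it easy to compare the two outputs on one recording; if they differ, auto most likely picked a different language.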
I have encountered significant challenges using whisper for speech-to-text conversion in the Croatian language. Unlike English, the system consistently produces inaccurate transcriptions when using Croatian audio inputs. There is no difference in outputs between CreateBuilder().WithLanguage("auto") and CreateBuilder().WithLanguage("Croatian").
Any suggestions? Thanks!