semperai / amica

Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
https://heyamica.com
MIT License

Configuring Language Settings for STT Models in the Amica Project #112

Closed: zhy844694805 closed this issue 2 weeks ago

zhy844694805 commented 3 weeks ago

I have added my locally deployed faster-large-v3 STT model, and it can transcribe Chinese. However, when I call its API from the Amica project, the Chinese input is transcribed directly into English. I would like to know where I can disable the setting that forces transcription into English.

slowsynapse commented 2 weeks ago

This sounds very interesting. May I know your current LLM setup? We should investigate. I'm also curious: what is the best TTS solution for Chinese?

zhy844694805 commented 2 weeks ago

The best Chinese TTS is ChatTTS: https://github.com/lenML/ChatTTS-Forge. Its API supports OpenAI-compatible formats. The LLM I use is the locally trained Qwen1.5-14b. As for the prompt, I haven't changed anything.
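For reference, here is a minimal sketch of what an OpenAI-compatible TTS call to ChatTTS-Forge might look like from TypeScript. The base URL, model id, and voice id are placeholders for illustration, not verified ChatTTS-Forge defaults.

```ts
// Sketch: request Chinese speech from an OpenAI-compatible /v1/audio/speech
// endpoint, which ChatTTS-Forge advertises. URL, model, and voice are placeholders.
import { writeFile } from "node:fs/promises";

async function synthesize(text: string): Promise<void> {
  const response = await fetch("http://localhost:8000/v1/audio/speech", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "chattts", // placeholder model id
      input: text,
      voice: "female2", // placeholder voice id
    }),
  });
  if (!response.ok) {
    throw new Error(`TTS request failed: ${response.status}`);
  }
  // The endpoint returns raw audio bytes; write them to disk for a quick check.
  await writeFile("output.mp3", Buffer.from(await response.arrayBuffer()));
}

synthesize("你好，我是 Amica。").catch(console.error);
```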

Alright, let's get back to the original topic. I use a locally deployed whisper-v3, and after integrating it with Amica the transcription always comes out in English. I have tested it and found that only the single-language version, whisper-v2-zh, transcribes in Chinese. So I am now fairly sure that Amica's transcription language setting for the STT part automatically translates everything into English.
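One way to isolate this is to call the local Whisper server directly, outside Amica, and compare an explicit 'en' hint, an explicit 'zh' hint, and no hint at all (auto-detection). This is only a sketch, assuming an OpenAI-compatible /v1/audio/transcriptions route; the server URL and model id are assumptions to adjust for your deployment.

```ts
// Sketch: probe a local OpenAI-compatible Whisper server with different
// language hints. URL and model id are assumptions, not Amica settings.
import { readFile } from "node:fs/promises";

async function transcribe(path: string, language?: string): Promise<string> {
  const formData = new FormData();
  formData.append("file", new Blob([await readFile(path)]), "audio.wav");
  formData.append("model", "whisper-large-v3"); // placeholder model id
  if (language) {
    // ISO-639-1 code of the *input* audio, e.g. "zh" for Chinese.
    formData.append("language", language);
  }
  const response = await fetch("http://localhost:9000/v1/audio/transcriptions", {
    method: "POST",
    body: formData,
  });
  const result = (await response.json()) as { text: string };
  return result.text;
}

// Compare: forced English hint vs. Chinese hint vs. auto-detection.
transcribe("test_zh.wav", "en").then((t) => console.log("en:", t));
transcribe("test_zh.wav", "zh").then((t) => console.log("zh:", t));
transcribe("test_zh.wav").then((t) => console.log("auto:", t));
```

If the server transcribes Chinese correctly with "zh" or with no hint, the problem is in how Amica builds the request rather than in the model.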

slowsynapse commented 2 weeks ago

I'll try to identify the cause, and think of some automatic way of detecting and fixing this.

zhy844694805 commented 2 weeks ago

> I'll try to identify it and come up with some automatic way to detect and fix it.

https://github.com/semperai/amica/blob/master/src/features/openaiWhisper/openaiWhisper.ts

```ts
// Request body
const formData = new FormData();
formData.append('file', file);
formData.append('model', config('openai_whisper_model'));
formData.append('language', 'en');
if (prompt) {
  formData.append('prompt', prompt);
}
```

According to the code, the input audio file will indeed end up as English text. The key is the line `formData.append('language', 'en')`, which hardcodes English ('en') as the language sent with every request. Therefore, regardless of the original language of the audio file, the OpenAI Whisper API will attempt to transcribe it into English text.
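A possible fix, sketched against the snippet above, is to read the language from configuration instead of hardcoding 'en', and to omit the parameter entirely when nothing is configured so Whisper can auto-detect the spoken language. The `openai_whisper_language` config key below is hypothetical, not an existing Amica setting.

```ts
// Hypothetical patch to src/features/openaiWhisper/openaiWhisper.ts:
// take the language from configuration instead of hardcoding 'en'.
// The 'openai_whisper_language' key is an assumption, not a current setting.
const formData = new FormData();
formData.append('file', file);
formData.append('model', config('openai_whisper_model'));

const language = config('openai_whisper_language'); // e.g. 'zh', 'en', or empty
if (language) {
  // Pass an explicit ISO-639-1 hint only when one is configured...
  formData.append('language', language);
}
// ...otherwise leave it out and let Whisper auto-detect the input language.

if (prompt) {
  formData.append('prompt', prompt);
}
```

With the parameter omitted, the Whisper API detects the input language on its own; passing the correct code (e.g. 'zh') just makes that choice explicit.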