semperai / amica

Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
https://heyamica.com
MIT License
591 stars, 92 forks

Support XTTS #38

Open jobobby04 opened 7 months ago

jobobby04 commented 7 months ago

One option for XTTS support is https://github.com/daswer123/xtts-api-server. I've looked at the code, though, and it relies on the local filesystem to share voice files between the client and server. There is also https://github.com/coqui-ai/xtts-streaming-server, but I'm not sure how it would be used.

Andryusz commented 7 months ago

XTTS is already supported using this simple OpenAI API wrapper server: https://github.com/semperai/basic-openai-api-wrapper. You can find more info about the setup here: https://docs.heyamica.com/getting-started/installation#local-audio

This solution converts whole sentences to WAV files before sending them back to Amica, so the main downside is that it may introduce some delay, especially for longer sentences. To lower the latency, Amica may split sentences further (at commas), but this makes the audio less cohesive: pauses at commas become unnaturally long, and each part of the sentence comes out in a slightly different tone. There is also a bug in the XTTS API where it writes an incorrect sampling rate in the WAV header, so the voice plays back slower and lower-pitched than it should.
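
As a rough illustration of the two points above (not Amica's or the wrapper's actual code), here is a minimal Python sketch. The function names `split_at_commas` and `rewrite_sample_rate` are hypothetical, the WAV is assumed to be plain PCM, and the correct rate is left as a parameter because the right value depends on the model in use.

```python
import io
import re
import wave


def split_at_commas(sentence: str) -> list[str]:
    """Split a sentence into comma-delimited parts, keeping each comma.

    Only commas followed by whitespace are split points, so numbers
    such as "1,000" stay in one piece.
    """
    parts = re.split(r"(?<=,)\s+", sentence.strip())
    return [p for p in parts if p]


def rewrite_sample_rate(wav_bytes: bytes, correct_rate: int) -> bytes:
    """Return a copy of a PCM WAV with the header sample rate replaced.

    The audio samples are copied unchanged; only the rate declared in
    the header differs. No resampling is performed.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(correct_rate)  # assumed to be the true rate
        dst.writeframes(frames)
    return out.getvalue()


if __name__ == "__main__":
    print(split_at_commas("Well, I think so, yes."))
```

Each part returned by `split_at_commas` would be synthesized as its own request, which is why the tone can drift between parts. `rewrite_sample_rate` only helps when the samples themselves are correct and the header alone is wrong, which is what the slower, lower-pitched playback suggests.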

Because of that, I'm currently working on a dedicated streaming server (live conversion and sending samples) for Amica using XTTS. I already have a working solution with low latency (independent of sentence length), proper lip sync and text progression. The code is still very hacky, with all configuration hardcoded, so I will probably need a week or two before sharing it for testing.