Support XTTS

XTTS is already supported using this simple OpenAI API wrapper server: https://github.com/semperai/basic-openai-api-wrapper. You can find more info about the setup here: https://docs.heyamica.com/getting-started/installation#local-audio

This solution converts each whole sentence to a WAV file before sending it back to Amica, so the main downside is added delay, especially for longer sentences. To lower the latency, Amica may split sentences further (at commas), but this makes the audio less cohesive: unnaturally long pauses at commas, and each part of the sentence spoken in a slightly different tone. There is also a bug in the XTTS API where it writes an incorrect sampling rate in the WAV header, so the voice plays back slower and lower-pitched than it should.
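For anyone hitting the sampling rate problem in the meantime, here is a minimal workaround sketch, not part of the wrapper or of Amica. It rewrites the WAV header with the rate you believe is correct and leaves the samples untouched. The function name `rewrite_sample_rate` is made up for illustration, and the 24000 Hz default is an assumption about the model's real output rate, so check it against your own setup.

```python
import io
import wave


def rewrite_sample_rate(wav_bytes: bytes, true_rate: int = 24000) -> bytes:
    """Return the same audio with a corrected sample rate in the header.

    true_rate is an assumption: set it to whatever rate the TTS model
    actually produces. The samples themselves are copied unchanged.
    """
    # Read the existing parameters and the raw PCM frames.
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frames = src.readframes(src.getnframes())

    # Write a fresh WAV with the same frames but the corrected rate.
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(true_rate)
        dst.writeframes(frames)
    return out.getvalue()
```

This only changes the playback speed declared in the header; it does not resample. If the header rate was too low, the corrected file plays faster and at the intended pitch.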

Because of that, I'm currently working on a dedicated streaming server (live conversion and sending samples) for Amica using XTTS. I already have a working solution with low latency (independent of sentence length), proper lip sync and text progression. The code is still very hacky, with all configuration hardcoded, so I will probably need a week or two before sharing it for testing.

semperai / amica

Support XTTS #38