matatonic / openedai-speech

An OpenAI API-compatible text-to-speech server using Coqui AI's xtts_v2 and/or Piper TTS as the backend.
GNU Affero General Public License v3.0

Fine Tuned xttsv2 #12

Closed. ther3zz closed this issue 1 month ago.

ther3zz commented 1 month ago

I see that this supports voice cloning, but I haven't seen anywhere in the docs whether it can be used with a fine-tuned xtts_v2 model.

Is that currently possible?

matatonic commented 1 month ago

I've thought about adding it, but I haven't looked into it yet. Do you have some references so I can get a quick start? How is a fine-tuned xtts model normally used?

ther3zz commented 1 month ago

> I've thought about adding it, but I haven't looked into it yet. Do you have some references so I can get a quick start? How is a fine-tuned xtts model normally used?

You'll have to bear with me; I have no idea what I'm doing.

I'm currently running it like this: `tts-server --model_path /models/model.pth --config_path /models/config.json --use_cuda true --speakers_file_path /models/sample.wav`

There's some sample inference code here: https://docs.coqui.ai/en/latest/models/xtts.html#advanced-training
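
For reference, the advanced-usage example on that page loads a fine-tuned checkpoint directly rather than a catalog model; a rough sketch along those lines (paths and text are placeholders, not anything from this repo):

```python
# Sketch of the fine-tuned XTTS inference path from the linked Coqui docs.
# The checkpoint dir, config path, and reference wav below are placeholders.
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/models/config.json")

model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/models/", eval=True)
model.cuda()

outputs = model.synthesize(
    "This is a quick test of the fine-tuned voice.",
    config,
    speaker_wav="/models/sample.wav",  # reference clip used for voice conditioning
    gpt_cond_len=3,
    language="en",
)
# outputs["wav"] holds the generated audio samples
```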

I actually had to modify `xtts.py`, `server.py`, and the config module just to get xtts-server to use the fine-tuned model.

The changes are also relevant to the MaryTTS API, since my use case is Home Assistant.

matatonic commented 1 month ago

Do you still use the `reference_wav` with a custom model?

ther3zz commented 1 month ago

> Do you still use the `reference_wav` with a custom model?

LOL, you know what, it's probably not needed with a fine-tuned model, and I didn't even think about it until you mentioned it just now. I'm going to test without passing the reference wav to see what happens...

matatonic commented 1 month ago

BTW, this looks like a simpler change: (`model_path`, `config_path`) instead of (`model_name`).
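
If it helps, swapping `model_name` for `model_path`/`config_path` in the Coqui high-level API looks roughly like this; a minimal sketch with placeholder paths, not a statement about how openedai-speech currently wires it up:

```python
# Sketch: loading XTTS through Coqui's TTS API either by catalog name or from
# a local fine-tuned checkpoint. Paths are placeholders.
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stock model, resolved by name from the Coqui model catalog
stock = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Fine-tuned model, loaded from explicit local files instead
custom = TTS(model_path="/models/model.pth", config_path="/models/config.json").to(device)
```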

matatonic commented 1 month ago

I don't have a model to test with...

ther3zz commented 1 month ago

> Do you still use the `reference_wav` with a custom model?

> LOL, you know what, it's probably not needed with a fine-tuned model, and I didn't even think about it until you mentioned it just now. I'm going to test without passing the reference wav to see what happens...

I got an error when I didn't pass either `speaker_idx` or `speaker_wav`, so it looks like one of those is required.

ValueError: [!] Looks like you are using a multi-speaker model. You need to define either a `speaker_idx` or a `speaker_wav` to use a multi-speaker model.

> I don't have a model to test with...

I can share one if needed.
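
For what it's worth, in the plain Coqui API that error goes away once a reference clip is supplied at synthesis time; a minimal sketch, assuming the high-level `TTS` API and placeholder paths:

```python
# Sketch: a multi-speaker XTTS checkpoint still needs a speaker or a reference
# clip at synthesis time. Paths and text are placeholders.
from TTS.api import TTS

tts = TTS(model_path="/models/model.pth", config_path="/models/config.json").to("cuda")

tts.tts_to_file(
    text="Testing the fine-tuned voice.",
    speaker_wav="/models/sample.wav",  # reference clip satisfies the multi-speaker check
    language="en",
    file_path="out.wav",
)
```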

matatonic commented 1 month ago

`speaker_idx=0` may work. And yes, if I can, that would be helpful.
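
One guess at what `speaker_idx=0` could look like through the high-level API, if the fine-tuned checkpoint still exposes a speaker list (untested sketch, placeholder paths):

```python
# Sketch: pick the first speaker the checkpoint knows about instead of passing
# a reference wav. Assumes the fine-tuned model exposes speaker names.
from TTS.api import TTS

tts = TTS(model_path="/models/model.pth", config_path="/models/config.json").to("cuda")

if tts.speakers:  # multi-speaker checkpoints list their speaker names here
    tts.tts_to_file(
        text="Testing the first built-in speaker.",
        speaker=tts.speakers[0],
        language="en",
        file_path="out.wav",
    )
```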

ther3zz commented 1 month ago

How should I contact you without making the link available to everyone?

matatonic commented 1 month ago

I'm on Discord as matatonic in the TheBloke AI, Text Generation WebUI, and open-webui Discords; if that works, you can DM me there.

matatonic commented 1 month ago

Planned for next release.