Closed ther3zz closed 1 month ago
I've thought to add it but I haven't looked into it yet - do you have some references for me so I can quick start? How is a finetuned xtts model normally used?
You'll have to bear with me, I have no idea what I'm doing.
I'm currently running it like: tts-server --model_path /models/model.pth --config_path /models/config.json --use_cuda true --speakers_file_path /models/sample.wav
there's some sample inference code here: https://docs.coqui.ai/en/latest/models/xtts.html#advanced-training
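For reference, the advanced-inference example in the linked docs looks roughly like the sketch below, adapted to load a fine-tuned checkpoint directly instead of a named model (the /models paths are placeholders matching the command above, not anything official):

```python
import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the fine-tuned config and weights directly, bypassing --model_name.
config = XttsConfig()
config.load_json("/models/config.json")

model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/models/", eval=True)
model.cuda()  # drop this line for CPU-only inference

# Even with a fine-tuned model, a short reference clip conditions the voice.
outputs = model.synthesize(
    "Hello from a fine-tuned XTTS model.",
    config,
    speaker_wav="/models/sample.wav",
    language="en",
)
```

This is only a sketch from the docs page, not tested against this setup; it assumes the fine-tune produced a standard XTTS checkpoint plus config.json.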
I actually had to modify xtts.py, server.py, and the config module just to get tts-server to use the fine-tuned model.
The changes are also relevant to the marytts api since my use case is home assistant
Do you still use the reference_wav with a custom model?
LOL, you know what, it's probably not needed with a fine-tuned model and I didn't even think about it until you mentioned it now. I'm going to test without passing the reference wav to see what happens...
btw, this looks like a simpler change (model_path, config_path) instead of (model_name)
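A sketch of the two invocation styles, for comparison (flag names as in the command above; model name and paths are illustrative):

```shell
# stock model, downloaded and resolved by name
tts-server --model_name tts_models/multilingual/multi-dataset/xtts_v2 --use_cuda true

# fine-tuned model, loaded from local files
tts-server --model_path /models/model.pth --config_path /models/config.json --use_cuda true
```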
I don't have a model to test with...
So I got an error when I didn't pass either speaker_idx or speaker_wav, so it looks like one of them is required.
ValueError: [!] Looks like you are using a multi-speaker model. You need to define either a `speaker_idx` or a `speaker_wav` to use a multi-speaker model.
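That error comes from a multi-speaker guard in the synthesis path; a minimal sketch of the equivalent check (simplified for illustration, not the actual TTS source):

```python
def resolve_speaker_args(speaker_idx=None, speaker_wav=None, is_multi_speaker=True):
    """Mimic the guard a multi-speaker model applies before synthesis."""
    if is_multi_speaker and speaker_idx is None and speaker_wav is None:
        raise ValueError(
            "[!] Looks like you are using a multi-speaker model. You need to "
            "define either a `speaker_idx` or a `speaker_wav` to use a "
            "multi-speaker model."
        )
    # Prefer an explicit speaker index; otherwise fall back to the clip.
    return ("idx", speaker_idx) if speaker_idx is not None else ("wav", speaker_wav)

print(resolve_speaker_args(speaker_wav="/models/sample.wav"))
# → ('wav', '/models/sample.wav')
```

So passing either one (e.g. speaker_idx=0, or the sample.wav as speaker_wav) should satisfy the check.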
I can share one if needed
speaker_idx=0 may work, and yes, if you can share it, that would be helpful.
How should I contact you without making the link available to everyone?
I'm on discord as matatonic in TheBloke AI, Text Generation WebUI and open-webui discords if that works, you can DM me there.
Planned for next release.
I see that this supports voice cloning, but I haven't seen anywhere in the docs that it can be used with a fine-tuned XTTS v2 model.
Is that currently possible?