semperai / amica

Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
https://heyamica.com
MIT License

How is it possible to add new SpeechT5 models? #41

Open virtualrobotix opened 7 months ago

virtualrobotix commented 7 months ago

How can I change the text-to-speech model? Is it possible to use another .bin file, such as voxpopuli for Italian, or one we trained ourselves? I tried adding the voxpopuli.bin file to the public directory, but the app stops working and shows nothing in the debug output. The main difference is that the original model files are about 2 KB each, while the voxpopuli file is about 500 MB.

kasumi-1 commented 7 months ago

Hi!

speecht5 requires x-vector embeddings - there is a list of ones from cmu arctic here https://huggingface.co/datasets/Xenova/cmu-arctic-xvectors-extracted/tree/main

I haven't generated these before, but I think you can use https://huggingface.co/pyannote/embedding to create them.
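As a rough sanity check before dropping a new .bin into the app, something like the sketch below can tell a speaker embedding apart from a full model checkpoint. It assumes each embedding is a raw float32 vector of length 512 (512 × 4 = 2048 bytes, which would match the ~2 KB files mentioned above). `XVECTOR_DIM` and `parseSpeakerEmbedding` are hypothetical names used for illustration, not part of Amica.

```typescript
// Assumption: a SpeechT5 speaker embedding (x-vector) is stored as a raw
// float32 array of length 512, in the platform's native byte order.
const XVECTOR_DIM = 512;

// Hypothetical helper: returns the embedding as a Float32Array, or throws
// if the buffer has the wrong size (e.g. a 500 MB model checkpoint).
function parseSpeakerEmbedding(buffer: ArrayBuffer): Float32Array {
  const expectedBytes = XVECTOR_DIM * Float32Array.BYTES_PER_ELEMENT; // 2048
  if (buffer.byteLength !== expectedBytes) {
    throw new Error(
      `Expected ${expectedBytes} bytes but got ${buffer.byteLength}; ` +
        "this does not look like a speaker embedding file"
    );
  }
  return new Float32Array(buffer);
}
```

In the browser, the buffer could come from `await (await fetch(url)).arrayBuffer()`. A file that fails this check is probably a model checkpoint rather than an x-vector.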

kustomzone commented 7 months ago

@virtualrobotix I haven't looked into it yet, but generate_paths.js appears to be misconfigured (at least on Windows). You can bypass it by editing src\paths.ts and adding your model.bin path, like so:

speechT5SpeakerEmbeddingsList = ['speecht5_speaker_embeddings/speecht5_tts/pytorch_model.bin'];

Use quotes, since each entry is a string. You'll see it loading on the right-hand side of the screen the first time it runs. It's likely an array so that you can comma-separate different voice models and have each one show up in the UI. I've only just started testing, and it's slow on my PC using the CPU, but it works!
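If the array guess is right, listing more than one voice might look like the sketch below. The second path is a made-up placeholder. I haven't checked how the real src\paths.ts declares or exports this constant, so treat the declaration as an assumption too.

```typescript
// Sketch of a multi-entry list; the real file may declare/export this differently.
const speechT5SpeakerEmbeddingsList: string[] = [
  'speecht5_speaker_embeddings/speecht5_tts/pytorch_model.bin',
  // Hypothetical second entry, for illustration only:
  'speecht5_speaker_embeddings/my_custom_voice.bin',
];
```

Each entry is a quoted path, separated from the next by a comma.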
