jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

In Depth Install Guide? #77

Open Deadstarrr opened 7 months ago

Deadstarrr commented 7 months ago

I'm trying to get this working using Anaconda environments. I followed the installation instructions, but when I type "export CUDA_VISIBLE_DEVICES=0" in the command prompt after activating my voicecraft environment, it says "'export' is not recognized as an internal or external command, operable program or batch file."

Does anyone know of a YouTube video that goes through all the installation steps, as well as running and using the program?

lukaszliniewicz commented 7 months ago

Perhaps the installation instructions in my API fork will be useful: https://github.com/lukaszliniewicz/VoiceCraft_API. I'm not setting the CUDA variable at all and it works fine. There is also a script that will perform the installation for you, and you can open the API web UI and test the model from there.
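
On the error itself: `export` is a Unix shell builtin, so the Windows command prompt does not recognize it. The cmd equivalent is `set CUDA_VISIBLE_DEVICES=0`. If you do want to pin a GPU in a way that works on any OS, one option is to set the variable from Python before any CUDA-using library is imported. This is only a minimal sketch; `select_gpu` is a hypothetical helper name, not something from the VoiceCraft repository.

```python
import os


def select_gpu(device_index: int = 0) -> str:
    """Hypothetical helper: restrict this process to one GPU.

    Must run before importing torch (or anything else that initializes
    CUDA), because the variable is read when CUDA is first initialized.
    """
    value = str(device_index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Assumption: GPU 0 is the card you want to use.
select_gpu(0)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

On a single-GPU machine the variable is usually unnecessary, which matches the experience above of not setting it at all.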

Deadstarrr commented 7 months ago

I'll check this out. How are you finding the quality of VoiceCraft's results? Is this the best local voice cloning solution so far?

lukaszliniewicz commented 7 months ago

You can check a longer demo I made and compare it to XTTS using the same voice sample here: https://github.com/lukaszliniewicz/Pandrator#Samples. VoiceCraft's quality is undoubtedly better than XTTS's, and sometimes truly astounding. Still, it takes more time than generating with XTTS and then processing the output with RVC (a speech-to-speech method), and against that combination it becomes much less clear which result is better. RVC is not zero-shot, though: you need to train a model first. If you want to generate a few audiobooks, the extra effort may be worth it for the increase in speed (training is really simple). Tortoise is generally better than XTTS if used with diffusion rather than HiFi-GAN, but it's also much slower. I haven't played much with other models (YourTTS, VALL-E, commercial offerings like ElevenLabs, etc.).