jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Working in WSL but 10min+ inference #153

Open holunzoo12 opened 3 months ago

holunzoo12 commented 3 months ago

Windows 11 64-bit, WSL Ubuntu, RTX 3080 with 10 GB VRAM

Hey, not sure anyone still checks this, but I've cobbled the environment together after a lot of trial and error and have it working in Gradio. The only problem is that generating audio with either model takes at least 10 minutes. It's also using almost all of my VRAM. I saw someone on Reddit saying they have it running locally through a Jupyter notebook and it runs in near real time on their 3080. They just needed to change a line in "inference_tts.ipynb" so it recognizes their GPU. Is this possible in Gradio? Which line would I need to edit, and in which file?
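One thing worth ruling out for the question above is whether PyTorch can see the GPU at all from inside WSL. The snippet below is a minimal sketch, not the project's own code: `pick_device` is a hypothetical helper, and the assumption that the relevant setting is the standard `CUDA_VISIBLE_DEVICES` environment variable (with GPU index `0`) is a guess, since the thread never names the line that was changed in the notebook.

```python
import os

# Assumption: expose only the first GPU to PyTorch. This has to be set
# before CUDA is initialised, so it goes ahead of the torch import.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")


def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

If this prints `cpu` when run in the same environment as the Gradio app, the slowdown may come from inference falling back to the CPU. If it prints `cuda`, the GPU is visible and the cause lies elsewhere.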

ajkessel commented 2 months ago

@holunzoo12 can you provide any detail on how you got it working under WSL? I've had no luck so far.

holunzoo12 commented 2 months ago

> @holunzoo12 can you provide any detail on how you got it working under WSL? I've had no luck so far.

Sorry, it's been too long, and it took hours of confused tinkering to get it running without throwing errors. I can't remember exactly what I did to get it working. I've given up on it for now, since it seems I don't have the hardware to run it anyway. Very sorry.

ajkessel commented 2 months ago

Yeah, I gave up on this too, but WhisperSpeech seems solid on WSL.