jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Installation on windows native #59

Open batchfileframework opened 7 months ago

batchfileframework commented 7 months ago

Hi,

This issue is for working out how to install VoiceCraft natively on Windows, preferably (if possible at all) without using WSL, Docker, or conda.

Just stock Python and pip, maybe a venv, maybe some PowerShell, but preferably a pure batch install.

In reference to previous attempts

https://github.com/jasonppy/VoiceCraft/issues/28 https://github.com/jasonppy/VoiceCraft/issues/29
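The "stock Python and pip, maybe a venv" route asked for above could look roughly like the sketch below. It uses only the standard library `venv` module. The function names and the `requirements.txt` file name are hypothetical and not taken from the VoiceCraft repository; this is an illustration of the approach, not a tested installer.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir: Path, windows: bool = (os.name == "nt")) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if windows:
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, requirements: Path,
                        windows: bool = (os.name == "nt")) -> list:
    """Build a pip command that installs into the venv, not the system Python."""
    return [str(venv_python(env_dir, windows)), "-m", "pip",
            "install", "-r", str(requirements)]


def create_env_and_install(env_dir: Path, requirements: Path) -> None:
    """Create the venv with pip available, then install the requirements."""
    venv.EnvBuilder(with_pip=True).create(env_dir)
    subprocess.run(pip_install_command(env_dir, requirements), check=True)
```

Calling `create_env_and_install(Path("venv"), Path("requirements.txt"))` from the repository root would create the environment and install into it, assuming such a requirements file exists. A batch file could do the same with `python -m venv` followed by the venv's own `python -m pip install`.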

Sewlell commented 7 months ago

The creator of #29 here

As I see it, it is entirely doable for developers / contributors to make a webui-user.bat or start.bat like the classic A1111 Stable Diffusion, especially since most dependencies can be easily downloaded on Windows without compatibility issues with the OS, Python, or anything else (except espeak-ng, which, as I mentioned in #29, can't be downloaded through the command prompt).
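A launcher of the kind described above would probably want a pre-flight check before starting anything, so that a missing dependency such as espeak-ng is reported clearly. The sketch below is one way to do that with the standard library; the function names are hypothetical, and which packages and executables to check is an assumption left to the caller.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(modules, executables):
    """Collect human-readable problems instead of failing on the first one."""
    problems = [f"Python package not installed: {m}" for m in missing_modules(modules)]
    problems += [f"Executable not on PATH: {e}"
                 for e in executables if shutil.which(e) is None]
    return problems
```

For example, `preflight(["phonemizer"], ["espeak-ng"])` returns an empty list when both are found. Note that an espeak-ng installation that is not on PATH would be reported as missing, so a real launcher might also look in a bundled folder.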

yumlevi commented 7 months ago

It should be very doable; we just need to replace espeak with another timestamp aligner.

lukaszliniewicz commented 7 months ago

I made an API version that is compatible with Windows (currently only for TTS, not speech modification). See https://github.com/lukaszliniewicz/VoiceCraft_API. If you test it, please let me know if everything works. It is not exactly what you're looking for, and it uses conda (I think it's a very good method, but everyone has their preference). Still, you can take the modified audiocraft files and the USER and espeak solution from api.py, and run inference with Python in a venv. It comes with espeak. I will make an automatic installer for it, or at least include it with my audiobook app (https://github.com/lukaszliniewicz/Pandrator).

@yumlevi Espeak is not a problem. You can install it using the official Windows installer and take the contents of its folder in ProgramFiles, create an espeak directory in the main directory of the repo, paste them and do this (or use my fork, which already has the espeak folder and the files):

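import getpass
import os

# Get the current username
username = getpass.getuser()

# Set the USER environment variable to the username
# (Windows defines USERNAME rather than USER by default)
os.environ['USER'] = username

# Point phonemizer at the espeak library copied into the repo's espeak directory
os.environ['PHONEMIZER_ESPEAK_LIBRARY'] = './espeak/libespeak-ng.dll'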
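
The relative path above only works when the script is started from the repository root. A slightly more defensive variant, sketched here as an assumption rather than as what api.py actually does, resolves the DLL to an absolute path and fails early if it is missing. The function name `configure_espeak` is hypothetical.

```python
import getpass
import os
from pathlib import Path


def configure_espeak(repo_root, dll_name="libespeak-ng.dll"):
    """Set USER and PHONEMIZER_ESPEAK_LIBRARY, failing early if the DLL is missing."""
    dll_path = Path(repo_root).resolve() / "espeak" / dll_name
    if not dll_path.is_file():
        raise FileNotFoundError(f"espeak library not found at {dll_path}")
    # Keep an existing USER value; only fill it in when it is absent.
    if "USER" not in os.environ:
        os.environ["USER"] = getpass.getuser()
    # An absolute path keeps working regardless of the current directory.
    os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = str(dll_path)
    return dll_path
```

This would need to run before phonemizer is first used, for example as `configure_espeak(Path(__file__).parent)` at the top of the entry script. Unlike the snippet above, it leaves an already defined USER untouched.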

lukaszliniewicz commented 7 months ago

I added VoiceCraft to my audiobook/dubbing generator app: https://github.com/lukaszliniewicz/Pandrator. It has a one-click Windows installer and installs the API (https://github.com/lukaszliniewicz/VoiceCraft_API).

Lexcess commented 7 months ago

Vall-E-EX did a great job with a cross-platform Gradio frontend for TTS that just works. It has lots of cool features beyond basic TTS, such as audio from the microphone, pasting in transcripts, managing voice presets, and so on. It might be a good inspiration, or even adaptable with attribution.