keonlee9420 / DiffGAN-TTS

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
MIT License

Python and dependency versions #22

Open hatlessman opened 1 year ago

hatlessman commented 1 year ago

Hello! Could you update the docs to recommend python==3.8? It makes things easier: praat-parselmouth only distributes binaries for Python 3.8, and I couldn't get it to compile. It would also help to pin numba==0.49 (numba.decorators was removed in 0.50) and resampy==0.3.1 (to match numba==0.49), and to add python_speech_features, pandas, and tensorflow to requirements.txt.

nghiap commented 1 year ago

OK, this is what I got working in my Conda environment. Just copy and paste it into requirements.txt:

g2p-en==2.1.0
inflect==4.1.0
librosa==0.7.2
matplotlib==3.4.2
numba==0.48.0
numpy==1.22.4
pypinyin==0.39.0
pyworld==0.3.2
pycwt==0.3.0a22
praat-parselmouth==0.3.3
PyYAML==5.4.1
scikit-learn==0.24.1
scipy==1.6.3
soundfile==0.10.3.post1
tensorboard==2.13.0
tgt==1.4.4
torch==1.8.1
tqdm==4.46.1
unidecode==1.1.1
pandas==2.0.3
python-speech-features==0.6
tensorflow==2.13.0
resampy==0.3.1
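
If it helps anyone debugging their environment, here is a minimal sketch for checking which of the pins above are actually satisfied by what is installed. It is not part of the repository; the helper names parse_pins and check_pins are made up for illustration. It only uses importlib.metadata from the standard library (available from Python 3.8) and compares version strings exactly, so a version that pip reports in a slightly different spelling would show up as a mismatch.

```python
from importlib import metadata  # standard library since Python 3.8


def parse_pins(text):
    """Parse whitespace- or newline-separated 'name==version' pins into a dict."""
    pins = {}
    for token in text.split():
        name, sep, version = token.partition("==")
        if sep:  # skip anything that is not an exact 'name==version' pin
            pins[name] = version
    return pins


def check_pins(pins):
    """Return (name, wanted, installed) for every pin that is not satisfied.

    'installed' is None when the package is not installed at all.
    """
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:  # strict string comparison, no version ranges
            problems.append((name, wanted, installed))
    return problems
```

To use it, read requirements.txt into a string, pass it through parse_pins, and print whatever check_pins returns; an empty list means every pinned package is installed at the listed version.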

nghiap commented 1 year ago

The Python version that works for me is 3.8. You will also need to extract the tar files and copy the model file to the correct path; if it is missing, the error message on failure tells you where to put it. This is the inference command line that works for me:

python synthesize.py --restore_step 2 --model naive --mode single --text "I am so excited that I got this TTS working" --dataset LJSpeech

I would like it to read a text file and create a wav file instead of taking a single sentence on the command line like this. Does anyone know how to make this change?
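
One possible workaround that does not require changing synthesize.py, sketched under some assumptions: a small wrapper that reads a text file and runs the single-sentence command above once per non-empty line. The function names (read_sentences, build_command, synthesize_file) are hypothetical, and the only flags used are the ones from the command in this thread. It assumes the script is started from the repository root so that synthesize.py is found. How synthesize.py names its output files is not covered in this thread, so check that successive runs do not overwrite each other.

```python
import subprocess
import sys
from pathlib import Path


def read_sentences(path):
    """Return the non-empty lines of a UTF-8 text file, stripped of whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_command(text, restore_step=2, model="naive", dataset="LJSpeech"):
    """Build the single-sentence command from this thread for one line of text."""
    return [
        sys.executable, "synthesize.py",
        "--restore_step", str(restore_step),
        "--model", model,
        "--mode", "single",
        "--text", text,
        "--dataset", dataset,
    ]


def synthesize_file(path, dry_run=False):
    """Run synthesize.py once per line; with dry_run=True only return the commands."""
    commands = [build_command(sentence) for sentence in read_sentences(path)]
    if not dry_run:
        for cmd in commands:
            # Assumes the current working directory is the repository root.
            subprocess.run(cmd, check=True)
    return commands
```

Calling synthesize_file("input.txt") would then synthesize each line in turn. This reloads the model for every line, so it is slow for long files; calling with dry_run=True first shows the exact commands without running anything.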