keonlee9420 / DiffGAN-TTS

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
MIT License

VCTK generation fails #10

Closed: KwekuYamoah closed this issue 2 years ago

KwekuYamoah commented 2 years ago

Hello, thank you very much for your brilliant open-source project. I have been able to do single and batch generations using the LJSpeech dataset. However, when I try to replicate the results for the VCTK dataset, it fails.

I ran the following command: !python3 synthesize.py --text "Hello World" --model naive --restore_step 300000 --mode single --dataset VCTK

I obtain the following output:

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package cmudict to /root/nltk_data...
[nltk_data]   Unzipping corpora/cmudict.zip.

==================================== Inference Configuration ====================================
 ---> Type of Modeling: naive
 ---> Total Batch Size: 32
 ---> Path of ckpt: ./output/ckpt/VCTK_naive
 ---> Path of log: ./output/log/VCTK_naive
 ---> Path of result: ./output/result/VCTK_naive
================================================================================================
Removing weight norm...
Traceback (most recent call last):
  File "synthesize.py", line 264, in <module>
    )) if load_spker_embed else None
  File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './preprocessed_data/VCTK/spker_embed/p225-spker_embed.npy' 

I tried to investigate further and discovered that the specific speaker embedding folder and file did not exist in my directory. Any pointer to how I can solve the issue will be appreciated.

keonlee9420 commented 2 years ago

Hi @KwekuYamoah, thanks for your interest. You need to preprocess VCTK first, following README.md, to get the speaker embeddings. I have also shared pre-extracted speaker embeddings here for users who want to generate speech without that step, so please enjoy the project with them!
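
For anyone hitting the same FileNotFoundError, a small pre-flight check can confirm that the speaker embeddings are in place before running synthesize.py. The sketch below is not part of the repository: the helper names speaker_embedding_path and missing_speaker_embeddings are hypothetical, and the path layout is only inferred from the traceback above (./preprocessed_data/VCTK/spker_embed/p225-spker_embed.npy), so adjust it if your configuration differs.

```python
from pathlib import Path


def speaker_embedding_path(preprocessed_root, dataset, speaker_id):
    """Build the embedding path in the layout seen in the traceback (an assumption).

    Layout: <root>/<dataset>/spker_embed/<speaker_id>-spker_embed.npy
    """
    return (
        Path(preprocessed_root)
        / dataset
        / "spker_embed"
        / f"{speaker_id}-spker_embed.npy"
    )


def missing_speaker_embeddings(preprocessed_root, dataset, speaker_ids):
    """Return the speaker IDs whose embedding file does not exist on disk."""
    return [
        speaker_id
        for speaker_id in speaker_ids
        if not speaker_embedding_path(preprocessed_root, dataset, speaker_id).is_file()
    ]


if __name__ == "__main__":
    # "p225" is the speaker named in the error message; extend the list as needed.
    missing = missing_speaker_embeddings("./preprocessed_data", "VCTK", ["p225"])
    if missing:
        print("Missing speaker embeddings for:", ", ".join(missing))
        print("Preprocess VCTK per README.md or use the pre-extracted embeddings.")
    else:
        print("All requested speaker embeddings were found.")
```

If the check reports a missing file, either of the two options above (preprocessing or the pre-extracted embeddings) should create it at the expected location.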

KwekuYamoah commented 2 years ago

Thank you very much for your response. It does solve my problem. Kudos

yyh565655555 commented 1 year ago


Sorry to bother you. I ran into some problems with VCTK. Could I get the preprocessed_data for VCTK so that I can check it? Thanks very much. 1215544940@qq.com