maximilianchen closed this issue 4 years ago.
I have run into the same problem. Has yours been solved?
Hi @MeiGM, maybe you should try these instead: https://github.com/descriptinc/melgan-neurips and https://github.com/kan-bayashi/ParallelWaveGAN
Thank you very much.
Hi @tuan3w, I am training the model on the LJSpeech dataset. I preprocessed the data with preprocess.py using --samples_per_audio 20 (by the way, what does samples_per_audio mean?).
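The thread never explains `samples_per_audio`. Going only by the file names in the listing further down (such as `LJ031-0102_014.npz` and `LJ038-0071_020.npz`), one plausible reading is that preprocess.py cuts each utterance into that many fixed-length segments and saves each one under a numbered suffix. The Python sketch below illustrates that assumption only: `crop_segments`, `segment_length`, the random start offsets, and numbering from 1 are all hypothetical and not taken from the repository.

```python
import random
from typing import List, Sequence, Tuple


def crop_segments(
    utterance_id: str,
    audio: Sequence[float],
    samples_per_audio: int,
    segment_length: int,
    seed: int = 0,
) -> List[Tuple[str, List[float]]]:
    """Hypothetical: cut `samples_per_audio` random fixed-length windows
    from one utterance and give each a numbered file name."""
    if len(audio) < segment_length:
        raise ValueError("utterance is shorter than segment_length")
    rng = random.Random(seed)
    segments = []
    # Numbering from 1 is a guess based on the "_020" suffix in the listing.
    for i in range(1, samples_per_audio + 1):
        # Pick a random start so the whole window fits inside the utterance.
        start = rng.randint(0, len(audio) - segment_length)
        name = f"{utterance_id}_{i:03d}.npz"
        segments.append((name, list(audio[start:start + segment_length])))
    return segments
```

Under this reading, `crop_segments("LJ031-0102", audio, 20, 256)` would yield 20 windows named `LJ031-0102_001.npz` through `LJ031-0102_020.npz`. Random crops are a common way to turn variable-length utterances into fixed-size training examples, but the real script may use a different strategy.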
The preprocessing step produced a large set of .npz files along with files.txt. I noticed that the LJSpeech utterances appear in random order in the text file. Here is a fragment of it:
/home/notebooks/hdd/training_data//LJ031-0102_014.npz
/home/notebooks/hdd/training_data//LJ040-0158_010.npz
/home/notebooks/hdd/training_data//LJ008-0258_018.npz
/home/notebooks/hdd/training_data//LJ020-0022_009.npz
/home/notebooks/hdd/training_data//LJ038-0071_020.npz
/home/notebooks/hdd/training_data//LJ011-0012_007.npz
/home/notebooks/hdd/training_data//LJ041-0138_014.npz
...
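A shuffled list is not by itself a sign of a bug: training pipelines commonly shuffle their file lists so that consecutive batches do not all come from one utterance. To check that nothing was dropped, one can group the entries by utterance ID and count the segments per utterance. The sketch below assumes the `<utterance>_<index>.npz` naming shown above and whitespace-separated entries in files.txt; `segments_per_utterance` is a hypothetical helper, not part of the repository.

```python
import os
import re
from collections import Counter
from typing import Iterable

# "<utterance id>_<segment index>.npz", e.g. "LJ031-0102_014.npz"
_SEGMENT_NAME = re.compile(r"^(?P<utt>.+)_(?P<idx>\d+)\.npz$")


def segments_per_utterance(entries: Iterable[str]) -> Counter:
    """Count how many segment files each utterance contributed."""
    counts: Counter = Counter()
    for entry in entries:
        # basename() also copes with the doubled "//" in the listed paths.
        match = _SEGMENT_NAME.match(os.path.basename(entry.strip()))
        if match:
            counts[match.group("utt")] += 1
    return counts
```

If `--samples_per_audio 20` really produces one file per segment, `segments_per_utterance(open("files.txt").read().split())` should report 20 for every utterance; any other count would be worth a closer look.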
When training reached 45000 steps, I listened to generated_45000.wav, and it is just noise. Is that because 45000 steps are not enough to get usable output?
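Listening is the most direct check, but a rough number can help when comparing checkpoints. One generic heuristic (not something the thread or the repository uses) is spectral flatness: the ratio of the geometric to the arithmetic mean of the power spectrum. A value near 1 means a flat, noise-like spectrum, while speech typically scores much lower because its energy is concentrated at lower frequencies. The sketch assumes a 16-bit mono PCM file and uses a plain O(n²) DFT so it depends only on the standard library; all function names are hypothetical.

```python
import cmath
import math
import struct
import wave
from typing import List, Sequence


def read_mono_pcm16(path: str) -> List[float]:
    """Load a 16-bit mono WAV file as floats in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    return [s / 32768.0 for s in struct.unpack(f"<{count}h", raw[: 2 * count])]


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|DFT|^2 for bins 1 .. n/2 - 1 (DC and Nyquist skipped), plain O(n^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(1, n // 2)
    ]


def spectral_flatness(power: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean / arithmetic mean of a power spectrum, in (0, 1]."""
    geometric = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arithmetic = sum(p + eps for p in power) / len(power)
    return geometric / arithmetic


def average_flatness(samples: Sequence[float], frame_size: int = 512, max_frames: int = 20) -> float:
    """Flatness of the power spectrum averaged over up to `max_frames` frames."""
    if not any(samples):
        raise ValueError("silent audio: flatness is meaningless")
    starts = list(range(0, len(samples) - frame_size + 1, frame_size))
    if not starts:
        raise ValueError("audio shorter than one frame")
    # Spread the chosen frames evenly over the file to keep the DFT cost bounded.
    chosen = starts[:: max(1, len(starts) // max_frames)][:max_frames]
    spectra = [power_spectrum(samples[s : s + frame_size]) for s in chosen]
    # Averaging spectra across frames first (Welch-style) keeps white noise close to 1.
    averaged = [sum(bins) / len(spectra) for bins in zip(*spectra)]
    return spectral_flatness(averaged)
```

For example, `average_flatness(read_mono_pcm16("generated_45000.wav"))` gives a single score per checkpoint. It is only a coarse signal and cannot tell you whether more training steps will help. For real use, `numpy.fft.rfft` is far faster than the pure-Python DFT, and the `wave` module will reject files written as 32-bit float.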