tuan3w / cnn_vocoder

A fast cnn-based vocoder
MIT License
78 stars 13 forks

Synthesized audio being a piece of noise #7

maximilianchen closed 4 years ago

maximilianchen commented 5 years ago

Hi @tuan3w, I am training the model on the LJSpeech dataset. I preprocessed the data using preprocess.py with --samples_per_audio 20 (by the way, what does samples_per_audio stand for?).

The preprocessing produced a large set of .npz files along with files.txt. I noticed in the text file that the order of the LJSpeech utterances is random. Here is a fragment of the text file:

/home/notebooks/hdd/training_data//LJ031-0102_014.npz
/home/notebooks/hdd/training_data//LJ040-0158_010.npz
/home/notebooks/hdd/training_data//LJ008-0258_018.npz
/home/notebooks/hdd/training_data//LJ020-0022_009.npz
/home/notebooks/hdd/training_data//LJ038-0071_020.npz
/home/notebooks/hdd/training_data//LJ011-0012_007.npz
/home/notebooks/hdd/training_data//LJ041-0138_014.npz
...
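A quick way to sanity-check preprocessing output like this is to count how many entries each utterance has in files.txt and to look at what a single .npz archive contains. The sketch below is not part of cnn_vocoder. The helper names (`read_file_list`, `count_segments`, `describe_npz`) are hypothetical, and it assumes that file names follow the `<utterance>_<index>.npz` pattern seen in the fragment and that files.txt holds whitespace-separated paths. It makes no assumption about which arrays the archives store; it only reports whatever names NumPy finds.

```python
import os
from collections import Counter

import numpy as np


def read_file_list(list_path):
    """Read whitespace-separated paths (one per line or space-separated)."""
    with open(list_path) as handle:
        return handle.read().split()


def count_segments(paths):
    """Count entries per utterance ID.

    Assumption: names look like <utterance>_<index>.npz, as in the fragment.
    Names without an underscore are counted under their full stem.
    """
    counts = Counter()
    for path in paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        utterance, _, _index = stem.rpartition("_")
        counts[utterance or stem] += 1
    return counts


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for one .npz archive."""
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }
```

Calling `count_segments(read_file_list("files.txt"))` shows how many entries each utterance contributed, which may hint at how --samples_per_audio relates to the numeric suffix. Calling `describe_npz` on one of the listed paths shows the array names, shapes, and dtypes, so empty or oddly shaped features can be spotted before a long training run.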

After training for 45000 steps, I listened to generated_45000.wav, and it is just noise. Is this because 45000 steps are not enough?

MeiGM commented 4 years ago

I ran into the same problem. Has your problem been solved?

tuan3w commented 4 years ago

Hi @MeiGM, maybe you should try these instead:
https://github.com/descriptinc/melgan-neurips
https://github.com/kan-bayashi/ParallelWaveGAN

MeiGM commented 4 years ago

Thank you very much.