lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Apache License 2.0

Unconditional synthesis #34

Open berkeleymalagon opened 2 years ago

berkeleymalagon commented 2 years ago

I'm running this command to generate unconditional samples.

python -m diffwave.inference --fast /path/to/model -o output.wav

I've trained for almost 4k epochs on 7k+ sounds. I seem to get the same sound (or a very similar one) regardless of training time.

I have not worked with diffwave before - any tips for debugging this?

Thanks

berkeleymalagon commented 2 years ago

For context, here are the params during inference in case there's anything obviously wrong with them:

model.params: {'batch_size': 16, 'learning_rate': 0.0002, 'max_grad_norm': None, 'sample_rate': 44100, 'n_mels': 80, 'n_fft': 1024, 'hop_samples': 256, 'crop_mel_frames': 62, 'residual_layers': 30, 'residual_channels': 64, 'dilation_cycle_length': 10, 'unconditional': True, 'noise_schedule': [0.0001, 0.0011183673469387756, 0.002136734693877551, 0.0031551020408163264, 0.004173469387755102, 0.005191836734693878, 0.006210204081632653, 0.007228571428571429, 0.008246938775510203, 0.009265306122448979, 0.010283673469387754, 0.01130204081632653, 0.012320408163265305, 0.013338775510204081, 0.014357142857142857, 0.015375510204081632, 0.016393877551020408, 0.017412244897959183, 0.01843061224489796, 0.019448979591836734, 0.02046734693877551, 0.021485714285714285, 0.02250408163265306, 0.023522448979591836, 0.02454081632653061, 0.025559183673469387, 0.026577551020408163, 0.027595918367346938, 0.028614285714285714, 0.02963265306122449, 0.030651020408163265, 0.031669387755102044, 0.03268775510204082, 0.033706122448979595, 0.03472448979591837, 0.035742857142857146, 0.03676122448979592, 0.0377795918367347, 0.03879795918367347, 0.03981632653061225, 0.04083469387755102, 0.0418530612244898, 0.042871428571428574, 0.04388979591836735, 0.044908163265306125, 0.0459265306122449, 0.046944897959183676, 0.04796326530612245, 0.04898163265306123, 0.05], 'inference_noise_schedule': [0.0001, 0.001, 0.01, 0.05, 0.2, 0.5], 'audio_len': 22051}
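
A few of the values in this dump can be sanity-checked with plain arithmetic. The sketch below is not part of the DiffWave codebase and every helper name in it is hypothetical. It rebuilds the 50-step `noise_schedule` as an evenly spaced ramp from 0.0001 to 0.05, converts `audio_len` to seconds, and computes the product of `(1 - beta)` for both schedules, which is the share of the original signal variance left after the last forward diffusion step.

```python
import math

def linear_schedule(start, end, steps):
    """Evenly spaced values from start to end, both endpoints included."""
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]

# Values copied from the params dump above.
SAMPLE_RATE = 44100
AUDIO_LEN = 22051
INFER_BETAS = [0.0001, 0.001, 0.01, 0.05, 0.2, 0.5]

# Assumption: the 50 training betas are a linear ramp, which matches
# the spacing of the values printed in the dump.
TRAIN_BETAS = linear_schedule(1e-4, 0.05, 50)

def final_signal_fraction(betas):
    """Product of (1 - beta) over the schedule (cumulative alpha at the last step)."""
    return math.prod(1.0 - b for b in betas)

clip_seconds = AUDIO_LEN / SAMPLE_RATE

print(f"training clip length: {clip_seconds:.3f} s")
print(f"second training beta: {TRAIN_BETAS[1]:.16f}")
print(f"signal fraction, training schedule:  {final_signal_fraction(TRAIN_BETAS):.3f}")
print(f"signal fraction, inference schedule: {final_signal_fraction(INFER_BETAS):.3f}")
```

With `audio_len` of 22051 at 44100 Hz, each training clip is roughly half a second long, so sounds much longer than that would only be seen as short crops.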

albertfgu commented 2 years ago

I tried using this codebase in the past for SC09 unconditional generation and found that it does not work. An alternative implementation of DiffWave at philsyn/diffwave-unconditional did work. I've released an improved implementation of this at https://github.com/albertfgu/diffwave-sashimi

Rongjiehuang commented 2 years ago

@Andrechang Hi, using this repo on the SC09 dataset, my model generates only silent waveforms. Have you succeeded in getting plausible sounds?

Andrechang commented 2 years ago

It shouldn't output silence. When I trained it briefly, it generated noisy audio.
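
The symptoms reported in this thread (silence, noise, or the same sound every time) can be told apart numerically instead of by ear. The following is a minimal sketch using only the Python standard library. It assumes the generated files are 16-bit PCM WAV, which may not match what the inference script actually writes, and all function names and the silence threshold are hypothetical.

```python
import array
import math
import sys
import wave

def read_pcm16(path):
    """Load a 16-bit PCM WAV file as floats in [-1, 1) (channels interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        raw = wav.readframes(wav.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in pcm]

def rms(samples):
    """Root-mean-square level of a list of samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_silent(samples, threshold=1e-3):
    """True when the RMS level is below an arbitrary, illustrative threshold."""
    return rms(samples) < threshold

def max_abs_diff(a, b):
    """Largest sample-wise difference over the overlapping part of two clips."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)
```

One way to use it: run inference twice, writing to two different files, load both with `read_pcm16`, and check `is_silent` and `max_abs_diff`. A difference near zero between two runs would suggest the sampler is not producing varied outputs.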

Rongjiehuang commented 2 years ago

It seems that the DiffWave paper uses res_channel = 256 for unconditional speech synthesis (this code uses residual_channels = 64), which may be why we could not get reasonable sounds.
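
To get a rough sense of how much capacity the channel width changes, the sketch below estimates convolution weight counts for 64 versus 256 residual channels across the 30 residual layers listed in the params. It rests on an assumption that is not confirmed in this thread: that each residual layer is dominated by a kernel-3 convolution from C to 2C channels plus a kernel-1 projection from C to 2C channels. Biases and all other layers are ignored, and the function name is hypothetical.

```python
def approx_residual_conv_weights(channels, layers):
    """Rough weight count for the residual stack under the stated assumption."""
    dilated = 3 * channels * (2 * channels)      # kernel-3 conv, C -> 2C (assumed)
    projection = 1 * channels * (2 * channels)   # kernel-1 conv, C -> 2C (assumed)
    return layers * (dilated + projection)

LAYERS = 30  # 'residual_layers' from the params dump

small = approx_residual_conv_weights(64, LAYERS)
large = approx_residual_conv_weights(256, LAYERS)

print(f"64 channels:  ~{small:,} weights")
print(f"256 channels: ~{large:,} weights")
print(f"ratio: {large / small:.0f}x")
```

Because these convolutions scale with the square of the channel count, going from 64 to 256 channels multiplies the estimate by 16. A checkpoint trained with 64 channels would not load into a 256-channel model, so changing the width means retraining.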