Open berkeleymalagon opened 2 years ago
For context, here are the params during inference in case there's anything obviously wrong with them:
model.params: {'batch_size': 16, 'learning_rate': 0.0002, 'max_grad_norm': None, 'sample_rate': 44100, 'n_mels': 80, 'n_fft': 1024, 'hop_samples': 256, 'crop_mel_frames': 62, 'residual_layers': 30, 'residual_channels': 64, 'dilation_cycle_length': 10, 'unconditional': True, 'noise_schedule': [0.0001, 0.0011183673469387756, 0.002136734693877551, 0.0031551020408163264, 0.004173469387755102, 0.005191836734693878, 0.006210204081632653, 0.007228571428571429, 0.008246938775510203, 0.009265306122448979, 0.010283673469387754, 0.01130204081632653, 0.012320408163265305, 0.013338775510204081, 0.014357142857142857, 0.015375510204081632, 0.016393877551020408, 0.017412244897959183, 0.01843061224489796, 0.019448979591836734, 0.02046734693877551, 0.021485714285714285, 0.02250408163265306, 0.023522448979591836, 0.02454081632653061, 0.025559183673469387, 0.026577551020408163, 0.027595918367346938, 0.028614285714285714, 0.02963265306122449, 0.030651020408163265, 0.031669387755102044, 0.03268775510204082, 0.033706122448979595, 0.03472448979591837, 0.035742857142857146, 0.03676122448979592, 0.0377795918367347, 0.03879795918367347, 0.03981632653061225, 0.04083469387755102, 0.0418530612244898, 0.042871428571428574, 0.04388979591836735, 0.044908163265306125, 0.0459265306122449, 0.046944897959183676, 0.04796326530612245, 0.04898163265306123, 0.05], 'inference_noise_schedule': [0.0001, 0.001, 0.01, 0.05, 0.2, 0.5], 'audio_len': 22051}
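For what it's worth, the 50-entry `noise_schedule` above looks like a standard linear beta schedule from 1e-4 to 0.05, so nothing seems off there. A quick sanity check (my own sketch, not code from the repo):

```python
import numpy as np

# The 50-step training noise schedule in the params dump appears to be
# a linear schedule from 1e-4 to 0.05 over 50 steps.
schedule = np.linspace(1e-4, 0.05, 50)

# The second entry should match the 0.0011183673469387756 in the dump,
# and the last entry should be 0.05.
print(schedule[1], schedule[-1])
```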
I tried using this codebase in the past for SC09 unconditional generation and found that it does not work. An alternative implementation of DiffWave at philsyn/diffwave-unconditional did work. I've released an improved implementation of this at https://github.com/albertfgu/diffwave-sashimi
@Andrechang Hi, using this repo I have only generated silent waveforms on the SC09 dataset. Have you succeeded in getting plausible sounds?
It shouldn't output silent waveforms. When I trained it briefly, it generated noisy audio.
It seems that the DiffWave paper uses residual_channels = 256 for unconditional speech synthesis (but this code uses 64), which may be why we could not get reasonable sounds.
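To give a sense of how much capacity that difference represents: the parameter count of a residual block's channel-to-channel convolutions scales with the square of the channel width, so 64 → 256 is roughly a 16x increase per block. A rough back-of-the-envelope sketch (the helper below is my own, not the repo's code):

```python
# Hypothetical helper: parameter count of a 1-D convolution
# (weights + bias), ignoring the conditioner and skip projections.
def conv1d_params(in_ch, out_ch, kernel):
    return in_ch * out_ch * kernel + out_ch

# The dilated conv in a WaveNet-style residual block outputs 2*C
# channels for the gated tanh; kernel size 3 as in DiffWave.
small = conv1d_params(64, 2 * 64, 3)     # residual_channels = 64 (this repo)
large = conv1d_params(256, 2 * 256, 3)   # residual_channels = 256 (paper)

print(large / small)  # roughly 16x more parameters per block
```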
I'm running this command to generate unconditional samples:
python -m diffwave.inference --fast /path/to/model -o output.wav
I've trained for almost 4k epochs on 7k+ sounds. I seem to get the same sound (or a very similar one) out of inference regardless of how long I train.
I have not worked with diffwave before - any tips for debugging this?
Thanks