Yes, I have also found the same problem.
I have one question: the integral of p over (-1, 1) (with `quantize_channels=65536`) should be one, or at least very close to one, but it seems there is no guarantee of this. With the original code, p is extremely small, on the order of exp(-50), and the integral of p does not come close to one. After training 320k iterations with my modified version, p is on the order of exp(-5).
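For reference, here is one way to check that normalization claim numerically. This is a standalone sketch, not code from the repo: the bin convention (targets in [-1, 1], half-bin width 1 / (quantize_channels - 1), edge bins absorbing the tails) and the `mu` / `log_scale` values are assumptions. For a single logistic component, the discretized probabilities over all bins should sum to essentially 1.

```python
import torch

quantize_channels = 65536
mu = torch.tensor(0.1)          # placeholder component mean
log_scale = torch.tensor(-7.0)  # placeholder component log-scale

x = torch.linspace(-1.0, 1.0, quantize_channels)   # bin centers
half_bin = 1.0 / (quantize_channels - 1)
inv_s = torch.exp(-log_scale)

cdf_plus = torch.sigmoid(inv_s * (x + half_bin - mu))
cdf_minus = torch.sigmoid(inv_s * (x - half_bin - mu))
probs = cdf_plus - cdf_minus
probs[0] = cdf_plus[0]            # leftmost bin absorbs everything below it
probs[-1] = 1.0 - cdf_minus[-1]   # rightmost bin absorbs everything above it

print(probs.sum())  # telescopes to 1.0, up to floating-point error
```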
Thank you for the comment. I would like to think about it again, but I do not have time currently :( If you are sure it is the right fix and it produces good speech quality, that would be great.
From https://github.com/r9y9/wavenet_vocoder/issues/1#issuecomment-359735902:
Honestly, I am not completely sure if I implemented it correctly, so I regard the feature as experimental at the moment. Let me know if you find a bug. PRs are always welcome!
@naturomics Sorry for the super late reply. I think you are right. Thank you for pointing this out! Much appreciated.
Hi Ryuichi,
Thanks for your great work.
Forgive me if I got this wrong, but I think it's not necessary to sum up `log_probs` over the last dimension here; that sum should simply be dropped (a sketch of the suggested change is just below).
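To make the suggestion concrete, here is a minimal, hypothetical sketch; the tensor shapes and the names `log_probs` / `logit_probs` are assumptions about the code in question, not an exact quote of it:

```python
import torch
import torch.nn.functional as F

B, T, num_mix = 2, 100, 10
log_probs = torch.randn(B, T, num_mix)    # per-component log-likelihood of each sample (dummy values)
logit_probs = torch.randn(B, T, num_mix)  # unnormalized mixture weights (dummy values)

# Questioned: summing over the last (num_mix) axis here collapses the mixture too early.
# Suggested: just add the log mixture weights and keep the num_mix axis; it is reduced
# later by log-sum-exp when forming the per-sample likelihood.
log_probs = log_probs + F.log_softmax(logit_probs, dim=-1)
per_sample_ll = torch.logsumexp(log_probs, dim=-1)  # [B, T]
loss = -per_sample_ll.sum()
```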
I saw that your code is adapted from pixel-cnn, but the two cases are different. In PixelCNN a pixel has 3 channels (RGB) and `log_probs` has shape [B, H, W, C, num_mix] before this line; the mixture model is built per pixel, not per channel, so they obtain the probability of each pixel with `tf.reduce_sum(log_probs, axis=3)`. But here, for WaveNet, the waveform has only one channel, so `log_probs` has shape [B, T, num_mix] and the mixture model is built directly on that channel; there is no need to sum over any dimension. In addition, your code sums over the `num_mix` dimension, which is also inconsistent with pixel-cnn (which sums over the channel dimension). Feel free to ignore me if I'm wrong. Thanks!
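To illustrate the shape difference being described, here is a standalone sketch with dummy tensors; the shapes follow the description above, not either repo's exact code:

```python
import torch
import torch.nn.functional as F

num_mix = 10

# PixelCNN++ case: the mixture is over a whole RGB pixel, so the three per-channel
# log-likelihoods are summed first (axis 3, the channel axis), and only then are the
# mixture weights added and the mixture axis reduced by log-sum-exp.
pix_log_probs = torch.randn(4, 32, 32, 3, num_mix)   # [B, H, W, C, num_mix]
pix_logit_probs = torch.randn(4, 32, 32, num_mix)    # [B, H, W, num_mix]
pix = pix_log_probs.sum(dim=3) + F.log_softmax(pix_logit_probs, dim=-1)
pix_ll = torch.logsumexp(pix, dim=-1)                # [B, H, W]: one log-likelihood per pixel

# WaveNet vocoder case: the waveform has a single channel, so there is no channel axis
# to sum over; the mixture axis feeds straight into the log-sum-exp.
wav_log_probs = torch.randn(4, 16000, num_mix)       # [B, T, num_mix]
wav_logit_probs = torch.randn(4, 16000, num_mix)     # [B, T, num_mix]
wav = wav_log_probs + F.log_softmax(wav_logit_probs, dim=-1)
wav_ll = torch.logsumexp(wav, dim=-1)                # [B, T]: one log-likelihood per sample
```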