Yes, I have also found the same problem.
I have one question: the integral of p over (-1, 1) (with `quantize_channels=65536`) should be one, or at least very close to one, but it seems there is no guarantee of this. With the original code, p is extremely small, on the order of exp(-50), and the integral of p does not come close to one. After training 320k iterations with my modified version, p is on the order of exp(-5).
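For reference, here is one way to check that normalization claim numerically. This is a standalone sketch, not code from the repo: the bin convention (targets in [-1, 1], half-bin width 1 / (quantize_channels - 1), edge bins absorbing the tails) and the `mu` / `log_scale` values are assumptions. For a single logistic component, the discretized probabilities over all bins should sum to essentially 1.

```python
import torch

quantize_channels = 65536
mu = torch.tensor(0.1)          # placeholder component mean
log_scale = torch.tensor(-7.0)  # placeholder component log-scale

x = torch.linspace(-1.0, 1.0, quantize_channels)   # bin centers
half_bin = 1.0 / (quantize_channels - 1)
inv_s = torch.exp(-log_scale)

cdf_plus = torch.sigmoid(inv_s * (x + half_bin - mu))
cdf_minus = torch.sigmoid(inv_s * (x - half_bin - mu))
probs = cdf_plus - cdf_minus
probs[0] = cdf_plus[0]            # leftmost bin absorbs everything below it
probs[-1] = 1.0 - cdf_minus[-1]   # rightmost bin absorbs everything above it

print(probs.sum())  # telescopes to 1.0, up to floating-point error
```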
Thank you for the comment. I would like to think about it again, but I do not have time currently :( If you are sure it is the right fix and it produces good speech quality, that would be great.
From https://github.com/r9y9/wavenet_vocoder/issues/1#issuecomment-359735902:
Honestly, I am not completely sure if I implemented it correctly, so I regard the feature as experimental at the moment. Let me know if you find a bug. PRs are always welcome!
@naturomics Sorry for the super late reply. I think you are right. Thank you for pointing this out! Much appreciated.
Hi Ryuichi,
Thanks for your great work.
Forgive me if I got this wrong, but I think it's not necessary to sum up `log_probs` over the last dimension here; that sum should simply be dropped (a sketch of the suggested change is just below).
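To make the suggestion concrete, here is a minimal, hypothetical sketch; the tensor shapes and the names `log_probs` / `logit_probs` are assumptions about the code in question, not an exact quote of it:

```python
import torch
import torch.nn.functional as F

B, T, num_mix = 2, 100, 10
log_probs = torch.randn(B, T, num_mix)    # per-component log-likelihood of each sample (dummy values)
logit_probs = torch.randn(B, T, num_mix)  # unnormalized mixture weights (dummy values)

# Questioned: summing over the last (num_mix) axis here collapses the mixture too early.
# Suggested: just add the log mixture weights and keep the num_mix axis; it is reduced
# later by log-sum-exp when forming the per-sample likelihood.
log_probs = log_probs + F.log_softmax(logit_probs, dim=-1)
per_sample_ll = torch.logsumexp(log_probs, dim=-1)  # [B, T]
loss = -per_sample_ll.sum()
```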
I saw that your code is adapted from pixel-cnn, but the two cases are different. In PixelCNN a pixel has 3 channels (RGB) and `log_probs` has shape [B, H, W, C, num_mix] before this line; the mixture model is built per pixel, not per channel, so they obtain the probability of each pixel with `tf.reduce_sum(log_probs, axis=3)`. But here, for WaveNet, the waveform has only one channel, so `log_probs` has shape [B, T, num_mix] and the mixture model is built directly on that channel; there is no need to sum over any dimension. In addition, your code sums over the `num_mix` dimension, which is also inconsistent with pixel-cnn (which sums over the channel dimension). Feel free to ignore me if I'm wrong. Thanks!
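To illustrate the shape difference being described, here is a standalone sketch with dummy tensors; the shapes follow the description above, not either repo's exact code:

```python
import torch
import torch.nn.functional as F

num_mix = 10

# PixelCNN++ case: the mixture is over a whole RGB pixel, so the three per-channel
# log-likelihoods are summed first (axis 3, the channel axis), and only then are the
# mixture weights added and the mixture axis reduced by log-sum-exp.
pix_log_probs = torch.randn(4, 32, 32, 3, num_mix)   # [B, H, W, C, num_mix]
pix_logit_probs = torch.randn(4, 32, 32, num_mix)    # [B, H, W, num_mix]
pix = pix_log_probs.sum(dim=3) + F.log_softmax(pix_logit_probs, dim=-1)
pix_ll = torch.logsumexp(pix, dim=-1)                # [B, H, W]: one log-likelihood per pixel

# WaveNet vocoder case: the waveform has a single channel, so there is no channel axis
# to sum over; the mixture axis feeds straight into the log-sum-exp.
wav_log_probs = torch.randn(4, 16000, num_mix)       # [B, T, num_mix]
wav_logit_probs = torch.randn(4, 16000, num_mix)     # [B, T, num_mix]
wav = wav_log_probs + F.log_softmax(wav_logit_probs, dim=-1)
wav_ll = torch.logsumexp(wav, dim=-1)                # [B, T]: one log-likelihood per sample
```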