vincentherrmann / pytorch-wavenet

An implementation of WaveNet with fast generation
MIT License

Why not output samples from student to teacher? #12

Open neverjoe opened 6 years ago

neverjoe commented 6 years ago

https://github.com/vincentherrmann/pytorch-wavenet/blob/091396443bf656b348d9f9c17696d5aedc252eb9/wavenet_training.py#L259

neverjoe commented 6 years ago

I think target_distribution should be computed from samples drawn from the student, with the student's output samples passed to the teacher.

vincentherrmann commented 6 years ago

I don't really get what you mean. Do you think we should sample many inputs for the teacher network from the mu and s output of the student network? Then we would have to evaluate the whole teacher network multiple times, which would be very computationally expensive. Also, if we sample the student output we lose the conditioning on the previous time samples, so I don't think it makes sense. The mu and s outputs of the student network exist only to compare the distributions of the student and the teacher network.
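To make the role of mu and s concrete, here is a minimal NumPy sketch, not the repository's PyTorch code. It assumes the student emits one logistic distribution per time step, with location mu[t] and scale s[t]. The helper names `sample_logistic` and `logistic_entropy` are hypothetical. Sampling uses the inverse CDF of the logistic distribution, and the entropy uses the closed form ln(s) + 2, so neither step needs a teacher forward pass.

```python
import numpy as np

def sample_logistic(mu, s, n_samples, rng):
    """Draw n_samples per time step from Logistic(mu[t], s[t]).

    mu, s: arrays of shape [T]. Returns an array of shape [T, n_samples].
    """
    # Keep u away from 0 and 1 so the logs below stay finite.
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=(mu.shape[0], n_samples))
    noise = np.log(u) - np.log1p(-u)  # standard logistic noise
    return mu[:, None] + s[:, None] * noise

def logistic_entropy(s):
    """Closed-form differential entropy of a logistic with scale s."""
    return np.log(s) + 2.0

# Illustrative values only (assumption, not taken from the repository).
rng = np.random.default_rng(0)
mu = np.array([0.0, 0.5, -0.3])
s = np.array([0.1, 0.2, 0.05])
samples = sample_logistic(mu, s, n_samples=4, rng=rng)
entropy = logistic_entropy(s)
```

Here `samples` has one row per time step and one column per draw, while `entropy` has one value per time step.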

neverjoe commented 6 years ago

I get your idea, and I have the same worry, but the paper says we need to estimate the distributions of the teacher and the student by sampling. By the way, target_distribution and student_samples have different shapes. Is that a bug? Have you got any reasonable results?

vincentherrmann commented 6 years ago

In the paper it says that x = g(z), where z is the input noise. I think the whole point of equations (9)-(13) in the paper is to save us from having to evaluate the teacher network multiple times. target_distribution is a parameterization of the teacher distribution, and student_samples holds multiple samples from the student distribution, so they should have different shapes. For me it seems to work reasonably well, although the output of the parallel WaveNet is noisier than that of the original (but I haven't implemented the additional loss terms yet, so that might help).
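The shape difference can be illustrated with a small NumPy sketch of a Monte Carlo cross-entropy estimate. This is a sketch under stated assumptions, not the repository's implementation. It assumes the teacher outputs one categorical distribution over K amplitude bins per time step (logits of shape [T, K]), that the student samples lie in [-1, 1], and that they are mapped to bins by plain linear quantization. The function names are hypothetical. The teacher logits are computed once, and every student sample at a time step is scored against that same row.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy_estimate(teacher_logits, student_samples):
    """Monte Carlo estimate of H(P_S, P_T), summed over time steps.

    teacher_logits:  [T, K], one categorical distribution per time step
    student_samples: [T, N], N samples in [-1, 1] per time step
    """
    n_bins = teacher_logits.shape[1]
    log_p = log_softmax(teacher_logits)                       # [T, K]
    # Linear quantization to bin indices (assumption for this sketch).
    clipped = np.clip(student_samples, -1.0, 1.0)
    bins = ((clipped + 1.0) / 2.0 * n_bins).astype(int)
    bins = np.minimum(bins, n_bins - 1)                       # [T, N]
    # Look up the teacher log-probability of each student sample.
    sample_log_p = np.take_along_axis(log_p, bins, axis=1)    # [T, N]
    # Average over samples, then sum over time steps.
    return -sample_log_p.mean(axis=1).sum()

# Illustrative inputs only: a uniform teacher over 4 bins, 3 time steps.
rng = np.random.default_rng(0)
teacher_logits = np.zeros((3, 4))
student_samples = rng.uniform(-1.0, 1.0, size=(3, 5))
estimate = cross_entropy_estimate(teacher_logits, student_samples)
```

With a uniform teacher every sample has log-probability -ln(4), so the estimate equals 3 ln(4) regardless of which samples are drawn.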

neverjoe commented 6 years ago

Great! I think the power loss and the contrastive loss are very important for good voice quality.

neverjoe commented 6 years ago

Can you show me your loss plot? My loss hasn't converged after days of training.