VQ-VAE-encoder + WaveNet decoder usage

swasun / VQ-VAE-Speech

PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]

MIT License


VQ-VAE-encoder + WaveNet decoder usage #5

Closed: shiva1393 closed this issue 5 years ago

shiva1393 commented 5 years ago

In the README I didn't find any steps for using the VQ-VAE encoder with the WaveNet decoder. Can anyone help me proceed?

roberthoenig commented 5 years ago

From what I can tell, the VQ-VAE-encoder + WaveNet decoder setup is a work in progress. Looking at the code, the WaveNet decoder itself seems to be already implemented, but not yet hooked into a PipelineFactory for training it. The last activity in this repository was two months ago, so it might be unmaintained. Perhaps @swasun can shed some light?

swasun commented 5 years ago

I had to move on to other projects and this one was stopped prematurely. Unfortunately I don't have the time to update it for now, though it's almost done. I may give it a shot later.

roberthoenig commented 5 years ago

@swasun I need a working implementation of this model with a WaveNet decoder, so I'd be happy to give the remaining implementation a shot. Is there anything else to be done, other than updating the pipeline factory? Also, has the WaveNet decoder been tested yet?

swasun commented 5 years ago

@roberthoenig The WaveNet decoder code comes from https://github.com/r9y9/wavenet_vocoder which seems to be a good implementation. I didn't test it in my own repository, because I worked on another private repository where the WaveNet was already implemented, and I left this one open in case someone needs it.

Besides that, I think it would be worth adding real-time evaluation with TensorBoard on top of it (for now the evaluation is done after training, which could be annoying since the WaveNet training may take longer).
A vectorized implementation of the Jitter layer may be better.
In the evaluation part, I didn't test _many_to_one_mapping() well and I didn't finish _compute_speaker_dependency_stats().
Also, there is no way to sample speech from the trained model, which would be useful for most people interested in this model.
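
For the vectorized Jitter layer mentioned above, here is a minimal sketch of what that could look like in PyTorch. It is not the repository's code: the function name `jitter`, the `(batch, channels, time)` layout, sharing one decision per time step across the batch, and the boundary handling (an out-of-range neighbour falls back to the step itself) are all assumptions made for illustration. The replacement probability `p` is left as a required argument; take the intended value from the paper or the repository. A layer like this would normally only be applied during training.

```python
from typing import Optional

import torch


def jitter(x: torch.Tensor, p: float,
           generator: Optional[torch.Generator] = None) -> torch.Tensor:
    """Replace each time step by its left or right neighbour.

    Hypothetical helper, not taken from the repository.
    x is assumed to have shape (batch, channels, time).
    Each time step is replaced with total probability p
    (p/2 for the left neighbour, p/2 for the right one).
    """
    length = x.size(-1)
    # One uniform draw per time step, shared across batch and channels.
    u = torch.rand(length, generator=generator)
    # offset is -1 (left neighbour), +1 (right neighbour) or 0 (keep).
    offset = torch.zeros(length, dtype=torch.long)
    offset[u < p / 2] = -1
    offset[u >= 1 - p / 2] = 1
    # Source index for every time step; clamping means a step at the
    # boundary keeps its own value if its chosen neighbour is out of range.
    index = (torch.arange(length) + offset).clamp(0, length - 1)
    # A single gather along the time axis replaces the per-step Python loop.
    return x[..., index.to(x.device)]
```

With `p=0.0` the input comes back unchanged, and with `p=1.0` every interior step takes the value of one of its two neighbours.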

shiva1393 commented 5 years ago

Hi @swasun, I combined WaveNet with the VQ-VAE (using the WaveNet from r9y9/wavenet_vocoder). However, the VQ loss goes towards zero within a few steps and only a few indices are activated (out of the embedding table with k=128, only around 30 indices are used), so the WaveNet loss does not decrease. Do you have any suggestions for overcoming this problem? Here are the loss functions:

    e_latent_loss = torch.mean((quantized.detach() - inputs) ** 2)
    q_latent_loss = torch.mean((quantized - inputs.detach()) ** 2)
    commitment_loss = self._commitment_cost * e_latent_loss
    vq_loss = q_latent_loss + commitment_loss
    wavenet_loss = criterion(y_hat[:, :, :-1], y[:, 1:, :], mask=mask)
    loss2 = vq_loss + wavenet_loss
    loss2.backward()
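
One way to track the collapse described above (around 30 of 128 codes active) is to log how many codes are used and the perplexity of the code distribution at each step. The sketch below is a plain-Python helper written for illustration. The name `codebook_usage` is hypothetical, and it assumes you can get the selected code indices as a flat list of integers, for example via `encoding_indices.flatten().tolist()` if your quantizer exposes such a tensor (that tensor name is an assumption too).

```python
import math
from collections import Counter


def codebook_usage(indices):
    """Return (active_codes, perplexity) for a flat iterable of code indices.

    Hypothetical diagnostic helper, not part of the repository.
    """
    counts = Counter(indices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("indices is empty")
    # Entropy (in nats) of the empirical distribution over selected codes.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Perplexity = exp(entropy): the effective number of codes in use.
    return len(counts), math.exp(entropy)
```

A perplexity far below k means the encoder is concentrating on a small part of the embedding table, which would be consistent with the VQ loss shrinking while the WaveNet loss stalls. Logging both values alongside `vq_loss` and `wavenet_loss` should show when the collapse starts.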

swasun commented 5 years ago

@shiva1393, @HenryZhou7 Hello guys. Sorry, I haven't had time to check on that yet; I have too much to do in my new position. Here's a repo where someone is actively working on that: https://github.com/hrbigelow/ae-wavenet