ritheshkumar95 / pytorch-vqvae

Vector Quantized VAEs - PyTorch Implementation
824 stars 135 forks

Example for raw audio #21

Open: emonigma opened this issue 9 months ago

emonigma commented 9 months ago

Hello, and thanks for the code! I want to replicate the audio results from the paper, but the DeepMind repo does not have a VQ-VAE example for audio (see https://github.com/google-deepmind/sonnet/issues/141), and the audio setup described in the paper seems quite different from the one for CIFAR:

> We train a VQ-VAE where the encoder has 6 strided convolutions with stride 2 and window-size 4. This yields a latent space 64x smaller than the original waveform. The latents consist of one feature map and the discrete space is 512-dimensional.

Could you please include an example of using your code for audio?
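
For reference, the 64x figure in the quoted passage can be checked with the standard output-length formula for a strided 1-D convolution. The sketch below is only arithmetic, not an implementation from this repo or from the paper. It assumes a padding of 1 per layer (the passage does not state the padding), and the function names and the clip length of 16384 samples are hypothetical.

```python
def conv1d_out_len(length, kernel=4, stride=2, padding=1):
    """Output length of a 1-D convolution with dilation 1.

    kernel=4 and stride=2 come from the quoted passage;
    padding=1 is an assumption, since the passage does not give it.
    """
    return (length + 2 * padding - kernel) // stride + 1


def encoder_out_len(length, num_layers=6, **conv_kwargs):
    """Apply the same strided convolution num_layers times (6 in the passage)."""
    for _ in range(num_layers):
        length = conv1d_out_len(length, **conv_kwargs)
    return length


num_samples = 16384  # hypothetical clip length, chosen as a multiple of 64
latent_len = encoder_out_len(num_samples)
print(latent_len, num_samples // latent_len)  # prints "256 64"
```

With these settings each layer halves the length exactly for even inputs, so six layers give a reduction of 2^6 = 64, which matches the "64x smaller" figure. A different padding choice would shift the result by a few samples at the edges.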