vvvm23 / vqvae-2

PyTorch implementation of VQ-VAE-2 from "Generating Diverse High-Fidelity Images with VQ-VAE-2"
MIT License

How to reproduce this visualization from the VQVAE 2 paper #8

Closed theAdamColton closed 1 year ago

theAdamColton commented 1 year ago

The original paper had this cool graphic in it, which showed what I believe are decoded reconstructions from different levels of the network. But I don't understand how, in practice, you could obtain a decoded image using only the top-level FFHQ encoder representation. In the case of the three-level FFHQ model, the final decoder layer is applied to a concatenation of the upscaled middle-level codes and the doubly upscaled top-level codes, and expects 192 input channels.
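
For concreteness, here is roughly how I understand the final decoder input being assembled. This is just a sketch with made-up shapes (an embedding width of 96 so two code maps give the 192 channels, FFHQ-ish spatial sizes), not this repo's actual API, and the real upsampling is presumably learned rather than interpolation:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: embedding width of 96 so the two concatenated code maps
# give the 192 input channels the final decoder expects. Nearest-neighbour
# interpolation here is just a stand-in to show the spatial alignment.
embed_dim = 96
z_top = torch.randn(1, embed_dim, 8, 8)    # quantized top-level codes
z_mid = torch.randn(1, embed_dim, 16, 16)  # quantized middle-level codes

z_top_up = F.interpolate(z_top, scale_factor=4, mode="nearest")  # 8x8   -> 32x32
z_mid_up = F.interpolate(z_mid, scale_factor=2, mode="nearest")  # 16x16 -> 32x32

decoder_input = torch.cat([z_mid_up, z_top_up], dim=1)
print(decoder_input.shape)  # torch.Size([1, 192, 32, 32])
```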

Is there a way to get an image out of the network using only the top-level quantized representation?

[attached image: the reconstruction visualization from the VQ-VAE-2 paper]

vvvm23 commented 1 year ago

This is something I also never quite understood about the original paper, nor have I explored it myself, so I can't really answer this question. It could be something as naive as passing a zero tensor as a substitute for the lower-level codes in the final decoder, but only the original authors know for sure.
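
Something naive like the following, perhaps. This is a completely untested sketch with hypothetical shapes and a made-up `final_decoder` call, not this repo's actual API:

```python
import torch
import torch.nn.functional as F

# Keep the real quantized top-level codes, zero out the middle-level
# conditioning, and feed the final decoder its expected 192 channels.
z_top = torch.randn(1, 96, 8, 8)                                 # real top-level codes
z_top_up = F.interpolate(z_top, scale_factor=4, mode="nearest")  # stand-in for the learned 4x upsampling

z_mid_up = torch.zeros(1, 96, 32, 32)  # zero tensor in place of the middle-level codes

decoder_input = torch.cat([z_mid_up, z_top_up], dim=1)  # [1, 192, 32, 32]
# image = final_decoder(decoder_input)  # hypothetical call; substitute the repo's decoder
```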

Thanks for bringing this to my attention. I am currently working on a refactor of this repo (see #5), so I might investigate this once that is done. There are actually quite a few things in the paper that are unclear, and we may never know for sure how they were done.