Closed by theAdamColton 1 year ago
This is something I also never quite understood about the original paper, nor have I explored it myself, so I can't really answer this question. It could be something as naive as passing a zero tensor to the final decoder as a substitute for the lower-level codes, but only the original authors know.
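A minimal sketch of the zero-substitution idea, assuming PyTorch and the channel sizes described below (the concrete shapes, 64 top channels plus 128 middle channels, are hypothetical and not taken from the repo):

```python
import torch
import torch.nn.functional as F

# Hypothetical top-level quantized code: batch 1, 64 channels, 8x8 spatial grid.
top = torch.randn(1, 64, 8, 8)

# Double upscale the top code (8 -> 32) so it matches the middle level's resolution.
top_up = F.interpolate(top, scale_factor=4, mode="nearest")

# Zero tensor standing in for the upscaled middle-level code.
mid_zeros = torch.zeros(1, 128, 32, 32)

# Concatenate along the channel dimension to produce the 192 channels
# the final decoder expects, then this tensor would be fed to the decoder.
decoder_input = torch.cat([mid_zeros, top_up], dim=1)
print(decoder_input.shape)  # torch.Size([1, 192, 32, 32])
```

Whether zeros are the right stand-in is an open question; the codebook mean or a learned placeholder embedding might behave differently, since the decoder was never trained with an all-zero middle level.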
Thanks for bringing this to my attention. I am currently working on a refactor of this repo (see #5), so I might investigate this once that is done. There are quite a few details in the paper that are unclear, and we may never know for sure how they were handled for the paper.
The original paper had this cool graphic in it, which showed what I believe is a decoded representation of different parts of the network. But I don't understand how, in practice, you could obtain a decoded image using only the top-level FFHQ encoder representation. In the case of the three-level FFHQ model, the final decoder layer is applied to a concatenation of the upscaled middle-level code and the doubly upscaled top-level code, and expects 192 input channels.
Is there a way, using only information from the top-level quantized representation, to get an image out of the network?