karpathy / deep-vector-quantization

VQVAEs, GumbelSoftmaxes and friends
MIT License

Related questions about the gumbel softmax #1

Closed. CDitzel closed this issue 3 years ago.

CDitzel commented 3 years ago

Hey Andrej,

I've been following the discussion you had with Phil regarding the vqVAE code.

Do you happen to know what the whole Gumbel parameterization trick brings to the table that could not also be achieved with the original vqVAE? I find it a bit hard to understand intuitively what

https://github.com/karpathy/deep-vector-quantization/blob/bcc1c38bec754caf0398c4cd054f34a963d348b4/model.py#L93

i.e. the contraction over the feature dimension with the embedding rows, actually does. And how does it relate to the traditional approach, in which one determines the closest embedding vector for every feature vector?
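For concreteness, here is roughly how I picture the two approaches side by side (just a toy sketch with made-up shapes and random tensors, not the actual code from this repo):

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 64, 8, 8       # batch, embedding dim, spatial size (made-up shapes)
K = 512                        # codebook size
z_e = torch.randn(B, C, H, W)  # encoder output
codebook = torch.randn(K, C)   # embedding rows e_k

# (a) classic vqVAE: pick the nearest codebook row for every spatial feature vector
flat = z_e.permute(0, 2, 3, 1).reshape(-1, C)          # (B*H*W, C)
dists = torch.cdist(flat, codebook)                    # (B*H*W, K)
idx = dists.argmin(dim=1)                              # hard index per position
z_q_hard = codebook[idx].reshape(B, H, W, C).permute(0, 3, 1, 2)

# (b) gumbel-softmax: produce a (soft) one-hot over the K codes and take a
#     weighted sum of the embedding rows -- the contraction in question
logits = torch.randn(B, K, H, W)                       # would come from the encoder
soft_one_hot = F.gumbel_softmax(logits, tau=1.0, dim=1)
z_q_soft = torch.einsum('bkhw,kc->bchw', soft_one_hot, codebook)
```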

Also, most of the time I see transposed convolutions in the decoder. Would convolutions + upsampling or pixel-shuffle methods improve the results? Or is it intentional that the encoder/decoder in most vqVAE implementations remains relatively simple?

Thank you in advance and greetings from Germany!

karpathy commented 3 years ago

The einsum is basically just a matrix multiply applied channelwise, i.e. equivalent to a conv1x1 layer with bias=False. I left it this way because it makes the embedding weights explicit in the module, which I kind of like, and it also keeps things symmetric with what's happening on the vqvae side.
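A small sketch of that equivalence (toy shapes and a plain softmax standing in for the gumbel-softmax, not the exact code in model.py):

```python
import torch
import torch.nn.functional as F

B, K, C, H, W = 2, 512, 64, 8, 8           # toy shapes: codebook size K, embedding dim C
soft_one_hot = torch.softmax(torch.randn(B, K, H, W), dim=1)
embed_weight = torch.randn(K, C)           # the explicit embedding table

# the einsum: contract the K (codebook) axis against the embedding rows, per pixel
z_q_einsum = torch.einsum('bkhw,kc->bchw', soft_one_hot, embed_weight)

# the same operation as a 1x1 convolution with bias=False, whose kernel is the
# (transposed) embedding table
z_q_conv = F.conv2d(soft_one_hot, embed_weight.t().reshape(C, K, 1, 1))

print(torch.allclose(z_q_einsum, z_q_conv, atol=1e-5))  # True
```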

The architectures can certainly be improved! I used these because I was trying to reproduce the DeepMind numbers to make sure everything is set up properly. When I get around to also squeezing this for performance, I expect we should be able to find alternatives.

karpathy commented 3 years ago

Hope that made sense, closing.