Benefits of using Gumbel-Softmax sampling over a VQ-VAE-esque approach for codebook generation?

I'm trying to better understand the design decisions made for the "Categorical Codebook Matching for Embodied Character Controllers" paper.

What was the reasoning behind using Gumbel-Softmax sampling instead of the commitment and codebook losses from the VQ-VAE paper? Was it to increase the potential diversity of the sampled motions, or is there more to it?
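For reference, here is how I understand generic Gumbel-Softmax sampling, as a minimal NumPy sketch. This is not the AI4Animation implementation; the function name `gumbel_softmax` and its `tau`/`hard` parameters are my own hypothetical choices. Adding Gumbel(0, 1) noise to the logits and taking the argmax draws an exact sample from the categorical distribution, so different noise draws can select different codes for the same input, which is where I assume the extra diversity would come from.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None, hard=False):
    """Relaxed one-hot sample from Categorical(softmax(logits)).

    Hypothetical helper for illustration only, not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via inverse CDF: g = -log(-log(u)), u in [1e-10, 1).
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    # Temperature tau controls how close the soft sample is to one-hot.
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    soft = np.exp(y) / np.exp(y).sum(axis=-1, keepdims=True)
    if not hard:
        return soft
    # One-hot in the forward pass. In an autograd framework the
    # straight-through estimator would return hard - stop_grad(soft) + soft.
    one_hot = np.zeros_like(soft)
    np.put_along_axis(one_hot, soft.argmax(axis=-1)[..., None], 1.0, axis=-1)
    return one_hot

# Same logits, many draws: the selected code varies from draw to draw.
logits = np.log(np.array([0.7, 0.2, 0.1]))
samples = gumbel_softmax(np.tile(logits, (10000, 1)), tau=0.5,
                         rng=np.random.default_rng(0), hard=True)
print(samples.mean(axis=0))  # empirical code frequencies, close to [0.7, 0.2, 0.1]
```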

I would expect the VQ-VAE approach to yield a more interpretable latent space and much more deterministic behavior at inference time.
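To make the comparison concrete, this is a minimal NumPy sketch of VQ-VAE-style quantization as I understand it from the VQ-VAE paper, assuming nearest-neighbour lookup and the usual weight `beta = 0.25`. The name `vq_quantize` is hypothetical. The code index comes from an `argmin`, so the same encoder output always maps to the same code. NumPy has no autograd, so the stop-gradient that distinguishes the codebook loss from the commitment loss is only noted in comments; in this forward-only version both terms share the same squared error.

```python
import numpy as np

def vq_quantize(z_e, codebook, beta=0.25):
    """Nearest-neighbour vector quantization (hypothetical helper).

    z_e: (N, D) encoder outputs; codebook: (K, D) embedding vectors.
    """
    # Squared Euclidean distance from every encoder output to every code.
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    idx = d.argmin(axis=1)  # deterministic: no sampling involved
    z_q = codebook[idx]
    # Codebook loss ||sg[z_e] - e||^2 and commitment loss beta * ||z_e - sg[e]||^2
    # differ only in where stop-gradient (sg) applies, so their forward values
    # here are the same squared error, the second one scaled by beta.
    sq_err = ((z_e - z_q) ** 2).sum(axis=-1).mean()
    codebook_loss = sq_err
    commitment_loss = beta * sq_err
    return idx, z_q, codebook_loss, commitment_loss

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
z = rng.normal(size=(5, 4))
idx_a, z_q, cb_loss, commit_loss = vq_quantize(z, codebook)
idx_b, _, _, _ = vq_quantize(z, codebook)
print(idx_a, idx_b)  # identical indices on repeated calls
```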

sebastianstarke / AI4Animation

Benefits of using Gumbel-Softmax sampling over a VQ-VAE-esque approach for codebook generation? #135