lucidrains / meshgpt-pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch

`ResidualLFQ` was successful, but `ResidualVQ` failed severely! #73

Open fighting-Zhang opened 8 months ago

fighting-Zhang commented 8 months ago

When training the `MeshAutoencoder`, I compared `ResidualLFQ` and `ResidualVQ`.
`ResidualLFQ` is your default option, and it reconstructs a reasonable structure.

However, when I use `ResidualVQ` (without changing any of your default parameters), the reconstruction has significant errors: the validation loss gradually increases, and the reconstruction yields only a few faces (e.g. 3 faces). I'm not sure what the reason is.

```python
if use_residual_lfq:
    self.quantizer = ResidualLFQ(
        dim = dim_codebook,
        num_quantizers = num_quantizers,
        codebook_size = codebook_size,
        commitment_loss_weight = 1.,
        **rlfq_kwargs,
        **rq_kwargs
    )
else:
    self.quantizer = ResidualVQ(
        dim = dim_codebook,
        num_quantizers = num_quantizers,
        codebook_size = codebook_size,
        shared_codebook = True,
        commitment_weight = 1.,
        stochastic_sample_codes = rvq_stochastic_sample_codes,
        sample_codebook_temp = 0.1,  # temperature for stochastically sampling codes, 0 would be equivalent to non-stochastic
        **rvq_kwargs,
        **rq_kwargs
    )
```

Some default parameters:

```python
use_residual_lfq = True,   # whether to use the latest lookup-free quantization
rq_kwargs: dict = dict(
    quantize_dropout = True,
    quantize_dropout_cutoff_index = 1,
    quantize_dropout_multiple_of = 1,
),
rvq_kwargs: dict = dict(
    kmeans_init = True,
    threshold_ema_dead_code = 2,
),
rlfq_kwargs: dict = dict(
    frac_per_sample_entropy = 1.
),
rvq_stochastic_sample_codes = True,
```
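For reference, here is a minimal sketch (not the repository's own test code) that builds both quantizers directly from vector-quantize-pytorch with the defaults above and feeds them a fake batch of latents, so the two can be compared in isolation; the dimensions are illustrative and the `(quantized, indices, aux_loss)` return signature is assumed from that library:

```python
import torch
from vector_quantize_pytorch import ResidualVQ, ResidualLFQ

dim_codebook   = 192      # illustrative values, not necessarily the autoencoder defaults
num_quantizers = 2
codebook_size  = 16384

rlfq = ResidualLFQ(
    dim = dim_codebook,
    num_quantizers = num_quantizers,
    codebook_size = codebook_size,
    commitment_loss_weight = 1.,
    frac_per_sample_entropy = 1.
)

rvq = ResidualVQ(
    dim = dim_codebook,
    num_quantizers = num_quantizers,
    codebook_size = codebook_size,
    shared_codebook = True,
    commitment_weight = 1.,
    stochastic_sample_codes = True,
    sample_codebook_temp = 0.1,
    kmeans_init = True,
    threshold_ema_dead_code = 2
)

x = torch.randn(4, 1024, dim_codebook)   # fake batch of face embeddings

for name, quantizer in (('ResidualLFQ', rlfq), ('ResidualVQ', rvq)):
    quantized, indices, aux_loss = quantizer(x)
    # count how many distinct codes are actually in use -- a very small number
    # would point at codebook collapse rather than a problem elsewhere
    print(name, quantized.shape, indices.shape, aux_loss.sum().item(), indices.unique().numel())
```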

Loss curves: the red curve is `ResidualVQ`, the grey curve is `ResidualLFQ`.

fighting-Zhang commented 8 months ago

`ResidualLFQ` does not seem to support `shared_codebook`, so the same index at different quantizer levels of the mesh codes corresponds to different meanings. Could this affect the model's learning?

Additionally, the paper uses an RVQ-VAE, which suggests RVQ should also have reasonable reconstruction capability. In practical training, however, RVQ performs very poorly. Could this be an issue with the code?

lucidrains commented 8 months ago

@fighting-Zhang i think scalar quantization is the future. you aren't the only one reporting great results without loss of generalization

LFQ has a fixed codebook, so it doesn't matter whether it is shared or not
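As a minimal sketch of the idea (not the exact implementation in vector-quantize-pytorch): LFQ's "codebook" is simply every sign pattern over log2(codebook_size) latent dimensions, so it is identical at every residual level by construction and there are no learned codebook vectors to share.

```python
import torch

def lfq_quantize(z):
    # toy lookup-free quantization: snap each latent dimension to its sign;
    # the code index is just the binary pattern of those signs, so the
    # implicit codebook {-1, +1}^k is fixed and the same at every level
    codes = torch.where(z > 0, torch.ones_like(z), -torch.ones_like(z))
    bits  = (z > 0).long()
    k     = z.shape[-1]
    index = (bits * (2 ** torch.arange(k))).sum(dim = -1)   # integer code in [0, 2**k)
    return codes, index

z = torch.randn(2, 5, 14)             # 2**14 = 16384, matching the default codebook_size
codes, index = lfq_quantize(z)
print(codes.shape, index.shape, int(index.max()))
```

In `ResidualVQ`, by contrast, each quantizer level has its own learned codebook unless `shared_codebook = True` ties them together, which is why that option exists there in the first place.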