lucidrains / meshgpt-pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
MIT License

Commit loss is negative #43

Open · ZekaiGalaxy opened 8 months ago

ZekaiGalaxy commented 8 months ago
[screenshot: training loss curves]

When I trained on several objects for several epochs, the commit loss started to become negative. The overall loss kept going down, but neither the recon loss nor the reconstruction quality improved.

I wonder whether a negative commit loss is normal, and what it implies.

MarcusLoppe commented 8 months ago

Try lowering the diversity_gamma from 1.0 to 0.1-0.3. The quantizer uses this variable to lower the loss; at 1.0 it can trick itself into looking more diverse, since exploration is punished less. As you can see, your commit loss is near -1, which is due to the code below. I think high diversity is good at the start, but at the end it might do more harm than good.

Part of the commit loss calculation in LFQ:

```python
entropy_aux_loss = per_sample_entropy - self.diversity_gamma * codebook_entropy
```
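A minimal sketch of why this term can go negative (toy code, not the library's exact implementation): by Jensen's inequality, the entropy of the batch-averaged code distribution is always at least the average per-sample entropy, so with diversity_gamma = 1.0 the auxiliary loss is pushed to zero or below.

```python
import torch

# toy binary code distributions for a batch of 1024 samples
logits = torch.randn(1024, 2)
probs = logits.softmax(dim = -1)

def entropy(p):
    return (-p * p.clamp(min = 1e-9).log()).sum(dim = -1)

# average entropy of each sample's own code distribution
per_sample_entropy = entropy(probs).mean()

# entropy of the batch-averaged distribution (>= the above, by Jensen)
codebook_entropy = entropy(probs.mean(dim = 0))

for diversity_gamma in (1.0, 0.2):
    entropy_aux_loss = per_sample_entropy - diversity_gamma * codebook_entropy
    print(f"gamma = {diversity_gamma}: aux loss = {entropy_aux_loss.item():.3f}")
```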

To set this in the autoencoder:

```python
from meshgpt_pytorch import MeshAutoencoder

autoencoder = MeshAutoencoder(
    num_discrete_coors = 128,
    rlfq_kwargs = {"diversity_gamma": 0.2}
)
```
qixuema commented 8 months ago


Hi @MarcusLoppe

Happy New Year! 🎉🎉🎉

I attempted to train an autoencoder using 20 different chairs as training samples and encountered the same issue where the commit loss was negative.

This is the commit loss curve during my training process.

[W&B chart: commit loss curve, 2023-12-28]

I will reduce the diversity_gamma from 1.0 to between 0.1 and 0.3 to see what changes occur in the commit loss.

Best regards, Xueqi Ma

MarcusLoppe commented 8 months ago


Experiment a little, since I only discovered this yesterday and haven't tested it out fully :) Diversity is good at the start of training, but the codes changing too much at the end isn't.

Please let me know what you find out.

qixuema commented 8 months ago

@MarcusLoppe

I tried reducing the diversity_gamma from 1.0 to 0.2 and retrained on the data. The current commit loss curve is shown in the following image.

[W&B chart: commit loss curve with diversity_gamma = 0.2, 2023-12-29]

ZekaiGalaxy commented 8 months ago

Thank you @MarcusLoppe. From my perspective, maybe we could use a 'decaying' gamma, since we want the model to explore at the beginning but converge at the end.
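A minimal sketch of such a decay schedule (a hypothetical helper, not part of the library; how to wire it into the quantizer depends on the library's internals):

```python
def decayed_gamma(step, total_steps, start = 1.0, end = 0.1):
    # linearly anneal diversity_gamma from `start` down to `end` over training
    t = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * t

# assuming the quantizer exposes diversity_gamma as a plain attribute,
# it could be updated each training step:
# autoencoder.quantizer.diversity_gamma = decayed_gamma(step, total_steps)
```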

I also notice that in @qixuema's experiment with gamma = 0.2, the commit loss does drop, but there are some extreme commit loss values. Does that mean the model overfits to one or a few types of shapes or codes and can't handle rare cases well?

ZekaiGalaxy commented 8 months ago

Also, @qixuema, how is your recon loss going? I find that even though my recon loss is going down (~0.32), I still can't reconstruct the training data with the autoencoder when training on multiple objects.

qixuema commented 8 months ago

Hi, @ZekaiGalaxy

The following are my recon_loss and total_loss.

[W&B charts: recon_loss and total_loss curves, 2023-12-29]

MarcusLoppe commented 7 months ago

@lucidrains

Hi all, this issue resolves itself when training on a large dataset. Using 300 x 50 augmentations (300 meshes, 50 augmentations each), the commit loss was at 3-14 at the start and then settled down, matching the recon loss at around 0.6.
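For reference, one simple way to produce many augmentations per mesh is random per-axis scaling and jitter of the vertices (a hypothetical sketch, not necessarily the augmentation used above):

```python
import torch

def augment_vertices(vertices, num_augments = 50, scale_range = (0.75, 1.25), jitter = 0.05):
    # vertices: (num_vertices, 3) tensor in normalized coordinates
    # returns num_augments randomly scaled and shifted copies
    augments = []
    for _ in range(num_augments):
        scale = torch.empty(3).uniform_(*scale_range)
        shift = torch.empty(3).uniform_(-jitter, jitter)
        augments.append(vertices * scale + shift)
    return augments
```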