pairlab / QueST

Official code for "QueST: Self-Supervised Skill Abstractions for Continuous Control" [NeurIPS 2024]
MIT License

Experiment of observation conditioned decoder #6

Open ssam2s opened 1 week ago

ssam2s commented 1 week ago

Thank you for your great work!

I'm conducting various experiments to condition the decoder on observations.

In your ablation study for the observation-conditioned decoder, were all hyperparameters the same as in the released code? Also, how were the observation tokens constructed?

In some conditioning experiments, I've observed cases where the autoencoder's grad_norm increases. Could this indicate potential issues with training?

Congratulations on having your paper accepted at a top conference!

atharvamete commented 1 week ago

Thank you for your kind words!

were all hyperparameters the same as in the released code?

Yes, to ensure a fair comparison we used the same hyperparameters as the ones reported in Appendix B1. In our current implementation we construct one token per observation timestep in stage 2, so for this ablation we just append those tokens to the skill tokens from the encoder and let the decoder cross-attend to all tokens (obs + skill tokens) combined.
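
A minimal sketch of the token concatenation described above, in plain Python rather than the repository's code. The function names `build_decoder_context` and `cross_attend` are hypothetical, the attention is single-head with no learned projections, and placing the observation tokens after the skill tokens is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_decoder_context(skill_tokens, obs_tokens):
    """Append observation tokens to the skill tokens along the sequence axis.

    Hypothetical helper: both inputs are lists of equal-width token vectors.
    """
    return skill_tokens + obs_tokens


def cross_attend(queries, context):
    """Single-head scaled dot-product cross-attention (no learned projections).

    Each decoder query attends over the combined obs + skill context.
    Returns the attended outputs and the attention weights per query.
    """
    dim = len(context[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in context
        ]
        w = softmax(scores)
        outputs.append(
            [sum(wj * tok[d] for wj, tok in zip(w, context)) for d in range(dim)]
        )
        weights.append(w)
    return outputs, weights


# Toy example: two skill tokens, one observation token, one decoder query.
skill_tokens = [[1.0, 0.0], [0.0, 1.0]]
obs_tokens = [[0.5, 0.5]]
context = build_decoder_context(skill_tokens, obs_tokens)
outputs, weights = cross_attend([[1.0, 0.0]], context)
```

With this layout the decoder sees a single context sequence, so no architectural change is needed beyond the longer key/value sequence.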

I've observed cases where the autoencoder's grad_norm increases. Could this indicate potential issues with training?

Can you confirm whether this happens during stage 0, 1, or 2 training? Also, by "increase" do you mean it blows up to some very high value? If you are training or finetuning the autoencoder, an increase in grad norm is expected. In the current codebase, stage 0 is autoencoder-only training, stage 1 is prior training, and stage 2 finetunes the prior along with the autoencoder, depending on whether you have l1_loss_scale set to a non-zero value. You can also explicitly freeze the autoencoder params if you don't want to train them.
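
The stage logic above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: `trainable_modules`, `global_grad_norm`, and the `freeze_autoencoder` flag are hypothetical names, and only `l1_loss_scale` comes from the codebase as described here.

```python
import math


def trainable_modules(stage, l1_loss_scale=0.0, freeze_autoencoder=False):
    """Return which modules receive gradient updates in a given stage.

    Stage 0: autoencoder only. Stage 1: prior only. Stage 2: prior, plus the
    autoencoder when l1_loss_scale is non-zero and it is not explicitly frozen.
    `freeze_autoencoder` is a hypothetical flag standing in for manual freezing.
    """
    if stage == 0:
        return {"autoencoder"}
    if stage == 1:
        return {"prior"}
    if stage == 2:
        modules = {"prior"}
        if l1_loss_scale != 0 and not freeze_autoencoder:
            modules.add("autoencoder")
        return modules
    raise ValueError(f"unknown stage: {stage}")


def global_grad_norm(grads):
    """Global L2 norm over a list of per-parameter gradient lists."""
    return math.sqrt(sum(g * g for group in grads for g in group))


# Example: two parameter groups with gradients [3.0] and [4.0].
norm = global_grad_norm([[3.0], [4.0]])
```

Logging this norm separately for the autoencoder and the prior makes it easier to tell an expected rise during finetuning from a genuine blow-up.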

ssam2s commented 1 week ago

Thank you for your answer!

Can you confirm whether this happens during stage 0, 1, or 2 training?

For various experiments, I conditioned the decoder on observations during the stage 0 training process, and I ended up with results that contradicted those obtained using the provided code. Additionally, when I used this pretrained autoencoder in stage 1, the success rate was nearly zero. There could be multiple reasons behind this, but I wanted to ask if you might have any insights into possible causes.
