ntnu-arl / aerial_gym_simulator

Aerial Gym Simulator - Isaac Gym Simulator for Aerial Robots
https://ntnu-arl.github.io/aerial_gym_simulator/
BSD 3-Clause "New" or "Revised" License

Does RL training use sampled VAE latent? #17

Open ErcBunny opened 2 weeks ago

ErcBunny commented 2 weeks ago

Hi,

Nice code, and thanks for open-sourcing it. I noticed that the default value for inference_mode is False and for return_sampled_latent is True, and I was wondering whether the VAE is switched to deterministic mode somewhere for training, or whether using the sampled latent is a deliberate choice. Could you kindly elaborate on this?

Thank you.

mihirk284 commented 5 days ago

Hello @ErcBunny, Thank you :)

Currently, the RL training is not using the sampled VAE latent. The configuration you pointed out results in deterministic VAE outputs.

It is not necessary to keep it this way for training. I expect the RL policy to learn robustly, without degrading in performance, even if the sampled latent is fed into it. You can set inference_mode to False and try training as well.

Edit: In other words, both configurations will lead to stable training.
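As a point of reference for the two configurations being discussed, the distinction between a deterministic and a sampled latent can be sketched with a toy encoder. This is a hypothetical, stdlib-only sketch whose flag names mirror the inference_mode and return_sampled_latent options from the thread; it is not the simulator's actual VAE code.

```python
import math
import random

def encode(mu, logvar, inference_mode=False, return_sampled_latent=True):
    """Toy VAE encode step: return the posterior mean (deterministic)
    or a reparameterized sample mu + sigma * eps (stochastic)."""
    if inference_mode or not return_sampled_latent:
        # Deterministic: just the posterior mean.
        return list(mu)
    # Stochastic: reparameterization trick with eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

mu, logvar = [0.1, -0.2], [0.0, 0.0]
det1 = encode(mu, logvar, inference_mode=True)
det2 = encode(mu, logvar, inference_mode=True)
samp1 = encode(mu, logvar)
samp2 = encode(mu, logvar)
print(det1 == det2)    # deterministic calls return identical latents
print(samp1 == samp2)  # sampled calls almost surely differ
```

Either branch yields a valid training signal; the sampled branch simply adds posterior noise that the policy has to be robust to.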

Let me know if this answered your question. Best,

ErcBunny commented 5 days ago

Thanks @mihirk284, good to know that training is stable whether or not the sampled latent is used. But could you also elaborate on how inference_mode=False with return_sampled_latent=True results in deterministic VAE outputs? If inference_mode=True or return_sampled_latent=False is set somewhere else, could you please point that out?

mihirk284 commented 5 days ago

To answer this question, I would like to know whether you are using a particular task definition. If so, the simulation seed is pre-defined in the configuration file of either the learning algorithm or the task_config class for that task.

If you are seeing the same latent (for example in the first timestep of multiple consecutive runs of the simulation), can you try the following?

# run these two one after another for the same image_obs tensor
image_latents1 = self.vae_model.encode(image_obs).clone()
image_latents2 = self.vae_model.encode(image_obs).clone()
# if sampling is active, torch.allclose(image_latents1, image_latents2)
# should return False

You can put this around this line in code: https://github.com/ntnu-arl/aerial_gym_simulator/blob/070391cc30d92b76dcd3e4e41a49c8d1b60080ae/aerial_gym/task/navigation_task/navigation_task.py#L271

Then compare the outputs. If sampling is active, the two latents should differ from one another. Could you let me know whether they differ or are identical?
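Note that seeing the same latent in the first timestep of multiple consecutive runs does not by itself imply a deterministic VAE: a pre-defined simulation seed makes runs reproduce the same noise draws, even though each encode within a run still samples fresh noise. A minimal stdlib sketch of that distinction (the function names here are illustrative stand-ins, not the simulator's API):

```python
import random

def sample_latent():
    # Stand-in for one stochastic encode: draws fresh noise each call.
    return [random.gauss(0.0, 1.0) for _ in range(4)]

def run(seed):
    # Stand-in for one simulation run with a pre-defined seed.
    random.seed(seed)
    return sample_latent(), sample_latent()

run1_first, run1_second = run(seed=42)
run2_first, run2_second = run(seed=42)
print(run1_first == run2_first)   # same seed: runs reproduce the same latents
print(run1_first == run1_second)  # within a run, consecutive encodes differ
```

This is why the two-encodes-in-a-row check above is the decisive test: it isolates within-run sampling from seed-induced reproducibility across runs.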

Edit: On further inspection of the code, I realize my comment above might be incorrect, specifically the claim: "Currently, the RL training is not using the sampled VAE latent. The configuration you pointed out results in deterministic VAE outputs."