qianlin04 / Safe-offline-RL-with-diffusion-model

Clarification on Hyperparameters #1

Open greg3566 opened 3 weeks ago

greg3566 commented 3 weeks ago

Hi,

Thanks for the wonderful work! I have a question regarding the hyperparameters in the paper. Are the default hyperparameters stored in config.locomotion the same as those used in Figures 2, 5, and 6 of the paper?

Thanks!

qianlin04 commented 3 weeks ago

Thanks for asking. The hyperparameters used in the paper are the defaults in config.locomotion, except for those listed in Appendix D of Safe Offline Reinforcement Learning with Real-Time Budget. The most important hyperparameter is cost_grad_weight (set to 100 for MuJoCo environments). It can also be tuned across different scales, such as {100, 1000, 5000}, for better performance.
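A minimal sketch of sweeping cost_grad_weight over those scales is shown below. It is not code from the repository: `base_config` is a hypothetical stand-in dictionary for the defaults, and `sweep` is an illustrative helper. Only the key name cost_grad_weight and the candidate values come from this thread.

```python
from copy import deepcopy

# Hypothetical stand-in for the default settings; only the key
# cost_grad_weight and its MuJoCo default of 100 come from this thread.
base_config = {"cost_grad_weight": 100}


def sweep(base, key, values):
    """Return one independent copy of `base` per candidate value of `key`."""
    configs = []
    for value in values:
        cfg = deepcopy(base)  # leave the base config untouched
        cfg[key] = value
        configs.append(cfg)
    return configs


# Candidate scales mentioned above.
candidates = sweep(base_config, "cost_grad_weight", [100, 1000, 5000])

for cfg in candidates:
    # Each cfg would be passed to whatever training/evaluation entry point is used.
    print(cfg)
```

Each resulting dictionary would then be handed to a separate run, so that the runs differ only in cost_grad_weight.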

greg3566 commented 3 weeks ago

Hi,

Thank you for your response! I have a few additional questions regarding the settings in the halfcheetah environment.

It seems that some settings, such as the network architecture, are configured specifically for this environment. In the paper, although a horizon of 32 was used instead of 4, did you still choose (1, 4, 8) for dim_mults? Additionally, as in the halfcheetah entry of config.locomotion, did you apply attention to the diffusion network and use t_stopgrad=4?

I appreciate your clarification!

qianlin04 commented 2 weeks ago

Most hyperparameters are inherited from Janner's Diffuser. Regrettably, we did not spend much effort on fine-tuning the network hyperparameters you mentioned, because doing so is time-consuming and less relevant to the safe RL considerations. I'm happy to discuss further if you have any ideas or empirical results to share.