liuzuxin / cvpo-safe-rl

Code for "Constrained Variational Policy Optimization for Safe Reinforcement Learning" (ICML 2022)
GNU General Public License v3.0

About the hyperparameter kl_var_constraint. #7

Closed · Zarzard closed this issue 1 year ago

Zarzard commented 1 year ago

Hi zuxin, I noticed that the hyperparameter `kl_var_constraint`, which corresponds to $\epsilon_{\Sigma}$ in the appendix of the paper, is set to 0.001 in `config_cvpo.yaml`, but the paper sets it to 0.0001. Which is the final adopted setting?

liuzuxin commented 1 year ago

In theory 0.0001 is the better choice, but it actually doesn't matter too much for the final performance, as long as it doesn't exceed `kl_mean_constraint`. The intuition behind this parameter, and behind separating the mean and variance constraints, is that we want the mean of the policy to move/converge faster than the variance, because a large variance is important for exploration. I came across this trick on page 20 of this paper:

> We always set a much smaller epsilon for covariance than the mean. The intuition is that while we would like the distribution moves fast in the action space, we also want to keep the exploration to avoid premature convergence.
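For concreteness, here is a minimal sketch of what the decoupled KL looks like for a diagonal Gaussian policy (an illustration, not code from this repo): the mean term measures how far the mean moves while the old covariance is held fixed, and the covariance term does the opposite, so each part can be bounded by its own epsilon (`kl_mean_constraint` and `kl_var_constraint`).

```python
import numpy as np

def decoupled_gaussian_kl(mu_old, std_old, mu_new, std_new):
    """Decoupled KL for diagonal Gaussians.

    mean term: KL(N(mu_old, S_old) || N(mu_new, S_old))  -- only the mean moves
    cov  term: KL(N(mu_old, S_old) || N(mu_old, S_new))  -- only the covariance moves
    """
    kl_mean = 0.5 * np.sum(((mu_new - mu_old) / std_old) ** 2)
    kl_cov = 0.5 * np.sum(
        (std_old / std_new) ** 2 - 1.0 + 2.0 * np.log(std_new / std_old)
    )
    return kl_mean, kl_cov

# Example: a small mean shift and a small variance change in a 2-D action space.
kl_mean, kl_cov = decoupled_gaussian_kl(
    np.zeros(2), np.ones(2), np.array([0.1, -0.1]), np.array([0.9, 1.1])
)
```

Each term is then constrained separately, with the covariance budget set well below the mean budget so that the policy mean can move quickly while the variance shrinks only slowly.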

In simple environments like CarCircle, this parameter is not very sensitive and we can always obtain good performance. I also tried not decoupling the mean and variance (using a single KL constraint), and that works well too. But I recently tried the AntRun environment, which is much more complex, and there I found this trick is indeed helpful, though the more important parameters IMO are the ranges of `dual_constraint` in the E-step and `kl_mean_constraint` in the M-step.
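To illustrate how the two budgets might enter an MPO-style M-step, here is a rough sketch with separate Lagrange multipliers for the mean-KL and covariance-KL terms. All names and the `kl_mean_constraint` value below are made up for illustration; the only values taken from this thread are the ones for `kl_var_constraint` (0.001 in `config_cvpo.yaml`, 0.0001 in the paper).

```python
import torch
import torch.nn.functional as F

# Hypothetical budgets, for illustration only.
KL_MEAN_CONSTRAINT = 1e-2   # assumed; the thread only requires it to be larger
KL_VAR_CONSTRAINT = 1e-4    # than kl_var_constraint (paper value shown here)

# Unconstrained parameters for the two Lagrange multipliers;
# softplus keeps the effective multipliers positive.
log_lam_mu = torch.zeros(1, requires_grad=True)
log_lam_sigma = torch.zeros(1, requires_grad=True)

def m_step_losses(weighted_log_prob, kl_mean, kl_cov):
    """weighted_log_prob: E-step-weighted log-likelihood of sampled actions.
    kl_mean, kl_cov: decoupled KL terms between the old and new policy."""
    lam_mu = F.softplus(log_lam_mu)
    lam_sigma = F.softplus(log_lam_sigma)

    # Policy (primal) loss: fit the weighted targets, penalizing each KL term
    # with its current multiplier (multipliers treated as constants here).
    policy_loss = (-weighted_log_prob
                   + lam_mu.detach() * kl_mean
                   + lam_sigma.detach() * kl_cov)

    # Multiplier (dual) loss: gradient descent on this increases a multiplier
    # whenever its KL term exceeds its budget, and relaxes it otherwise.
    dual_loss = (-lam_mu * (kl_mean.detach() - KL_MEAN_CONSTRAINT)
                 - lam_sigma * (kl_cov.detach() - KL_VAR_CONSTRAINT))

    return policy_loss, dual_loss
```

With the covariance budget much smaller than the mean budget, the multiplier on the covariance term kicks in much sooner, which gives exactly the "mean moves fast, variance stays wide" behavior described above.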

Zarzard commented 1 year ago

Got it, thanks for the detailed and timely answer!