liuzuxin / cvpo-safe-rl

Code for "Constrained Variational Policy Optimization for Safe Reinforcement Learning" (ICML 2022)
GNU General Public License v3.0

About the term ``max_q`` used in the dual function. #5

Closed by Zarzard 1 year ago

Zarzard commented 1 year ago

Hi! I wonder what role the term max_q plays in the implementation of the dual function. This term does not seem to be mentioned in the paper. Did I miss anything?

liuzuxin commented 1 year ago

Hi @Zarzard, max_q is a trick to prevent the argument of exp() from overflowing. It improves numerical stability without changing the final result, because the subtracted maximum is added back outside the log. You can find the same trick in PyTorch's logsumexp function.
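
To illustrate the trick, here is a minimal sketch in plain Python. It is not the repository's code: the function names, the `q_values` list, and the temperature `eta` are hypothetical, and the term is assumed to have the form `eta * log(mean(exp(q / eta)))`. Subtracting `max_q` inside `exp()` and adding it back outside the log gives the same value, since `exp(q / eta) = exp(max_q / eta) * exp((q - max_q) / eta)`.

```python
import math


def naive_dual_term(q_values, eta):
    # Direct evaluation of eta * log(mean(exp(q / eta))).
    # math.exp overflows once q / eta gets large (roughly above 709).
    mean_exp = sum(math.exp(q / eta) for q in q_values) / len(q_values)
    return eta * math.log(mean_exp)


def stable_dual_term(q_values, eta):
    # Same quantity, with the maximum shifted out of the exponent.
    # Every exponent is <= 0, so exp() stays in (0, 1] and cannot overflow.
    max_q = max(q_values)
    mean_exp = sum(math.exp((q - max_q) / eta) for q in q_values) / len(q_values)
    return max_q + eta * math.log(mean_exp)


if __name__ == "__main__":
    small = [1.0, 2.0, 3.0]
    # Both versions agree (up to rounding) when nothing overflows.
    print(naive_dual_term(small, 0.5), stable_dual_term(small, 0.5))

    large = [1000.0, 999.0, 998.0]
    # Only the shifted version can handle these values.
    print(stable_dual_term(large, 1.0))
```

With the large inputs, `naive_dual_term` raises `OverflowError`, while `stable_dual_term` returns a finite value close to the maximum.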

Zarzard commented 1 year ago

Got it, thanks!