liuzuxin / cvpo-safe-rl

Code for "Constrained Variational Policy Optimization for Safe Reinforcement Learning" (ICML 2022)
GNU General Public License v3.0

Found one incorrect place in the code #4

Closed minoshiro1 closed 1 year ago

minoshiro1 commented 1 year ago

Hello, I think this place in your code is wrong: `cvpo-safe-rl/safe_rl/policy/cvpo.py`, lines 454-457:

```python
def critic_loss():
    obs, act, reward, obs_next, done = to_tensor(data['obs']), to_tensor(
        data['act']), to_tensor(data['cost']), to_tensor(
            data['obs2']), to_tensor(data['done'])
```

I think the `['cost']` should be `['rew']`.

liuzuxin commented 1 year ago

Hi @minoshiro1, thanks for raising this question. It is actually not wrong: the reward critic loss is computed here, while the cost critic loss is computed at the place you pointed out. I reused the reward-critic-loss code to compute the cost critic loss and was a bit lazy about renaming the variables... Sorry about the confusion. I will update the names ASAP.
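
For readers hitting the same confusion: one way to avoid misleading variable names when the same TD-loss code serves both critics is to parameterize the batch key. The sketch below is a hypothetical simplification, not the repo's actual implementation: `to_tensor`, the batch layout, and the `q_fn`/`q_target_fn` callables are stand-ins, and a plain NumPy MSE replaces the PyTorch loss.

```python
import numpy as np

def to_tensor(x):
    # Stand-in for the repo's to_tensor helper (hypothetical simplification).
    return np.asarray(x, dtype=np.float32)

def critic_loss(data, signal_key, q_fn, q_target_fn, gamma=0.99):
    """Generic one-step TD critic loss.

    signal_key selects which batch field drives the Bellman target:
    'rew' for the reward critic, 'cost' for the cost critic.
    """
    obs, act, signal, obs_next, done = (
        to_tensor(data['obs']), to_tensor(data['act']),
        to_tensor(data[signal_key]), to_tensor(data['obs2']),
        to_tensor(data['done']),
    )
    # Bellman target: signal + gamma * (1 - done) * Q_target(s')
    target = signal + gamma * (1.0 - done) * q_target_fn(obs_next)
    # Mean-squared TD error of the online critic against the target.
    return float(np.mean((q_fn(obs, act) - target) ** 2))
```

With this shape, the reward and cost critics share one code path (`critic_loss(data, 'rew', ...)` vs. `critic_loss(data, 'cost', ...)`) and the variable names never lie about which signal is in use.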