Closed. minoshiro1 closed this 1 year ago.
Hi @minoshiro1, thanks for raising this question. It is actually not wrong: the reward critic loss is computed here, while the code you pointed out computes the cost critic loss. I reused the reward critic loss code to compute the cost critic loss and was a bit lazy about renaming the variables... Sorry about the confusion. I will update the names ASAP.
Hello, I think this part of your code is wrong. In `cvpo-safe-rl/safe_rl/policy/cvpo.py`, lines 454-457, inside `def critic_loss():`, there is: `obs, act, reward, obs_next, done = to_tensor(data['obs']), to_tensor(data['act']), to_tensor(data['cost']), to_tensor(data['obs2']), to_tensor(data['done'])`. I think the `['cost']` should be `['rew']`.
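For illustration only, here is a minimal sketch of one way the quoted unpacking could be written so that the variable name no longer suggests a reward when the cost is being read. It is not the repository's code: `unpack_batch` and its `signal_key` parameter are hypothetical names, and `to_tensor` below is a plain-Python stand-in for the repo's helper of the same name, whose real behavior is not shown in this thread.

```python
from typing import Any, Dict, Tuple


def to_tensor(x: Any) -> list:
    """Hypothetical stand-in for the repo's to_tensor helper; it only copies to a list."""
    return list(x)


def unpack_batch(data: Dict[str, Any], signal_key: str) -> Tuple[list, list, list, list, list]:
    """Return (obs, act, signal, obs_next, done).

    signal_key selects which per-step signal the critic is trained on:
    'rew' for the reward critic, 'cost' for the cost critic.
    """
    if signal_key not in ("rew", "cost"):
        raise ValueError(f"signal_key must be 'rew' or 'cost', got {signal_key!r}")
    return (
        to_tensor(data["obs"]),
        to_tensor(data["act"]),
        to_tensor(data[signal_key]),  # named generically, so it is never a mislabeled "reward"
        to_tensor(data["obs2"]),
        to_tensor(data["done"]),
    )


# Toy batch with made-up values, using the same keys as the quoted snippet plus 'rew'.
batch = {
    "obs": [[0.0], [1.0]],
    "act": [[0.1], [0.2]],
    "rew": [1.0, 0.5],
    "cost": [0.0, 1.0],
    "obs2": [[1.0], [2.0]],
    "done": [0.0, 1.0],
}

_, _, reward, _, _ = unpack_batch(batch, "rew")   # what a reward critic loss would read
_, _, cost, _, _ = unpack_batch(batch, "cost")    # what a cost critic loss would read
```

Passing the key explicitly keeps a single code path for both critics, which matches the reuse described in the reply, while making it visible at the call site which signal is being used.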