@lknownothing
Sorry about that. `torch==1.3.1` would work well.
Thank you for your reply. In the end I found that we should calculate `policy_loss` after updating `Q1` and `Q2`: the optimizer step modifies the critic parameters in place, which invalidates the autograd graph of a `policy_loss` computed beforehand. I moved `policy_loss, entropies = self.calc_policy_loss(batch, weights)` as follows, and it now works well with `pytorch==1.7.1`:
```python
q1_loss, q2_loss, errors, mean_q1, mean_q2 = \
    self.calc_critic_loss(batch, weights)
# policy_loss, entropies = self.calc_policy_loss(batch, weights)

update_params(
    self.q1_optim, self.critic.Q1, q1_loss, self.grad_clip)
update_params(
    self.q2_optim, self.critic.Q2, q2_loss, self.grad_clip)

# calculate `policy_loss` after updating `Q1` and `Q2`
policy_loss, entropies = self.calc_policy_loss(batch, weights)
update_params(
    self.policy_optim, self.policy, policy_loss, self.grad_clip)
```
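For reference, here is a minimal standalone repro (not from this repo; the networks are stand-ins) of why the original ordering fails: the policy loss keeps the critic's weights in its autograd graph, and the in-place `optimizer.step()` then invalidates that graph.

```python
import torch

policy = torch.nn.Linear(2, 2)   # stand-in for the actor network
critic = torch.nn.Linear(2, 1)   # stand-in for Q1
critic_optim = torch.optim.SGD(critic.parameters(), lr=0.1)

state = torch.randn(4, 2)

# Policy loss computed *before* the critic update: its backward pass needs
# the current critic weights, since gradients flow through the critic into
# the policy parameters.
policy_loss = -critic(policy(state)).mean()

# Critic update: optimizer.step() modifies critic.weight in place.
critic_loss = (critic(state) - 1.0).pow(2).mean()
critic_loss.backward()
critic_optim.step()

# The weights saved for policy_loss's backward pass were just modified in
# place, so on PyTorch 1.7 this raises "RuntimeError: one of the variables
# needed for gradient computation has been modified by an inplace operation",
# while torch==1.3.1 does not complain here.
policy_loss.backward()
```

Computing `policy_loss` after the critic's `step()`, as in the snippet above this one, avoids the problem because its graph then refers to the already-updated weights.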
@lknownothing
Thank you for your suggestion ;) Could you please make a Pull Request??
BTW, I think it's better to delete variables that hold heavy gradient information after each update, to reduce GPU memory usage (e.g. delete `q*_loss` after updating `q*_optim`).
For example, you can do it by updating the critic and the actor in separate functions: https://github.com/ku2482/gail-airl-ppo.pytorch/blob/master/gail_airl_ppo/algo/sac.py
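A minimal sketch of that structure, reusing the names from the snippet above plus hypothetical `learn` / `update_critic` / `update_actor` method names (not copied from the linked repo): each loss tensor and its graph go out of scope as soon as the corresponding method returns, so the memory can be freed before the next update.

```python
def learn(self, batch, weights):
    # Separate functions also enforce the correct order: the policy loss
    # is only built after Q1 and Q2 have been updated.
    self.update_critic(batch, weights)
    self.update_actor(batch, weights)

def update_critic(self, batch, weights):
    q1_loss, q2_loss, errors, mean_q1, mean_q2 = \
        self.calc_critic_loss(batch, weights)
    update_params(self.q1_optim, self.critic.Q1, q1_loss, self.grad_clip)
    update_params(self.q2_optim, self.critic.Q2, q2_loss, self.grad_clip)
    # q1_loss / q2_loss and their graphs are released when this returns;
    # errors / mean_q1 / mean_q2 would be returned here if they are needed
    # for priority updates or logging.

def update_actor(self, batch, weights):
    policy_loss, entropies = self.calc_policy_loss(batch, weights)
    update_params(self.policy_optim, self.policy, policy_loss, self.grad_clip)
```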
Anyway, I really appreciate your suggestion ;)
Thank you for sharing. When I was running the code with PyTorch 1.7, I encountered the error `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`. Can you tell me which version of PyTorch to use?