toshikwa / soft-actor-critic.pytorch

PyTorch implementation of Soft Actor-Critic (SAC).
MIT License

Which version of PyTorch should be used? #1

Closed. lknownothing closed this issue 3 years ago

lknownothing commented 3 years ago

Thank you for sharing. When I was running the code with PyTorch 1.7, I ran into this error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.
Could you tell me which version of PyTorch to use?

toshikwa commented 3 years ago

@lknownothing Sorry about that. torch==1.3.1 would work well.

lknownothing commented 3 years ago

Thank you for your reply. I eventually found that policy_loss should be calculated after updating Q1 and Q2. I changed policy_loss, entropies = self.calc_policy_loss(batch, weights) as follows, and it now works well with pytorch==1.7.1.

q1_loss, q2_loss, errors, mean_q1, mean_q2 = \
    self.calc_critic_loss(batch, weights)
# policy_loss, entropies = self.calc_policy_loss(batch, weights)  # moved below

update_params(
    self.q1_optim, self.critic.Q1, q1_loss, self.grad_clip)
update_params(
    self.q2_optim, self.critic.Q2, q2_loss, self.grad_clip)

# Calculate `policy_loss` only after Q1 and Q2 have been updated.
policy_loss, entropies = self.calc_policy_loss(batch, weights)
update_params(
    self.policy_optim, self.policy, policy_loss, self.grad_clip)

toshikwa commented 3 years ago

@lknownothing

Thank you for your suggestion ;) Could you please make a Pull Request??

BTW, I think it's better to delete variables that hold heavy gradient information (i.e. the attached computation graph) right after each update, to reduce GPU memory usage (e.g. delete q*_loss after stepping q*_optim).

For example, you can do it by updating the critic and the actor in separate functions: https://github.com/ku2482/gail-airl-ppo.pytorch/blob/master/gail_airl_ppo/algo/sac.py
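
Here is a minimal sketch of that idea, reusing calc_critic_loss, calc_policy_loss and update_params from the snippet above. The method names learn, update_critic and update_actor are only illustrative, not the actual names used in this repo or in the linked code:

def learn(self, batch, weights):
    # Update the critic first, then the actor. Each helper lets its loss
    # tensor (and the computation graph attached to it) go out of scope as
    # soon as the corresponding optimizer step is done, freeing GPU memory.
    self.update_critic(batch, weights)
    self.update_actor(batch, weights)

def update_critic(self, batch, weights):
    # errors / mean_q1 / mean_q2 would still be needed for PER priorities
    # and logging; they are ignored here to keep the sketch short.
    q1_loss, q2_loss, errors, mean_q1, mean_q2 = \
        self.calc_critic_loss(batch, weights)
    update_params(
        self.q1_optim, self.critic.Q1, q1_loss, self.grad_clip)
    update_params(
        self.q2_optim, self.critic.Q2, q2_loss, self.grad_clip)
    # q1_loss and q2_loss are freed when this function returns.

def update_actor(self, batch, weights):
    # Q1 and Q2 have already been updated at this point, so the policy
    # loss is computed against the fresh critic parameters.
    policy_loss, entropies = self.calc_policy_loss(batch, weights)
    update_params(
        self.policy_optim, self.policy, policy_loss, self.grad_clip)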

Anyway, I really appreciate your suggestion ;)