sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

SAC_Bug #38

Open aut6620 opened 2 years ago

aut6620 commented 2 years ago

In sac.py:

    s = torch.tensor([t.s for t in self.replay_buffer]).float().to(device)

Traceback:

    Traceback (most recent call last):
      File "D:\PycharmProject\Deep-reinforcement-learning-with-pytorch-master\Char09 SAC\SAC.py", line 307, in <module>
        main()
      File "D:\PycharmProject\Deep-reinforcement-learning-with-pytorch-master\Char09 SAC\SAC.py", line 293, in main
        agent.update()
      File "D:\PycharmProject\Deep-reinforcement-learning-with-pytorch-master\Char09 SAC\SAC.py", line 244, in update
        Q_loss.backward(retain_graph = True)
      File "C:\Users\lx\anaconda3\envs\torch\lib\site-packages\torch\_tensor.py", line 363, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "C:\Users\lx\anaconda3\envs\torch\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
        Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: Found dtype Double but expected Float
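This RuntimeError usually means one of the tensors feeding the loss is float64 (Double) while the network parameters are float32 (Float). A common cause is that numpy arrays default to float64, and `torch.tensor(ndarray)` keeps that dtype, so a target built from numpy data stays Double even though the prediction is Float. A minimal sketch of how the mismatch arises and how a `.float()` cast avoids it; the module and tensor names here are illustrative, not the repo's code:

```python
# Sketch only: numpy data is float64 by default, so torch.tensor(ndarray)
# yields a Double tensor, while nn.Module parameters are float32.
import numpy as np
import torch
import torch.nn as nn

net = nn.Linear(3, 1)        # parameters are float32
mse = nn.MSELoss()

obs = np.random.randn(8, 3)                   # float64 numpy array
target = torch.tensor(np.random.randn(8, 1))  # Double tensor

pred = net(torch.tensor(obs).float())         # prediction is float32
loss = mse(pred, target)                      # target is still float64
try:
    loss.backward()                           # typically raises: Found dtype Double but expected Float
except RuntimeError as e:
    print(e)

# Fix: cast the target (or build it with dtype=torch.float32) so dtypes match
loss = mse(net(torch.tensor(obs).float()), target.float())
loss.backward()
```

In this repo the same thing can happen in `update()` if any tensor fed into the Q/V criteria (for example a target built from rewards stored as float64) is still a Double tensor.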

zhaoyanghandd commented 2 years ago

How to deal with it?

aut6620 commented 2 years ago
        V_loss = self.value_criterion(excepted_value, next_value.detach()).mean()  # J_V

        # Dual Q net
        Q1_loss = self.Q1_criterion(excepted_Q1.float(), next_q_value.detach().float()).mean()  # J_Q
        # Q1_loss = Q1_loss.float()

        Q2_loss = self.Q2_criterion(excepted_Q2.float(), next_q_value.detach().float()).mean()
        # Q2_loss = Q2_loss.float()

        pi_loss = (log_prob.float() - excepted_new_Q.float()).mean()  # according to original paper
[image: screenshot of the changes described below]

aut6620 commented 2 years ago

1. I changed all the dtypes to float. 2. Then I ran into the next bug; the picture above shows what I had done.
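An alternative to casting every loss term is to force the whole sampled batch to float32 once inside `update()`, the way the quoted `s = torch.tensor([t.s for t in self.replay_buffer]).float().to(device)` line already does for the states. A rough sketch under that assumption; only `t.s` appears in the traceback above, so the other field names are guesses and may differ from the repo's transition tuple:

```python
# Sketch only: cast the whole batch to float32 in one place so the losses
# never mix Float and Double. Field names other than `s` are assumptions.
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

def sample_batch(replay_buffer):
    s  = torch.tensor([t.s  for t in replay_buffer], dtype=torch.float32, device=device)
    a  = torch.tensor([t.a  for t in replay_buffer], dtype=torch.float32, device=device)
    r  = torch.tensor([t.r  for t in replay_buffer], dtype=torch.float32, device=device)
    s_ = torch.tensor([t.s_ for t in replay_buffer], dtype=torch.float32, device=device)
    d  = torch.tensor([t.d  for t in replay_buffer], dtype=torch.float32, device=device)
    return s, a, r, s_, d
```

With the batch already float32, the per-term `.float()` calls on `excepted_Q1` / `next_q_value` shown above should no longer be necessary.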