p-christ / Deep-Reinforcement-Learning-Algorithms-with-PyTorch

PyTorch implementations of deep reinforcement learning algorithms and environments

Cart_Pole.py fails when running cuda9.0 gpu #15

Open crashmatt opened 5 years ago

crashmatt commented 5 years ago

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #4 'mat1'

File "/home/matt/Dropbox/Receiver/Antenna/antenna_sims/pytorch/DRL/Agents/Base_Agent.py", line 128, in run_n_episodes self.step() File "/home/matt/Dropbox/Receiver/Antenna/antenna_sims/pytorch/DRL/Agents/DQN_Agents/DQN.py", line 32, in step self.learn() File "/home/matt/Dropbox/Receiver/Antenna/antenna_sims/pytorch/DRL/Agents/DQN_Agents/DQN.py", line 56, in learn loss = self.compute_loss(states, next_states, rewards, actions, dones) File "/home/matt/Dropbox/Receiver/Antenna/antenna_sims/pytorch/DRL/Agents/DQN_Agents/DQN.py", line 63, in compute_loss Q_targets = self.compute_q_targets(next_states, rewards, dones) File "/home/matt/Dropbox/Receiver/Antenna/antenna_sims/pytorch/DRL/Agents/DQN_Agents/DQN.py", line 70, in compute_q_targets Q_targets_next = self.compute_q_values_for_next_states(next_states) File "/home/matt/Dropbox/Receiver/Antenna/antenna_sims/pytorch/DRL/Agents/DQN_Agents/DQN.py", line 76, in compute_q_values_for_next_states Q_targets_next = self.q_network_local(next_states).detach().max(1)[0].unsqueeze(1) File "/home/matt/.local/share/virtualenvs/sentient-NdmKRet4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call result = self.forward(*input, *kwargs) File "/home/matt/.local/share/virtualenvs/sentient-NdmKRet4/lib/python3.6/site-packages/nn_builder/pytorch/NN.py", line 153, in forward x = self.get_activation(self.hidden_activations, layer_ix)(linear_layer(x)) File "/home/matt/.local/share/virtualenvs/sentient-NdmKRet4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call result = self.forward(input, **kwargs) File "/home/matt/.local/share/virtualenvs/sentient-NdmKRet4/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 67, in forward return F.linear(input, self.weight, self.bias) File "/home/matt/.local/share/virtualenvs/sentient-NdmKRet4/lib/python3.6/site-packages/torch/nn/functional.py", line 1352, in linear ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t()) RuntimeError: Expected object of backend CPU but got backend CUDA for argument #4 'mat1'

absl-py==0.7.1
astor==0.7.1
bleach==1.5.0
certifi==2019.3.9
chardet==3.0.4
cycler==0.10.0
decorator==4.4.0
EasyProcess==0.2.5
future==0.17.1
gast==0.2.2
grpcio==1.20.0
gym==0.10.9
h5py==2.9.0
html5lib==0.9999999
idna==2.8
imageio==2.5.0
Keras-Applications==1.0.7
Keras-Preprocessing==1.0.9
kiwisolver==1.0.1
Markdown==3.1
matplotlib==3.0.3
mock==2.0.0
msgpack==0.6.1
networkx==2.3
nn-builder==0.0.4
numpy==1.16.2
pbr==5.1.3
Pillow==6.0.0
protobuf==3.7.1
pyglet==1.3.2
pyparsing==2.4.0
python-dateutil==2.8.0
PyVirtualDisplay==0.2.1
PyWavelets==1.0.3
requests==2.21.0
scikit-image==0.15.0
scipy==1.2.1
six==1.12.0
tensorboard==1.13.1
tensorflow==1.13.1
tensorflow-estimator==1.13.0
tensorflow-tensorboard==1.5.1
termcolor==1.1.0
torch==1.0.1.post2
torchvision==0.2.2.post3
tqdm==4.31.1
trainer==0.0.1
urllib3==1.24.2
vizdoom==1.1.7
Werkzeug==0.15.2

p-christ commented 5 years ago

Hi, does it work if you only use a CPU and not a GPU? For small games like this it should also be much faster to use a CPU.

crashmatt commented 5 years ago

I didn't find an option to select the CPU if a GPU exists. The code looks like it switches to the GPU automatically.

I understand the GPU will not be effective here. I was just trying to get it started on a simple reinforcement learning case. If you don't intend to support the GPU, that is OK.
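One generic workaround (my own suggestion, not an option the repo documents) is to hide the GPU from PyTorch before it initialises CUDA, so any automatic device selection falls back to the CPU. A minimal sketch, assuming it is placed at the very top of the script (e.g. Cart_Pole.py) before torch is imported:

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = ""   # must run before torch touches CUDA

    import torch
    print(torch.cuda.is_available())          # False -> code that auto-selects the GPU will use the CPU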

yangysc commented 5 years ago

Hello, @p-christ. Although the example works in CPU mode, have you tried running the example (cart_pole.py) in GPU mode? It seems some tensors are not on the GPU, and I can't find where to fix it.

File "/home/noone/anaconda3/envs/tf_3/lib/python3.6/multiprocessing/pool.py", line 670, in get raise self._value RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #4 'mat1'

I find the error happens in the PPO algorithm, at policy_output = policy.forward(states).to(self.device):

    def calculate_log_probability_of_actions(self, policy, states, actions):
        """Calculates the log probability of an action occurring given a policy and starting state"""
        policy_output = policy.forward(states).to(self.device)
        policy_distribution = create_actor_distribution(self.action_types, policy_output, self.action_size)
        policy_distribution_log_prob = policy_distribution.log_prob(actions)
        return policy_distribution_log_prob

In fact, I want to use the PPO algorithm in my project; could you help with debugging it? Thanks in advance.
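A possible fix, sketched under the assumption that the inputs (not the output) are the tensors stuck on the CPU: move states and actions onto the policy's device before policy.forward, since calling .to(self.device) on the output is too late; the mismatch already happens inside the forward pass. This is not the repo's code, and Categorical below stands in for the repo's create_actor_distribution with discrete actions.

    import torch
    from torch import nn
    from torch.distributions import Categorical

    # Hypothetical rewrite: put the inputs on the policy's device *before* the forward
    # pass instead of moving the output afterwards.
    def calculate_log_probability_of_actions(policy, states, actions):
        device = next(policy.parameters()).device      # device the policy network actually lives on
        states = states.to(device)
        actions = actions.to(device)
        policy_output = policy(states)                 # forward pass now sees matching devices
        distribution = Categorical(logits=policy_output)  # stand-in for create_actor_distribution
        return distribution.log_prob(actions)

    policy = nn.Linear(4, 2)
    if torch.cuda.is_available():
        policy = policy.cuda()
    log_probs = calculate_log_probability_of_actions(policy, torch.randn(8, 4), torch.randint(0, 2, (8,)))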