rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

About CUDA error during replay weight #94

Closed YunchuZhang closed 4 years ago

YunchuZhang commented 4 years ago

Hi there, I am trying to train SAC with HER. After getting the expert weights, when I try to visualize the policy, it fails with the following error:

```
Policy and environment loaded
Traceback (most recent call last):
  File "scripts/run_goal_conditioned_policy.py", line 60, in <module>
    simulate_policy(args)
  File "scripts/run_goal_conditioned_policy.py", line 15, in simulate_policy
    policy = data['trainer/policy']
  File "/home/yunchuz/rlkit/rlkit/samplers/rollout_functions.py", line 41, in multitask_rollout
    a, agent_info = agent.get_action(new_obs, **get_action_kwargs)
  File "/home/yunchuz/rlkit/rlkit/torch/sac/policies.py", line 63, in get_action
    actions = self.get_actions(obs_np[None], deterministic=deterministic)
  File "/home/yunchuz/rlkit/rlkit/torch/sac/policies.py", line 67, in get_actions
    return eval_np(self, obs_np, deterministic=deterministic)[0]
  File "/home/yunchuz/rlkit/rlkit/torch/core.py", line 18, in eval_np
    outputs = module(*torch_args, **torch_kwargs)
  File "/home/yunchuz/miniconda3/envs/rlkit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yunchuz/rlkit/rlkit/torch/sac/policies.py", line 83, in forward
    h = self.hidden_activation(fc(h))
  File "/home/yunchuz/miniconda3/envs/rlkit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yunchuz/miniconda3/envs/rlkit/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/yunchuz/miniconda3/envs/rlkit/lib/python3.6/site-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm
```

Do you know what the reason for it is?

rstrudel commented 4 years ago

Hi,

The error `RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm` means that you are trying to multiply a tensor located on the GPU with a tensor located on the CPU.
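
For illustration, here is a minimal sketch that reproduces this kind of mismatch (this assumes a CUDA-capable machine; the layer and shapes are made up):

```python
import torch

layer = torch.nn.Linear(4, 4).cuda()  # weights live on the GPU
x = torch.randn(1, 4)                 # input tensor stays on the CPU
layer(x)  # raises RuntimeError: Expected object of device type cuda but got device type cpu
```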

If you want to run things on the GPU, try adding these lines to the code before doing the rollouts:

```python
import rlkit.torch.pytorch_util as ptu
ptu.set_gpu_mode(True)
```

If you are on CPU, simply set the argument to False.
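
Putting it together, a sketch of the GPU path (assuming `data` is the snapshot already loaded by the script):

```python
import rlkit.torch.pytorch_util as ptu

ptu.set_gpu_mode(True)           # rlkit will create new tensors on the GPU
policy = data['trainer/policy']  # snapshot loaded earlier by the script
policy.to(ptu.device)            # move the policy weights to the same device
```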

YunchuZhang commented 4 years ago

I tried ptu.set_gpu_mode(False), and it still gives the same error. When I train on the GPU, save the weights, and then test on the CPU, I get this error. But when I train on the CPU and test on the CPU, the error does not occur. Do you know why?

rstrudel commented 4 years ago

Here is where your issue comes from: PyTorch saves the device you were using during training, and when you load the model again, it loads it onto that same device. So when you train on the CPU, the model weights are loaded back onto the CPU, and the matrix multiplication is CPU x CPU. Whereas when you load a GPU-trained model, the operation is GPU x CPU, and PyTorch is not happy with that.

In "scripts/run_goal_conditioned_policy.py", line 15, in simulate_policy policy = data['trainer/policy']

Add .cpu() at the end of that line and it will put the model weights on your CPU. If you want to evaluate on the GPU instead, then you should move the data fed to the model onto your GPU first and keep the model weights on the GPU. The PyTorch tutorial on tensors is a good resource to check out how this works.
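
As a sketch, two equivalent ways to get the weights onto the CPU (the snapshot file name is illustrative):

```python
import torch

# Option 1: remap the GPU-trained snapshot onto the CPU at load time
data = torch.load('params.pkl', map_location='cpu')
policy = data['trainer/policy']

# Option 2: load normally (requires a GPU), then move the weights afterwards
policy = data['trainer/policy'].cpu()
```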

vitchyr commented 4 years ago

Yes, @rstrudel's suggestion should work. If it doesn't, feel free to reopen the issue. Thank you @rstrudel!

YunchuZhang commented 4 years ago

Thanks! Are `data['trainer/policy']` and `data['evaluation/policy']` the same?

vitchyr commented 4 years ago

For SAC, the trainer policy is stochastic while the evaluation one is deterministic.
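
Concretely, a sketch of the relationship (assuming the MakeDeterministic wrapper mentioned below lives in rlkit.torch.sac.policies):

```python
from rlkit.torch.sac.policies import MakeDeterministic

stochastic_policy = data['trainer/policy']          # samples from a tanh-Gaussian
eval_policy = MakeDeterministic(stochastic_policy)  # returns the mean action
```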

YunchuZhang commented 4 years ago

Thanks! For the deterministic evaluation policy, how can I load it onto the GPU? Calling `policy.to(ptu.device)` gives:

```
AttributeError: 'MakeDeterministic' object has no attribute 'to'
```

vitchyr commented 4 years ago

Good point. You should be able to do policy.policy.to(...), but I'll put in a fix for that soon.
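
As a sketch of that workaround (the inner .policy attribute is taken from the comment above):

```python
import rlkit.torch.pytorch_util as ptu

# MakeDeterministic is a thin wrapper; move the wrapped network's weights
policy.policy.to(ptu.device)
```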

YunchuZhang commented 4 years ago

thanks