rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.

compute advantages: wrong device #2181

Open axelbr opened 3 years ago

axelbr commented 3 years ago

In line 86 of torch/_functions.py, a runtime error is thrown when the rewards/baselines and the filters are located on different devices. I suggest either introducing an optional device parameter or querying the device via global_device().

Exact error message:

... line 86, in compute_advantages
advantages = F.conv2d(deltas, adv_filter, stride=1).reshape(rewards.shape)

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
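
For context, a minimal sketch reproducing the mismatch (the import path follows the traceback, the argument order follows the call quoted below; (N, max_episode_length)-shaped inputs are an assumption):

import torch
from garage.torch._functions import compute_advantages

max_episode_length = 100
rewards = torch.rand(8, max_episode_length, device='cuda')
baselines = torch.rand(8, max_episode_length, device='cuda')

# The inputs live on the GPU, but compute_advantages builds its
# advantage filter on the CPU, so F.conv2d raises the error above.
advantages = compute_advantages(0.99, 0.97, max_episode_length,
                                baselines, rewards)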
avnishn commented 3 years ago

Hi @axelbr thanks for checking out Garage and opening this issue!

While we could provide a device parameter, this would be the same as if a user made the following call:

compute_advantages(discount, gae_lambda, max_episode_length, baselines.to(device),
                       rewards.to(device))

I am also against using the global_device by default, as this can cause unexpected behavior in terms of device usage.

I think the best way forward is to do what torch does, which is to let users manage devices on their own.

@krzentner, @irisliucy, what do you think?

axelbr commented 3 years ago

No, not entirely, IMO. What if I want to compute the returns on the GPU? The adv_filter would still live on the CPU, which leads to the described error.

krzentner commented 3 years ago

Ah, I see. Yeah, we should move the deltas and the advantage filter to the same device as the rewards before computing the convolution.
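
For concreteness, a self-contained sketch of that fix. It follows the general shape of compute_advantages but simplifies details; shapes assume rewards and baselines are (N, max_episode_length) tensors, and the substantive change is building the filter on rewards.device:

import torch
import torch.nn.functional as F

def compute_advantages(discount, gae_lambda, max_episode_length,
                       baselines, rewards):
    device = rewards.device  # follow the device of the input tensors

    # 1-D filter of GAE coefficients [1, c, c^2, ...] with
    # c = discount * gae_lambda, created directly on the inputs'
    # device instead of the CPU default.
    adv_filter = torch.full((1, 1, 1, max_episode_length - 1),
                            discount * gae_lambda, device=device)
    adv_filter = torch.cumprod(F.pad(adv_filter, (1, 0), value=1), dim=-1)

    # TD residuals; every operand already shares rewards' device.
    deltas = rewards + discount * F.pad(baselines, (0, 1))[:, 1:] - baselines
    deltas = F.pad(deltas,
                   (0, max_episode_length - 1)).unsqueeze(0).unsqueeze(0)

    return F.conv2d(deltas, adv_filter, stride=1).reshape(rewards.shape)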

axelbr commented 3 years ago

@krzentner Similar issue here: gaussian_mlp_module.py.

After calling .to(), self._init_std will be on the GPU, but the zero vector is still on the CPU.
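
A minimal sketch of that second mismatch and one way around it (init_std stands in for the self._init_std attribute named above; the surrounding GaussianMLPModule code is assumed, not shown):

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
init_std = torch.ones(4).to(device)  # moved by module.to(), like self._init_std

# Mismatch: torch.zeros allocates on the CPU by default, so this add
# fails once init_std has been moved to the GPU:
#   log_std = torch.zeros(4) + init_std

# Fix: allocate the zero vector on init_std's device (and dtype):
log_std = init_std.new_zeros(4) + init_std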