rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

Docker Image Does Not Work #9

Closed: avaziri closed this issue 6 years ago

avaziri commented 6 years ago

I tried to run the TD3 example script from the rlkit-gpu Docker image, without success. I had to modify the script slightly because I don't have a MuJoCo license, so it runs MountainCarContinuous-v0 instead. It runs fine on my local machine from the rlkit source, but when I run it inside the Docker container I get the following error:

THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCGeneral.c line=70 error=30 : unknown error
Traceback (most recent call last):
  File "examples/td3.py", line 111, in <module>
    experiment(variant)
  File "examples/td3.py", line 84, in experiment
    algorithm.cuda()
  File "/rlkit/rlkit/torch/torch_rl_algorithm.py", line 37, in cuda
    net.cuda()
  File "/env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 216, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 152, in _apply
    param.data = fn(param.data)
  File "/env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 216, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/env/lib/python3.5/site-packages/torch/_utils.py", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "/env/lib/python3.5/site-packages/torch/cuda/__init__.py", line 358, in _lazy_new
    _lazy_init()
  File "/env/lib/python3.5/site-packages/torch/cuda/__init__.py", line 121, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at /pytorch/torch/lib/THC/THCGeneral.c:70

Can you confirm that the TD3 example runs in the Docker container without issue for you?

avaziri commented 6 years ago

I was able to work through a few errors and get it to work. One thing worth noting in the directions is that the user needs to install nvidia-docker.
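
For anyone hitting the same `cuda runtime error (30)`, a quick check from inside the container can show whether the GPU is exposed at all before `algorithm.cuda()` is called. The sketch below is not part of rlkit; the function name `gpu_visibility_report` is hypothetical. It assumes that a container started through nvidia-docker has NVIDIA device nodes under `/dev` and `nvidia-smi` on the `PATH`. This is a heuristic only and may not hold for every setup.

```python
import glob
import shutil


def gpu_visibility_report(dev_pattern="/dev/nvidia*", smi_name="nvidia-smi"):
    """Heuristic check (hypothetical helper, not an rlkit API).

    Looks for NVIDIA device nodes and the nvidia-smi binary, which are
    assumed to be present when the container was started with GPU access.
    """
    # Device nodes such as /dev/nvidia0 or /dev/nvidiactl, if any exist.
    device_nodes = sorted(glob.glob(dev_pattern))
    # Full path to nvidia-smi, or None if it is not on PATH.
    smi_path = shutil.which(smi_name)
    return {
        "device_nodes": device_nodes,
        "nvidia_smi": smi_path,
        "gpu_likely_visible": bool(device_nodes) and smi_path is not None,
    }


if __name__ == "__main__":
    report = gpu_visibility_report()
    for key, value in report.items():
        print("{}: {}".format(key, value))
    if not report["gpu_likely_visible"]:
        print("No NVIDIA devices found; was the container started via nvidia-docker?")
```

If `gpu_likely_visible` is false inside the container but true on the host, the container was probably started without GPU access, which would be consistent with the failure in `torch._C._cuda_init()` above.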

vitchyr commented 6 years ago

Great, I'm glad you managed to fix this. I'll add a note about nvidia-docker to the README. Thanks!