openai / random-network-distillation

Code for the paper "Exploration by Random Network Distillation"
https://openai.com/blog/reinforcement-learning-with-prediction-based-rewards/

KeyError: 'RCALL_NUM_GPU' ? #1

Closed: lhlong closed this issue 1 year ago

lhlong commented 6 years ago

You need to open mpi_util.py and change line 59 to:

if 'RCALL_NUM_GPU' in os.environ:
        n_gpus = int(os.environ['RCALL_NUM_GPU'])
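
The guard above only covers the case where the variable is present. A minimal sketch of the same check with an explicit fallback might look like the following; `read_num_gpus` and its `default` parameter are hypothetical names for illustration and are not part of mpi_util.py.

```python
import os


def read_num_gpus(environ=os.environ, default=1):
    """Return the GPU count from RCALL_NUM_GPU, or `default` if it is unset.

    Hypothetical helper: the name and the fallback value are assumptions,
    not code from the repository.
    """
    if 'RCALL_NUM_GPU' in environ:
        return int(environ['RCALL_NUM_GPU'])
    # Variable missing: avoid the KeyError and use the caller's fallback.
    return default
```

Passing the mapping in as an argument keeps the function easy to exercise without touching the real environment, for example `read_num_gpus({'RCALL_NUM_GPU': '4'})`.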

Sungtae-Lee commented 6 years ago

Having the same problem

michael20at commented 6 years ago

I have the same problem, and I also ran into it in the large-scale-curiosity example. It appears to be an MPI problem. I guess it is because the GPU driver path is listed only for Linux, so it won't work on Windows.

Edit: I stand corrected. For RND, setting the GPU to 1 works!

In mpi_util.py change

line 60 to available_gpus = 1

and

line 70 to os.environ['CUDA_VISIBLE_DEVICES'] = str(1)

Seems to work for me, it started to train!
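
One caveat on the hard-coded values: CUDA_VISIBLE_DEVICES holds zero-based device indices, so on a machine with a single GPU the value `1` may hide that GPU and leave training on the CPU. A small sketch of deriving the value from a process rank instead is shown below. `visible_devices_value` is a hypothetical helper, and wrapping the rank with a modulo is an assumption made for illustration, not the repository's logic.

```python
def visible_devices_value(local_rank, n_gpus):
    """Build a CUDA_VISIBLE_DEVICES value for one process.

    Hypothetical helper, not part of mpi_util.py. Device indices are
    zero-based, so a single-GPU machine only has device "0".
    """
    if n_gpus < 1:
        # An empty value hides all GPUs from CUDA.
        return ""
    # Assumption: spread processes over GPUs round-robin by local rank.
    return str(local_rank % n_gpus)
```

With one GPU, `visible_devices_value(0, 1)` yields `"0"`, which is the value the later comments in this thread also suggest.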

cuspymd commented 5 years ago

There's no need to change the code. Just set the environment variable CUDA_VISIBLE_DEVICES in the shell.

Ploppz commented 5 years ago

@cuspymd Does this require that you have an NVIDIA GPU?

lucaslingle commented 5 years ago

@Ploppz

It's an environment variable you can define/set; you can set it even if you don't have an NVIDIA GPU.

So you can change line 59 as above and then run export CUDA_VISIBLE_DEVICES=0 from the command line, and you should be good to go.
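
If exporting the variable in the shell is inconvenient, roughly the same effect can be had from Python, as long as it happens before the framework initializes CUDA. This is a minimal sketch; `ensure_cuda_visible_devices` is a hypothetical name, and the default of `"0"` simply mirrors the `export` command above.

```python
import os


def ensure_cuda_visible_devices(value="0", environ=os.environ):
    """Set CUDA_VISIBLE_DEVICES only if it is not already set.

    Hypothetical helper. Returns the value that ends up in the mapping,
    so a value exported in the shell takes precedence over `value`.
    """
    return environ.setdefault("CUDA_VISIBLE_DEVICES", value)
```

Calling `ensure_cuda_visible_devices()` at the very top of the entry script, before importing the training code, would be the likely place for it.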

youwasha commented 5 years ago

@lucaslingle @cuspymd I have a similar problem:

Traceback (most recent call last):
  File "ParaRetrieval.py", line 18, in <module>
    arrayid = int(os.environ['SLURM_ARRAY_TASK_ID'])  # corresponds to -t in the sh file; used to control the chain of parallel jobs; the server's management system is SLURM
  File "/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'SLURM_ARRAY_TASK_ID'

Here I use 'SLURM_ARRAY_TASK_ID' to run the array task; the server's system is SLURM with NHC. Could you tell me how I can fix this problem? Thank you very much!
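
For context, Slurm sets SLURM_ARRAY_TASK_ID only for tasks of a job array (for example, jobs submitted with `sbatch --array=...`), so the lookup fails when the script runs outside one. A defensive read could be sketched as follows; `get_array_task_id` is a hypothetical helper, and whether a fallback index makes sense depends on the script, which is not shown in this thread.

```python
import os


def get_array_task_id(environ=os.environ, default=None):
    """Return the Slurm array task index as an int.

    Hypothetical helper. If the variable is missing, return `default`
    when one is given, otherwise raise an error that explains the cause.
    """
    raw = environ.get("SLURM_ARRAY_TASK_ID")
    if raw is None:
        if default is None:
            raise RuntimeError(
                "SLURM_ARRAY_TASK_ID is not set; was the job submitted "
                "as an array job (sbatch --array=...)?"
            )
        return default
    return int(raw)
```

Running the script directly for a quick test could then use `get_array_task_id(default=0)`, while a real array submission picks up the index Slurm provides.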