gunnxx opened this issue 2 years ago (status: Open)
Have you managed to get this to run on `cuda:1` on a 2-GPU system? Even with your change, adding `--rl_device cuda:1` and `--sim_device cuda:1` always results in a segmentation fault.
The easiest solution to this is to set the `CUDA_VISIBLE_DEVICES` environment variable before launching:

`export CUDA_VISIBLE_DEVICES=1`
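For context on why this works: once `CUDA_VISIBLE_DEVICES=1` is exported, the process only sees that physical GPU and PyTorch enumerates it as `cuda:0`, so the missing `device` key no longer matters. A minimal sketch of the remapping (illustrative only, not from the repo):

```python
import os
import torch

# Restrict this process to physical GPU 1. This must happen before any
# CUDA context is created; torch then exposes that GPU as "cuda:0".
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

print(torch.cuda.device_count())      # 1 visible device
print(torch.cuda.get_device_name(0))  # name of physical GPU 1
x = torch.zeros(1, device="cuda:0")   # allocated on physical GPU 1
```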
Problem

`ase.learning.common_agent.CommonAgent` inherits `rl_games.common.a2c_common.A2CBase`, which stores all tensors on `self.ppo_device`. `self.ppo_device` is set from the `device` key of `config`; if there is no `device` key, it defaults to `cuda:0` (see here). In the `run.py` file, `config` is supplied by `cfg_train["params"]["config"]`. You can print `cfg_train["params"]["config"].keys()` and see that there is no `device` key.
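For reference, the device resolution in `A2CBase` is along these lines (a paraphrased sketch, not a copy of `rl_games/common/a2c_common.py`):

```python
# Sketch of how A2CBase picks its device from the rl_games config dict.
class A2CBase:
    def __init__(self, base_name, config):
        # Falls back to cuda:0 whenever the config has no 'device' key,
        # which is exactly the situation described above.
        self.ppo_device = config.get('device', 'cuda:0')
```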
How to check

To check this issue, simply run the original pretraining command with the `--rl_device` argument set to another CUDA device such as `cuda:1`; it still consumes `cuda:0` memory.
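One way to confirm which GPU is actually being used is to query memory from inside the process (watching `nvidia-smi` works just as well); the snippet below is only an illustrative check, not part of the repo:

```python
import torch

# Report the memory this process has allocated on each visible GPU.
# With the bug present, cuda:0 shows allocations even though
# --rl_device cuda:1 was passed.
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024 ** 2
    print(f"cuda:{i}: {allocated:.1f} MiB allocated")
```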
How to fix

To fix this, simply add `cfg_train["params"]["config"]["device"] = args.rl_device` in the `load_cfg()` function.
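For concreteness, here is roughly where the assignment would sit. The body of `load_cfg()` is paraphrased, and the argument names `args.cfg_env` / `args.cfg_train` are assumed from the IsaacGym-style `get_args()`; only the marked line is the proposed fix:

```python
import yaml

def load_cfg(args):
    """Paraphrased sketch of load_cfg(); only the marked line is new."""
    with open(args.cfg_env, 'r') as f:       # assumed arg name
        cfg = yaml.safe_load(f)
    with open(args.cfg_train, 'r') as f:     # assumed arg name
        cfg_train = yaml.safe_load(f)

    # Proposed fix: forward --rl_device to rl_games so that A2CBase
    # uses it instead of defaulting to cuda:0.
    cfg_train["params"]["config"]["device"] = args.rl_device

    return cfg, cfg_train
```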