nv-tlabs / ASE


Minor bug: agent is using cuda:0 device no matter what rl_device arg is #14


gunnxx commented 2 years ago

Problem

Even when --rl_device is set to another GPU, the agent still allocates memory and runs on cuda:0.

How to check

To reproduce the issue, run the original pretraining command with the --rl_device argument set to another CUDA device such as cuda:1; the run still consumes cuda:0 memory.

python ase/run.py --task HumanoidAMPGetup --cfg_env ase/data/cfg/humanoid_ase_sword_shield_getup.yaml --cfg_train ase/data/cfg/train/rlg/ase_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --headless --rl_device cuda:1

How to fix

To fix this, add cfg_train["params"]["config"]["device"] = args.rl_device in the load_cfg() function.
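A minimal sketch of the proposed change, assuming load_cfg() is a typical rl_games-style config loader (the real function in the ASE repo does more work; only the cfg_train["params"]["config"]["device"] assignment is the fix suggested above):

```python
import yaml

def load_cfg(args):
    # Load the training and environment configs passed on the command line.
    with open(args.cfg_train, 'r') as f:
        cfg_train = yaml.safe_load(f)
    with open(args.cfg_env, 'r') as f:
        cfg = yaml.safe_load(f)

    # Proposed fix: forward the requested RL device into the rl_games training
    # config so the agent's networks and buffers are created on args.rl_device
    # instead of silently defaulting to cuda:0.
    cfg_train["params"]["config"]["device"] = args.rl_device

    return cfg, cfg_train
```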

Robokan commented 1 year ago

Have you managed to get this to run on cuda:1 on a 2-GPU system? Even with your change, adding --rl_device cuda:1 and --sim_device cuda:1 always results in a segmentation fault.

VineetTambe commented 3 months ago

The easiest solution is to set the CUDA_VISIBLE_DEVICES environment variable so that only the desired GPU is visible to the process, e.g.:

export CUDA_VISIBLE_DEVICES=1
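With CUDA_VISIBLE_DEVICES=1, physical GPU 1 is the only device the process can see and it is remapped to cuda:0 inside the program, so --sim_device and --rl_device can stay at their cuda:0 defaults. A hypothetical equivalent from inside a Python launch script (not from the thread) would be:

```python
import os

# Alternative to the shell export: restrict the visible GPUs from inside the
# launch script. This must run before torch / isaacgym initialize CUDA,
# otherwise the setting has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # physical GPU 1 now appears as cuda:0
```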