Hi, I encountered the same issue. According to #109, it happens because rl_device='cuda:1'
doesn't work correctly.
You can either follow the solution there or simply add CUDA_VISIBLE_DEVICES=[gpu_ids]
in front of your training command.
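For reference, here is a minimal Python sketch of what that prefix does (my own illustration, not from the thread): the variable is injected into the child process's environment before CUDA is initialized there, which is the same effect as `CUDA_VISIBLE_DEVICES=1 python train.py task=Cartpole`.

```python
import os
import subprocess
import sys

# Copy the current environment and expose only the second physical GPU.
# "1" is the physical GPU index; adjust it to the card you want to use.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="1")

# Launch training with the restricted environment, equivalent to
# prefixing the command with CUDA_VISIBLE_DEVICES=1 in the shell.
subprocess.run(
    [sys.executable, "train.py", "task=Cartpole"],
    env=env,
    check=True,
)
```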
I just tried
CUDA_VISIBLE_DEVICES=1, python train.py task=Cartpole
CUDA_VISIBLE_DEVICES=1, python train.py task=Cartpole rl_device='cuda:1' sim_device='cuda:1'
CUDA_VISIBLE_DEVICES=[1], python train.py task=Cartpole rl_device='cuda:1' sim_device='cuda:1'
CUDA_VISIBLE_DEVICES=[1], python train.py task=Cartpole
None of these work. It still crashes. I tried just using export as well. Were you able to get it to work?
@Robokan If you use CUDA_VISIBLE_DEVICES=1, you need to use cuda:0 instead of cuda:1, since you now have only one GPU exposed.
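In other words, the exposed devices are renumbered starting from 0. A quick way to confirm this, assuming a CUDA build of PyTorch on a 2-GPU machine (a sketch of my own, not part of the original answer):

```python
import os

# Must run before torch initializes CUDA, otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # expose only the second physical GPU

import torch

print(torch.cuda.device_count())        # -> 1: only one device is visible
print(torch.cuda.get_device_name(0))    # cuda:0 now refers to physical GPU 1
# Allocating anything on cuda:1 would fail here, which is why rl_device and
# sim_device have to be 'cuda:0' when CUDA_VISIBLE_DEVICES=1 is in effect.
```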
Great, that works. Thanks for the clarification.
**I have 2 GPUs and I want to train only on the second one, so I ran:
python train.py task=Cartpole rl_device='cuda:1' sim_device='cuda:1'
It crashes, saying I am still running something on cuda:0. Any ideas how to fix this?
Here is the full stack trace:**
(rlenv) bizon@dl:~/eric/IsaacGymEnvs-main/isaacgymenvs$ python train.py task=Cartpole rl_device='cuda:1' sim_device='cuda:1'
Importing module 'gym_38' (/home/bizon/anaconda3/envs/rlenv/lib/python3.8/site-packages/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/bizon/anaconda3/envs/rlenv/lib/python3.8/site-packages/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
train.py:49: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
  @hydra.main(config_name="config", config_path="./cfg")
/home/bizon/anaconda3/envs/rlenv/lib/python3.8/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
/home/bizon/anaconda3/envs/rlenv/lib/python3.8/site-packages/hydra/_internal/defaults_list.py:415: UserWarning: In config: Invalid overriding of hydra/job_logging: Default list overrides requires 'override' keyword. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/defaults_list_override for more information.
  deprecation_warning(msg)
/home/bizon/anaconda3/envs/rlenv/lib/python3.8/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
PyTorch version 1.13.1
Device count 2
/home/bizon/anaconda3/envs/rlenv/lib/python3.8/site-packages/isaacgym/_bindings/src/gymtorch
Using /home/bizon/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Emitting ninja build file /home/bizon/.cache/torch_extensions/py38_cu117/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
/home/bizon/anaconda3/envs/rlenv/lib/python3.8/site-packages/isaacgym/torch_utils.py:135: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def get_axis_params(value, axis_idx, x_value=0., dtype=np.float, n_dims=3):
2023-04-14 09:06:54,989 - INFO - logger - logger initialized