chengxuxin opened 2 years ago
You could reduce `num_envs` in `legged_gym/envs/anymal_c/flat/anymal_c_flat_config.py`. The default is `num_envs = 4096`, but you could try `2048` or `1024` to save memory. Just add `num_envs = 1024` here.
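For reference, a minimal sketch of what that override can look like, assuming the standard legged_gym config-class layout (the exact import path and base-class names may differ between versions):

```python
# legged_gym/envs/anymal_c/flat/anymal_c_flat_config.py (sketch)
from legged_gym.envs.anymal_c.mixed_terrains.anymal_c_rough_config import AnymalCRoughCfg

class AnymalCFlatCfg(AnymalCRoughCfg):
    class env(AnymalCRoughCfg.env):
        num_envs = 1024  # default is 4096; lower this to reduce memory use
```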
I have tried decreasing `num_envs` to a very small number like 1 or 2, but it still did not work. I tried to see how much memory it takes by setting `--sim_device=cpu` so the memory allocation does not happen on the GPU. What I found is that it takes about 4.5 GB of memory, which is too much for my GPU. However, increasing `num_envs` from 1 to 4096 only takes about 300 MB more memory. So I am wondering what the 4.5 GB of memory is for; it seems to have no relation to `num_envs`.
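One way to narrow down where that 4.5 GB goes is to compare PyTorch's own allocation counters against what `nvidia-smi` reports for the process; the gap is memory allocated outside PyTorch, e.g. by Isaac Gym's simulation context. A minimal sketch (run it inside `train.py` after the environments are created; the exact placement is up to you):

```python
import torch

# PyTorch's view of GPU memory on the current device.
allocated_gb = torch.cuda.memory_allocated() / 1024**3
reserved_gb = torch.cuda.memory_reserved() / 1024**3
print(f"PyTorch allocated: {allocated_gb:.2f} GB, reserved: {reserved_gb:.2f} GB")
# Compare with the per-process figure in `nvidia-smi`: anything beyond the
# reserved amount was allocated outside PyTorch (e.g. Isaac Gym sim buffers).
```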
I see, I think that problem is caused by Isaac Gym. If you try the default ANYmal example in Isaac Gym, it takes more than 4.5 GB of GPU memory. Sorry, I do not have the answer; maybe you could ask the developers at NVIDIA. And please let me know as well if you find a solution.
I have managed to get some sort of a solution to this problem, at least for my case.
To anyone still interested in what happened...
I tried running the same examples and got the same problems, snooped around a bit, and concluded that training works perfectly well in headless mode:
python legged_gym/scripts/train.py --task=anymal_c_flat --sim_device=cuda --rl_device=cuda --pipeline=gpu --num_envs=2048 --headless
A larger number of envs would probably work, but I am heavily limited by my 4 GB GTX 1050 Ti.
It obviously does not show the simulation, but it trains everything perfectly well. After the training is done, I can run the trained policy using the play script with `--sim_device=cpu` on a small set of envs and see how the robots behave.
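(For reference, that playback step would look something like the following; the flags mirror those used elsewhere in this thread, and the `--num_envs` value is just an example.)

```
python legged_gym/scripts/play.py --task=anymal_c_flat --sim_device=cpu --rl_device=cpu --num_envs=16
```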
HOWEVER..
Setting `--pipeline=cpu` and running the script with `--sim_device=cuda` seems to do the trick!!
(The whole command being: `python legged_gym/scripts/train.py --task=anymal_c_flat --sim_device=cuda --rl_device=cuda --pipeline=cpu --num_envs=256`)
Now, I still cannot run a full simulation with 4096 robots, but 256, for example, works perfectly fine. I am not sure why this happens, but the idea of setting the pipeline came from here.
It is worth noting that, among other things, Isaac Gym does appear to have memory-management problems. After rerunning the script a few times, I noticed it fails to start even with 32 envs with an "out of memory" error, so memory obviously fails to be freed when it should be. NVIDIA evidently knows there are some memory-management issues but does not want to focus its development on this at this time.
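If you end up in that state, it is worth checking for orphaned processes that are still holding GPU memory; a Python process left over from a crashed run keeps its allocation until it is killed. Standard tooling, nothing legged_gym-specific:

```
nvidia-smi        # check the process list for stale python processes
kill -9 <PID>     # terminate a leftover run to free its GPU memory
```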
Even though the simulation for training does work this way, I find that the best approach is to let the policy train in `--headless` mode using the GPU as the sim and RL device, and then just play it out using `--sim_device=cuda` and `--pipeline=gpu`. This seems to provide the fastest training times and seems to limit the memory usage only by the number of envs, and not by the existence of the simulation viewer itself.
Running the simulation purely on the CPU also does the trick, but it tends to get laggy with low framerates and therefore slows down the training.
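Put together, the workflow described above looks roughly like this (commands assembled from this thread; adjust `--num_envs` to whatever your GPU can hold):

```
# train without the viewer; memory scales mainly with the number of envs
python legged_gym/scripts/train.py --task=anymal_c_flat --sim_device=cuda --rl_device=cuda --pipeline=gpu --num_envs=2048 --headless

# then visualize the trained policy with the play script
python legged_gym/scripts/play.py --task=anymal_c_flat --sim_device=cuda --rl_device=cuda --pipeline=gpu
```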
Hi, I have an RTX 3060 desktop and had similar problems until I upgraded to the latest version of PyTorch with CUDA and upgraded the proprietary Ubuntu driver to 515. Here is my setup, which fully works with `--num_envs=1024` or allowing ninja to choose.
to do this on ubuntu go to seetings --> additional drivers --> and select: Nvidia deriver metapackage from nvidia-driver-515 (propritary, tested)
In a web browser, open https://pytorch.org/get-started/locally/ and select Linux, Python, CUDA 11.6, which will generate a terminal command: `pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116` (tested) or `conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge` (untested).
Modified setup:
- GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060
- Nvidia driver version: 515.48.07
- OS: Ubuntu 20.04.4 LTS (x86_64)
- GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
- Libc version: glibc-2.31
- Python version: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0] (64-bit runtime)
- Python platform: Linux-5.13.0-52-generic-x86_64-with-glibc2.29
- [pip3] numpy==1.19.5
- [pip3] torch==1.12.0+cu116
- [pip3] torchaudio==0.12.0+cu116
- [pip3] torchvision==0.13.0+cu116
- CUDA used to build PyTorch: 11.6
- ROCM used to build PyTorch: N/A
You can now simply try, e.g.:
cd Desktop/legged_gym-master/legged_gym/scripts
python train.py --task=cassie
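Before training, a quick sanity check that the CUDA build of PyTorch is the one actually in use can save some head-scratching (a standard check, not specific to legged_gym):

```python
import torch

# On the setup above this should print something like: 1.12.0+cu116 True 11.6
print(torch.__version__, torch.cuda.is_available(), torch.version.cuda)
```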
I met the same problem when trying to run
python play.py --task=anymal_c_flat --sim_device=cuda --rl_device=cuda --pipeline=gpu --num_envs=1
so how did you solve the error?
When I add `--headless`, it works fine, so is the problem that the rendering takes up too much memory?
@chengxuxin
p.s. I saw your work on parkour at the recent ICRA 2024, great work!
The training command does not work on my laptop if `--sim_device=cuda`. It works if I use `--sim_device=cpu`. I tried to use only 1 environment, but nothing seems to have changed.

OS Version: Ubuntu 21.04
Nvidia Driver: 470.82.00
Graphics: RTX 3060 Laptop
Pytorch: 1.10.0+cu113