salesforce / warp-drive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
BSD 3-Clause "New" or "Revised" License

GPU requirements #75

Closed by MatPoliquin 1 year ago

MatPoliquin commented 1 year ago

I am testing WarpDrive on a P104-100 (8 GB). I can successfully run the tag_gridworld env with the test script provided in this repo, example_training_script_pycuda.py.

But when I run tag_continuous:

python example_training_script_pycuda.py -e tag_continuous

it runs out of memory:

RuntimeError: CUDA out of memory. Tried to allocate 2.38 GiB (GPU 0; 7.93 GiB total capacity; 5.56 GiB already allocated; 1.50 GiB free; 5.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

So what are the minimum VRAM requirements for most envs?

Emerald01 commented 1 year ago

It is simply out of memory. tag_continuous does an environment cross check between taggers and runners, so you can imagine the largest data array could be of size num_of_taggers * num_of_runners. You can see that PyTorch has already reserved 5.58 GiB and there is only 1.50 GiB free on your GPU, but it tried to allocate 2.38 GiB to set up the environment. I think you can just reduce the number of taggers and runners here: https://github.com/salesforce/warp-drive/blob/master/warp_drive/training/run_configs/tag_continuous.yaml
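To make the scaling concrete, here is a rough back-of-the-envelope sketch. It reads the figures straight from the error message above, then estimates how one pairwise tagger-by-runner array shrinks when both counts are reduced. The helper name `pairwise_array_gib`, the `num_envs` factor, the 4-byte element size, and the agent counts are all assumptions for illustration. They are not values taken from tag_continuous.yaml or from WarpDrive's actual allocation code.

```python
# Figures copied from the CUDA out-of-memory message in this thread (GiB).
requested = 2.38
free = 1.50
allocated = 5.56
reserved = 5.58

# How much more memory the failed allocation needed than was free.
shortfall = requested - free
# Reserved-but-unallocated memory; a small value suggests fragmentation
# is not the main problem, so max_split_size_mb is unlikely to help much.
fragmentation_slack = reserved - allocated

print(f"shortfall: {shortfall:.2f} GiB")
print(f"reserved but unused: {fragmentation_slack:.2f} GiB")


def pairwise_array_gib(num_envs, num_taggers, num_runners, bytes_per_element=4):
    """Rough size in GiB of one array with a value per (env, tagger, runner).

    Hypothetical helper: the layout and element size are assumptions.
    """
    return num_envs * num_taggers * num_runners * bytes_per_element / 2**30


# Hypothetical agent counts, chosen only to show the scaling.
baseline = pairwise_array_gib(num_envs=100, num_taggers=100, num_runners=1000)
halved = pairwise_array_gib(num_envs=100, num_taggers=50, num_runners=500)

# Halving both counts cuts a pairwise array to a quarter of its size.
print(f"baseline / halved = {baseline / halved:.1f}")
```

Because the pairwise term grows with the product of the two counts, reducing both taggers and runners frees memory faster than reducing either one alone.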