Closed MatPoliquin closed 1 year ago
It's simply out of memory: tag_continuous performs an environment cross-check between taggers and runners, so the largest data array can be on the order of num_of_taggers * num_of_runners. From your traceback, PyTorch has already reserved 5.58 GiB and only 1.50 GiB is free on your GPU, but setting up the environment needs 2.38 GiB. You can just reduce the number of taggers and runners here: https://github.com/salesforce/warp-drive/blob/master/warp_drive/training/run_configs/tag_continuous.yaml
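A sketch of the kind of edit being suggested; the exact key names and default values in tag_continuous.yaml may differ from this illustration, so check the linked file before changing anything:

```yaml
# Illustrative excerpt of tag_continuous.yaml (key names assumed, not verified).
# Lowering these two values shrinks the cross-check arrays, whose size
# scales roughly with num_taggers * num_runners.
env:
    num_taggers: 2     # reduced from the repo default
    num_runners: 50    # reduced from the repo default
```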
I am testing WarpDrive on a P104-100 (8 GB). I can successfully run the tag_gridworld env with the test script provided in this repo, example_training_script_pycuda.py,
but tag_continuous:
python example_training_script_pycuda.py -e tag_continuous
gives me an out-of-memory error:
RuntimeError: CUDA out of memory. Tried to allocate 2.38 GiB (GPU 0; 7.93 GiB total capacity; 5.56 GiB already allocated; 1.50 GiB free; 5.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
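The error message itself points at one mitigation for the "reserved >> allocated" case: capping the allocator's split size via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of how to set it from Python (the 128 MiB value is an arbitrary example, not a recommendation from this thread; the variable must be set before PyTorch initializes its CUDA allocator):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA tensor is
# created, so do this at the very top of the script, before importing
# anything that touches torch.cuda.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Equivalently, you can export it in the shell before launching the training script. This only helps with fragmentation; if the arrays genuinely don't fit in 8 GB, reducing the agent counts in the config is still required.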
So what are the minimum VRAM requirements for most envs?