Hi all,
I am running on an NVIDIA GTX 1080 Ti (11 GB video memory) on Windows 11.
I get the following error when running inference_example on the llama-7b-hf model:
OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 11.00GiB total capacity; 10.29 GiB already allocated; 0 bytes free; 10.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
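The message suggests setting max_split_size_mb. For what it's worth, this is how I understand it would be applied (my assumption; the value 64 is just an example, not a recommendation):

```python
import os

# My understanding: PYTORCH_CUDA_ALLOC_CONF must be read before the CUDA
# caching allocator is initialized, so set it before importing torch.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:64"

import torch  # the allocator picks up the setting on first CUDA use
```

That said, reserved (10.29 GiB) equals allocated (10.29 GiB) in my error, so I suspect the model simply does not fit rather than fragmentation being the problem.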
May I know how much memory is required to run this model locally? Also, is there any workaround?
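My rough math (an assumption on my part, please correct me): 7B parameters at 2 bytes each in fp16 is about 14 GB for the weights alone, before activations and the KV cache, so an 11 GB card would not fit the model even in half precision. Would something like the following be a viable workaround? This is just a sketch assuming the model loads through the standard Hugging Face transformers API with accelerate installed; I have not confirmed that inference_example works this way:

```python
# A minimal sketch (my assumption, not necessarily what inference_example
# does): load weights in fp16 and let accelerate offload layers that do
# not fit on the 11 GB GPU into CPU RAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "llama-7b-hf"  # local path to the converted checkpoint (assumed)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # half precision: ~14 GB of weights vs. ~28 GB in fp32
    device_map="auto",          # requires accelerate; spills overflow onto the CPU
)

prompt = "Hello, my name is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Alternatively, I understand load_in_8bit=True via bitsandbytes would cut the weights to roughly 7 GB, though I am not sure bitsandbytes runs cleanly on Windows.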
Thanks