Closed jimmy-walker closed 10 months ago
You can try the following two methods:
1. Replace `float16` in line 122 with `bfloat16`.
2. Do not specify the `--allocate` argument.
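Note that `bfloat16` requires newer hardware. A minimal sketch of guarding the choice, using a hypothetical helper `pick_dtype` keyed on the CUDA compute capability (bfloat16 needs Ampere/sm_80 or newer; the RTX 2080 Ti is sm_75):

```python
def pick_dtype(major: int, minor: int) -> str:
    """Return a dtype name for the given CUDA compute capability.

    bfloat16 is only supported on Ampere (sm_80) and newer GPUs;
    older cards fall back to float16.
    """
    return "bfloat16" if (major, minor) >= (8, 0) else "float16"

# An RTX 2080 Ti reports compute capability 7.5, so it falls back.
print(pick_dtype(7, 5))  # -> float16
print(pick_dtype(8, 0))  # -> bfloat16
```

In a real run you would obtain the capability from `torch.cuda.get_device_capability()`, or simply check `torch.cuda.is_bf16_supported()` directly.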
Please let me know if there are still issues :)
Thanks for your reply. @MikeDean2367 I have tried your solutions, but they still fail.
1. I have tried `bfloat16`, but it raises `AssertionError: bfloat16 is not supported on your device. Please set dtype to float16 or float32`.
2. I have tried not specifying `--allocate`, but it still runs out of memory:
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 82.00 MiB (GPU 0; 10.76 GiB total capacity; 9.63 GiB already allocated; 75.56 MiB free; 9.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
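As the OOM message itself suggests, fragmentation can sometimes be mitigated through the allocator configuration. A minimal sketch of setting it from Python before CUDA is initialized (the value `128` is only an illustrative choice, not a recommendation from this thread):

```python
import os

# Must be set before the first CUDA allocation takes place; it caps
# the size of allocator blocks that may be split, which can reduce
# fragmentation-related OOMs.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

The same effect can be achieved by exporting the environment variable in the shell before launching the script.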
Hello, we have fixed this issue. When setting `--allocate`, please ensure that the first GPU is assigned a smaller amount of memory, as described in the link.
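The reason the first GPU needs a smaller budget is that it typically also holds activations and cache. A minimal sketch, assuming (this mechanism is not confirmed by the thread) that the `--allocate` list is translated into a per-device memory map similar to the `max_memory` argument used by device-map-based model loading:

```python
def build_max_memory(allocate_gib):
    """Map a per-GPU GiB budget list, e.g. [2, 10, 10, 10], to a
    {device_index: "NGiB"} dict for device-map-based model loading."""
    return {i: f"{gib}GiB" for i, gib in enumerate(allocate_gib)}

# GPU 0 gets the small budget so it keeps headroom for activations.
print(build_max_memory([2, 10, 10, 10]))
# -> {0: '2GiB', 1: '10GiB', 2: '10GiB', 3: '10GiB'}
```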
Please let me know if there are still issues :)
Thanks for your reply. @MikeDean2367 I have read the link you provided and changed the command as follows, so that the first GPU takes only a little memory compared to the other GPUs. But it still outputs the error.

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python examples/generate_lora.py --base_model /data2/user/LLM/knowlm-13b-zhixi --multi_gpu --allocate [2,10,10,10] --run_ie_cases
```
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 10.76 GiB total capacity; 1.84 GiB already allocated; 7.94 GiB free; 2.00 GiB allowed; 1.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
Did you pull the code from the latest repository? We have updated the code :)
Yes, you are right. The error is gone with your updated code. Thank you indeed. @MikeDean2367
I have 8 NVIDIA GeForce RTX 2080 Ti GPUs. I used the following command to run the project:
But I faced the error:
It's so weird: why is 1.94 GiB free available, yet it fails to allocate 16.00 MiB?
My environment is as follows:
Any help is appreciated.