liangwq / Chatglm_lora_multi-gpu

Multi-GPU ChatGLM with DeepSpeed

out of memory (GPU memory exhausted) #15

Open 2023March opened 1 year ago

2023March commented 1 year ago

Hello, I set the sequence length to 512 and batch size = 1, but I still run out of GPU memory.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 372.00 MiB (GPU 6; 31.75 GiB total capacity; 29.02 GiB already allocated; 35.75 MiB free; 29.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
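The allocator hint in the traceback is worth trying first: limiting the caching allocator's split size can reduce fragmentation-related OOMs. A minimal sketch (the 128 MiB value here is an assumption to tune for your GPU, not a setting from this repo):

```shell
# Must be exported before the training process starts,
# so PyTorch's CUDA caching allocator reads it at init time.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```

This only mitigates fragmentation; if nearly all 31.75 GiB is genuinely allocated, the batch size, sequence length, or ZeRO stage still has to change.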

Is there a way to solve this?

liangwq commented 1 year ago


These flags all affect GPU memory usage:

--per_device_train_batch_size 2 \
--gradient_accumulation_steps 1 \
--fp16 \

as do the maximum sequence length and which DeepSpeed ZeRO stage you run.
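To illustrate the last point: moving the optimizer state off the GPU with ZeRO stage 2 (or partitioning parameters with stage 3) typically frees several GiB per device. A minimal DeepSpeed config sketch, assuming stage 2 with CPU optimizer offload (the key names follow DeepSpeed's documented JSON schema; the values are illustrative, not this repo's actual config):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" },
    "contiguous_gradients": true,
    "overlap_comm": true
  }
}
```

Pass it to the launcher with `deepspeed --deepspeed_config <file>` (or the equivalent `--deepspeed` argument if the training script uses HuggingFace Trainer).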