surdd33 opened this issue 2 weeks ago
Hi @surdd33: With the batch size set to 1, training should not take more than 24 GB of memory. Can you provide the error log? Best, woldier
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 23.65 GiB total capacity; 22.44 GiB already allocated; 6.75 MiB free; 22.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
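For reference, the workaround the traceback itself suggests can be tried by setting the allocator config before CUDA is initialized. A minimal sketch follows; the value 128 is only an example, and whether it helps depends on the actual allocation pattern:

```python
# The error message's own suggestion: cap the caching allocator's split size
# to reduce fragmentation. Must be set before the first CUDA allocation.
# (The value 128 is illustrative; tune it for your workload.)
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import (and allocate) only after the variable is set
```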
After testing, I found that the training framework doesn't work properly with the batch size set to 1. So I set the batch size to 2 instead, and found that it did not use more than 24 GB of memory.
Fig. 1: Console output
Fig. 2: nvidia-smi console
In that case, I suggest you check the console printout to confirm that the batch size from your configuration file is set correctly. @surdd33
Thank you very much for taking the time to answer my question, but after trying that, it still doesn't work. I see four GPUs in your picture; if they all train at the same time, does the combined memory exceed 24 GB? I only have a single 24 GB 4090.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 23.65 GiB total capacity; 22.43 GiB already allocated; 6.75 MiB free; 22.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Also, looking at nvidia-smi, it shows only about 500 MiB in use.
In fact, for DDP training, the batch size we set is per GPU: each GPU loads that many samples, which is the same number a single card loads in single-GPU training.
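A minimal sketch of this per-GPU behavior under a standard PyTorch DDP setup (this is illustrative code, not this repository's training framework):

```python
# Launch with e.g.: torchrun --nproc_per_node=4 this_script.py
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dist.init_process_group(backend="nccl")

dataset = TensorDataset(torch.randn(64, 3, 8, 8))
sampler = DistributedSampler(dataset)

# batch_size is PER PROCESS (i.e., per GPU): with N GPUs the effective
# global batch is batch_size * N, but each card only holds batch_size samples.
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

for (batch,) in loader:
    assert batch.shape[0] == 2  # same per-card load as single-GPU training
    break

dist.destroy_process_group()
```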
To allay your concerns, I restricted the number of available cards to 1, and the experiment ran fine.
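One common way to restrict training to a single card is to expose only one device before CUDA is initialized. This is an assumed approach; the author may have limited the device count differently:

```python
# Expose only GPU 0 to the process; must run before torch touches CUDA.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # -> 1
```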
@surdd33
Thank you very, very much; you are a kind and patient person.
@surdd33 😀I'm glad I was able to help you. 🥰If you are satisfied with my answer, then fork or star this repository and I will be honored! 🤓Finally, thank you for your interest in this work.
I have a 4090, but I can't run this code, even with batch_size = 1 (samples_per_gpu=1, workers_per_gpu=1).
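For readers following along, those two keys typically come from an mmcv-style data config. A hypothetical sketch, with the surrounding structure assumed rather than taken from this repo:

```python
# Hypothetical mmcv-style data config illustrating the two keys mentioned above.
data = dict(
    samples_per_gpu=1,  # per-GPU batch size
    workers_per_gpu=1,  # DataLoader worker processes per GPU
)
```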