🐛 Describe the bug
Here is my script. It runs with the hybrid_parallel plugin, but every other plugin fails with the same "out of memory" error:
torchrun --standalone --nproc_per_node 8 finetune.py \
    --plugin "gemini_auto" \
    --dataset "self_instruct" \
    --model_path "Llama2-Chinese-7b-Chat" \
    --task_name "finetuning" \
    --batch_size 2 \
    --save_dir "output_test"
Environment
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 4; 79.21 GiB total capacity; 75.40 GiB already allocated; 1.74 GiB free; 76.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
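The traceback itself points at one knob: reserved memory (76.12 GiB) is well above allocated memory (75.40 GiB), so PyTorch suggests setting max_split_size_mb to reduce allocator fragmentation. A minimal sketch of a retry that combines that hint with the only memory-related flag visible in the command above; the 512 MiB cap and --batch_size 1 are assumed test values, not values recommended by the library:

# Assumption: a 512 MiB split cap, following the fragmentation hint in the error
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
# Assumption: finetune.py accepts the same flags; batch_size 1 shrinks per-step activations
torchrun --standalone --nproc_per_node 8 finetune.py \
    --plugin "gemini_auto" \
    --dataset "self_instruct" \
    --model_path "Llama2-Chinese-7b-Chat" \
    --task_name "finetuning" \
    --batch_size 1 \
    --save_dir "output_test"

If this still OOMs at batch size 1, the footprint is likely dominated by parameter/optimizer state rather than activations, which would help narrow down why only hybrid_parallel fits.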