hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[BUG]: On eight A100 cards, testing 'examples/language/llama2' with the 'gemini_auto' plugin results in an 'out of memory' error #5030

Open chensimian opened 10 months ago

chensimian commented 10 months ago

🐛 Describe the bug

Here is my script. It runs with the hybrid_parallel plugin, but every other plugin fails with the same "out of memory" error:

    torchrun --standalone --nproc_per_node 8 finetune.py \
        --plugin "gemini_auto" \
        --dataset "self_instruct" \
        --model_path "Llama2-Chinese-7b-Chat" \
        --task_name "finetuning" \
        --batch_size 2 \
        --save_dir "output_test"

Environment

    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 4; 79.21 GiB total capacity; 75.40 GiB already allocated; 1.74 GiB free; 76.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
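The traceback itself suggests one lever: reducing allocator fragmentation via `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch of that mitigation follows; the 128 MiB cap is an illustrative value, not something recommended in this issue.

```python
# Sketch: cap how large a cached block the CUDA caching allocator may split,
# which can help when reserved memory is much larger than allocated memory.
# The variable must be set before CUDA is initialized, i.e. before the first
# CUDA call in the process; 128 is an arbitrary example value.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported only after the allocator config is in place
```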

flybird11111 commented 9 months ago

Hi, how about trying to set offload_optim_frac and offload_param_frac to 1.0?
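For reference, a minimal sketch of where those knobs live, assuming a ColossalAI release whose `GeminiPlugin` exposes `offload_optim_frac` and `offload_param_frac` (they take effect with the "static" placement policy, whereas "gemini_auto" corresponds to `placement_policy="auto"`, which manages placement dynamically). The surrounding launch/boost calls are illustrative only:

```python
# Sketch only: full CPU offload of parameters and optimizer states via Gemini.
# Assumes a ColossalAI version where GeminiPlugin accepts these fractions.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

colossalai.launch_from_torch()  # older releases required config={}

plugin = GeminiPlugin(
    placement_policy="static",  # the fractions below apply to the static policy
    offload_optim_frac=1.0,     # keep all optimizer states in CPU memory
    offload_param_frac=1.0,     # keep all parameters in CPU memory
)
booster = Booster(plugin=plugin)
# model, optimizer, _, dataloader, _ = booster.boost(
#     model, optimizer, dataloader=dataloader
# )
```

Trading GPU memory for host memory and PCIe traffic this way slows each step, but it is a common way to get a 7B-parameter fine-tune to fit when the GPU-resident configuration runs out of memory.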