hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[FEATURE]: Does GeminiStrategy cpu placement support llama7b(2048) reward training on single A100? #4470

Open baibaiw5 opened 1 year ago

baibaiw5 commented 1 year ago

Describe the feature

Hi, I use the following packages:

colossalai 0.3.1
torch 2.0.1
transformers 4.28.1

And the following command to run llama-7b on an A100 (80G):

torchrun --standalone --nproc_per_node=1 train_reward_model.py \
    --strategy colossalai_gemini_cpu \
    --model llama \
    --pretrain /data/checkpoints/share_gpt_7b/checkpoint-1300-fp16 \
    --dataset /data/projects/DeepSpeedExamples/applications/DeepSpeed-Chat \
    --save_path /data/checkpoints/colossal_llama7b_rm_ckpt \
    --max_epochs 10 \
    --batch_size 1 \
    --max_len 2048 \
    --lora_rank 0 \
    --loss_fn 'log_sig'

An OOM error occurs:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 79.20 GiB total capacity; 76.06 GiB already allocated; 201.31 MiB free; 77.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

If I lower max_len to 1000, it runs fine.
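
The error message above points at PYTORCH_CUDA_ALLOC_CONF; below is a minimal sketch of that suggestion (the 128 MiB value is an arbitrary example, and this only mitigates fragmentation, it does not shrink the roughly 76 GiB of live allocations at max_len 2048):

# Sketch only: must run before the first CUDA allocation, e.g. at the very top
# of train_reward_model.py. The 128 MiB split size is an example value.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the allocator config is set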

Issues-translate-bot commented 1 year ago

Bot detected the issue body's language is not English, translate it automatically.


Title: [FEATURE]: Does Gemini Strategy cpu placement support llama 7b(2048) reward training on single A100?

baibaiw5 commented 1 year ago

I have changed the following code to use CPU placement, and I have 1 TB of CPU memory. GPU OOM still occurs:

strategy = GeminiStrategy(placement_policy='cpu')
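
For context, a minimal sketch of where that line sits; the coati import path is an assumption based on the ColossalAI-Chat examples and may not match train_reward_model.py exactly:

# Assumption: GeminiStrategy is imported from the coati (ColossalAI-Chat) package.
from coati.trainer.strategies import GeminiStrategy

# placement_policy='cpu' keeps parameters, gradients and optimizer states in
# host memory (hence the 1 TB of CPU RAM), but the activations produced by a
# 2048-token forward pass still have to fit on the single A100, which is why
# the OOM can persist even with CPU placement.
strategy = GeminiStrategy(placement_policy='cpu')

If activation memory is the bottleneck, the usual workarounds are enabling gradient checkpointing on the LLaMA backbone (transformers' gradient_checkpointing_enable()) or lowering max_len, consistent with the observation that max_len 1000 runs fine.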