hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[FEATURE]: Does GeminiStrategy cpu placement support llama7b(2048) reward training on single A100? #4470

Open baibaiw5 opened 1 year ago

baibaiw5 commented 1 year ago

Describe the feature

Hi, I use the following packages:

colossalai 0.3.1
torch 2.0.1
transformers 4.28.1

And the following command to run llama-7b on an A100 (80G):

torchrun --standalone --nproc_per_node=1 train_reward_model.py \
    --strategy colossalai_gemini_cpu \
    --model llama \
    --pretrain /data/checkpoints/share_gpt_7b/checkpoint-1300-fp16 \
    --dataset /data/projects/DeepSpeedExamples/applications/DeepSpeed-Chat \
    --save_path /data/checkpoints/colossal_llama7b_rm_ckpt \
    --max_epochs 10 \
    --batch_size 1 \
    --max_len 2048 \
    --lora_rank 0 \
    --loss_fn 'log_sig'

An OOM error occurs:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 79.20 GiB total capacity; 76.06 GiB already allocated; 201.31 MiB free; 77.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

If I lower max_len to 1000, it runs fine.
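
The error message above points at PYTORCH_CUDA_ALLOC_CONF; below is a minimal sketch of that suggestion (the 128 MiB value is an arbitrary example, and this only mitigates fragmentation, it does not shrink the roughly 76 GiB of live allocations at max_len 2048):

# Sketch only: must run before the first CUDA allocation, e.g. at the very top
# of train_reward_model.py. The 128 MiB split size is an example value.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the allocator config is set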

Issues-translate-bot commented 1 year ago

Bot detected the issue body's language is not English, translate it automatically.


Title: [FEATURE]: Does Gemini Strategy cpu placement support llama 7b(2048) reward training on single A100?

baibaiw5 commented 1 year ago

I have changed the following code to use CPU placement, and I have 1 TB of CPU memory. GPU OOM still occurs:

strategy = GeminiStrategy(placement_policy='cpu')
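
For context, a minimal sketch of where that line sits; the coati import path is an assumption based on the ColossalAI-Chat examples and may not match train_reward_model.py exactly:

# Assumption: GeminiStrategy is imported from the coati (ColossalAI-Chat) package.
from coati.trainer.strategies import GeminiStrategy

# placement_policy='cpu' keeps parameters, gradients and optimizer states in
# host memory (hence the 1 TB of CPU RAM), but the activations produced by a
# 2048-token forward pass still have to fit on the single A100, which is why
# the OOM can persist even with CPU placement.
strategy = GeminiStrategy(placement_policy='cpu')

If activation memory is the bottleneck, the usual workarounds are enabling gradient checkpointing on the LLaMA backbone (transformers' gradient_checkpointing_enable()) or lowering max_len, consistent with the observation that max_len 1000 runs fine.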