microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

step 3 "run_6.7b_lora.sh" doesn't work with a100 80GB single gpu. #786

Open sophus1004 opened 10 months ago

sophus1004 commented 10 months ago
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

ACTOR_ZERO_STAGE="--actor_zero_stage 0"
CRITIC_ZERO_STAGE="--critic_zero_stage 0"
ACTOR_MODEL_PATH="EleutherAI/polyglot-ko-5.8b_base_model"
CRITIC_MODEL_PATH="gpt2-medium_base_RM"

OUTPUT="./output"

Num_Padding_at_Beginning=1 # this is model related

Actor_Lr=5e-4
Critic_Lr=5e-6

mkdir -p $OUTPUT

deepspeed --num_gpus 1 main.py \
   --data_path /workspace/law_QA_10 \
   --data_split 2,4,4 \
   --actor_model_name_or_path $ACTOR_MODEL_PATH \
   --critic_model_name_or_path $CRITIC_MODEL_PATH \
   --num_padding_at_beginning 1 \
   --per_device_generation_batch_size 8 \
   --per_device_training_batch_size 8 \
   --generation_batches 1 \
   --ppo_epochs 1 \
   --max_answer_seq_len 256 \
   --max_prompt_seq_len 256 \
   --actor_learning_rate ${Actor_Lr} \
   --critic_learning_rate ${Critic_Lr} \
   --num_train_epochs 1 \
   --lr_scheduler_type cosine \
   --gradient_accumulation_steps 16 \
   --num_warmup_steps 100 \
   --deepspeed --seed 1234 \
   ${ACTOR_ZERO_STAGE} \
   ${CRITIC_ZERO_STAGE} ${OFFLOAD} \
   --actor_lora_dim 128 \
   --actor_gradient_checkpointing \
   --critic_gradient_checkpointing \
   --actor_dropout 0.0 \
   --enable_hybrid_engine \
   --output_dir $OUTPUT \
    &> $OUTPUT/training.log

This is the step 3 "run_6.7b_lora.sh" script; the only changes I made were actor_model_name_or_path and critic_model_name_or_path.

I used a 5.8B model (GPT-NeoX based), but it failed with an out-of-memory error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 21.92 GiB (GPU 0; 79.19 GiB total capacity; 76.83 GiB already allocated; 1.24 GiB free; 76.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
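
For reference, the max_split_size_mb setting mentioned at the end of the error is controlled by an environment variable set before launching the script; a minimal sketch, where 128 is an illustrative value rather than a tuned recommendation:

# Sketch: cap block splitting in the PyTorch CUDA allocator to reduce the
# fragmentation the error message warns about (128 MiB is an illustrative guess).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
bash run_6.7b_lora.sh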

Can I ask what the problem is?
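
As a point of reference, the posted script expands ${OFFLOAD} but never defines it, and both the actor and the critic run at ZeRO stage 0, so nothing is partitioned or offloaded. Below is a sketch of lower-memory settings for the same launcher; it assumes the --offload flag accepted by the step 3 main.py, and the batch sizes are illustrative values, not ones from the original post:

# Sketch only: replace the corresponding lines in run_6.7b_lora.sh, keep everything else.
ACTOR_ZERO_STAGE="--actor_zero_stage 3"    # ZeRO-3 partitions the actor's optimizer/param states
CRITIC_ZERO_STAGE="--critic_zero_stage 3"  # ZeRO-3 for the critic as well
OFFLOAD="--offload"                        # assumed CPU-offload flag consumed by main.py

# Inside the deepspeed command, smaller per-device batches also cut activation memory:
#    --per_device_generation_batch_size 2 \
#    --per_device_training_batch_size 2 \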

DengNingyuan commented 4 months ago

Have you solved this problem?