vwxyzjn / lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase
MIT License

Deepspeed integration for 7B models #19

Closed vwxyzjn closed 1 year ago

vwxyzjn commented 1 year ago

This PR attempts to add DeepSpeed integration so that 7B models can be tuned. In the summarize from human feedback paper, the authors experimented with 1.3B, 2.7B, and 6.7B models, so this PR would in principle allow us to replicate that work.

Some of the notable changes needed to make things work:

Here is a training run, https://wandb.ai/costa-huang/cleanRL/runs/kve7tu43/overview, launched with the following command:

accelerate launch --config_file deepspeed.yaml lm_human_preference_details/train_policy_accelerate.py \
    --rewards.trained_model ''  \
    --base_model tiiuae/falcon-7b  \
    --no_use_tensorflow_adam \
    --ppo.gradient_accumulation_steps 64 \
    --track
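
The `deepspeed.yaml` passed to `accelerate launch` is not shown in this thread. As a rough sketch only, an Accelerate config for a single-node DeepSpeed run might look like the file written below. The file name `deepspeed_example.yaml` and every value in it (ZeRO stage 2, no offloading, bf16, 8 processes) are assumptions for illustration, not the settings used in this PR; keys not listed are left to Accelerate's defaults.

```shell
# Write a hypothetical Accelerate + DeepSpeed config to a separate file so the
# repository's real deepspeed.yaml is never overwritten.
# All values are illustrative assumptions, not the settings used in this PR.
cat > deepspeed_example.yaml <<'EOF'
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2                   # assumed: shard optimizer state and gradients
  offload_optimizer_device: none  # assumed: keep optimizer state on GPU
  offload_param_device: none      # assumed: keep parameters on GPU
  zero3_init_flag: false
mixed_precision: bf16             # assumed: hardware with bf16 support
num_machines: 1
num_processes: 8                  # assumed: one process per GPU, 8 GPUs
machine_rank: 0
main_training_function: main
use_cpu: false
EOF

echo "wrote deepspeed_example.yaml"
```

Such a file would be selected with `--config_file deepspeed_example.yaml`; the number of processes and the ZeRO stage would need to match the hardware actually available.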

Training results were pretty bad, but I think this is probably an issue related to model compatibility. To replicate the summarize from human feedback paper, we should probably use the OPT models, which come in 1.3B, 2.7B, and 6.7B sizes.
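
A sweep over the three OPT sizes could be scripted roughly as follows. This is a dry-run sketch, not something that was run in this PR: the Hub IDs `facebook/opt-1.3b`, `facebook/opt-2.7b`, and `facebook/opt-6.7b` are assumed to be the intended checkpoints, the helper `sweep_commands` and the `RUN` variable are hypothetical, and the flags are simply reused from the Falcon command above (the empty `--rewards.trained_model ''` argument is omitted for readability).

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints one launch command per OPT size.
# Set RUN=1 in the environment to actually launch instead of printing.
sweep_commands() {
    for model in facebook/opt-1.3b facebook/opt-2.7b facebook/opt-6.7b; do
        # Build the command as positional parameters so it can be printed or run.
        set -- accelerate launch --config_file deepspeed.yaml \
            lm_human_preference_details/train_policy_accelerate.py \
            --base_model "$model" \
            --no_use_tensorflow_adam \
            --ppo.gradient_accumulation_steps 64 \
            --track
        if [ "${RUN:-0}" = "1" ]; then
            "$@"
        else
            echo "$*"
        fi
    done
}

sweep_commands
```

Whether a gradient accumulation of 64 is appropriate for the smaller models is untested here; it is carried over unchanged only to keep the runs comparable.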

CC @lewtun

vwxyzjn commented 1 year ago

Confirmed that it can reasonably run 7B models (no benchmark results yet):


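# Compute the timestamp once so both checkpoints land in the same directory
# (two separate $(date +%s) calls can straddle a second boundary).
RUN_DIR="models/train_7b_$(date +%s)"
SAVE_PATH_REWARD="$RUN_DIR/reward.pt"
SAVE_PATH_POLICY="$RUN_DIR/policy.pt"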
poetry run accelerate launch --config_file deepspeed.yaml  lm_human_preference_details/train_reward_accelerate.py \
    --base_model cerebras/Cerebras-GPT-6.7B \
    --no_use_tensorflow_adam \
    --gradient_accumulation_steps=4 \
    --local_rollout_batch_size=4 \
    --save_path=$SAVE_PATH_REWARD \
    --track && \
    poetry run accelerate launch --config_file deepspeed.yaml  lm_human_preference_details/train_policy_accelerate.py \
    --rewards.trained_model=$SAVE_PATH_REWARD \
    --base_model=cerebras/Cerebras-GPT-6.7B \
    --deepspeed \
    --no_use_tensorflow_adam \
    --ppo.gradient_accumulation_steps 64 \
    --track

https://wandb.ai/costa-huang/cleanRL/runs/hn9wtka9?workspace=user-costa-huang
