hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[FEATURE]: GPU-RAM-friendly PPO training for big models (larger than 2B) #3566

Open yynil opened 1 year ago

yynil commented 1 year ago

Describe the feature

PPO training needs to keep four models in memory at the same time. The original implementation keeps the reward, actor, critic, and initial models in video RAM simultaneously. The actor's and initial model's outputs are token ids, which serve as actions for the reward and critic models. If the reward model and the actor model do not share the same tokenizer, those ids are meaningless to the reward model.

Even within the same model family, such as BLOOM, developers cannot rely on the strong assumption that models of different scales share the same tokenizer. For example, bloom7b-mt need not share a tokenizer with bloom-560m.

Things get even worse if only one LLM is available, such as ChatGLM-6B: there is no smaller model we could even bet shares its tokenizer.
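When the tokenizers differ, the only safe bridge between the actor and the reward model is text: decode the actor's action ids, then re-encode them with the reward model's tokenizer. A minimal sketch of that idea, using toy vocabularies as hypothetical stand-ins for the two real tokenizers:

```python
# Toy stand-ins for two incompatible tokenizers (hypothetical vocabularies;
# a real setup would use each model's own tokenizer from its checkpoint).
actor_vocab = {"hello": 0, "world": 1}
actor_decode = {v: k for k, v in actor_vocab.items()}
reward_vocab = {"hello": 7, "world": 3}

def bridge(action_ids):
    # Actor ids mean nothing to the reward model; round-trip through text.
    text = " ".join(actor_decode[i] for i in action_ids)
    return [reward_vocab[tok] for tok in text.split()]

print(bridge([0, 1]))  # → [7, 3]: same tokens, reward model's ids
```

The same decode/re-encode step works with any pair of tokenizers, at the cost of an extra pass over the generated text.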

So a video-RAM-friendly PPO trainer is needed, one that keeps only one model in video RAM at a time during training.
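One way to realize "only one model in video RAM" is to park all four models on the CPU and move each onto the GPU just for its forward pass, evicting it afterwards. A minimal sketch of that residency pattern, with a dummy `Model` class standing in for an `nn.Module` (names and structure are illustrative, not the fork's actual implementation):

```python
from contextlib import contextmanager

class Model:
    """Stand-in for an nn.Module; tracks which device it lives on."""
    def __init__(self, name):
        self.name, self.device = name, "cpu"
    def to(self, device):
        self.device = device
        return self

@contextmanager
def on_gpu(model, device="cuda"):
    # Move a single model onto the GPU, yield it for its forward pass,
    # then evict it back to CPU so only one model occupies VRAM at a time.
    model.to(device)
    try:
        yield model
    finally:
        model.to("cpu")

actor, critic, reward, initial = (Model(n) for n in ("actor", "critic", "reward", "initial"))
with on_gpu(actor) as m:
    print(m.device)        # → cuda: only the actor is resident
print(actor.device)        # → cpu: evicted after use
```

This trades host-device transfer time for peak VRAM, which is the point of the proposal: a single 2B+ model's weights fit where four copies would not.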

I have finished the code and the README doc in my fork. I'll submit a PR for this feature later.

binmakeswell commented 1 year ago

Hi @yynil Thank you very much for your proposal and contribution. Looking forward to your further PR updates. Thanks.