openpsi-project / ReaLHF

Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Apache License 2.0

allocation_mode #45

Closed AIRobotZhang closed 5 days ago

AIRobotZhang commented 1 month ago

When allocation_mode is set to manual, how should the following values be set? allocation.parallel.pipeline_parallel_size=1 allocation.parallel.model_parallel_size=2 allocation.parallel.data_parallel_size=4

How should they be set if I have 2 GPUs, 4 GPUs, or 8 GPUs?

In addition, roughly how many A100 GPUs are required for a 7B LLM in SFT, RM, and PPO?

nuzant commented 1 month ago

Please refer to ppo_manual.sh for a detailed example of allocating resources in a PPO experiment. With manual allocation, you specify a device mesh and a parallel strategy for each model function call, and the product of the parallel degrees in each strategy must equal the number of GPUs in its device mesh. The example script also documents the naming conventions for device meshes.
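As a concrete sketch of that constraint, here is a minimal shell fragment. The flag names are the ones from the question above; the actual launch command and device-mesh flags are omitted and depend on your experiment script.

```shell
#!/usr/bin/env bash
# Sketch: the product of the parallel degrees must equal the GPU count of
# the device mesh. Flag names are copied from the question; the launch
# command itself depends on your experiment script.

NUM_GPUS=8   # e.g. a single 8-GPU node
PP=1         # pipeline_parallel_size
MP=2         # model_parallel_size (tensor parallel)
DP=4         # data_parallel_size
# Other valid choices: 2 GPUs -> PP=1 MP=2 DP=1 (or MP=1 DP=2);
#                      4 GPUs -> PP=1 MP=2 DP=2.

# Sanity check: PP * MP * DP must equal NUM_GPUS.
if [ $((PP * MP * DP)) -ne "$NUM_GPUS" ]; then
  echo "parallel degrees do not match device mesh size" >&2
  exit 1
fi

echo "allocation.parallel.pipeline_parallel_size=$PP"
echo "allocation.parallel.model_parallel_size=$MP"
echo "allocation.parallel.data_parallel_size=$DP"
```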

As for the number of GPUs needed for 7B experiments, we suggest at least 2x 80GB A100 GPUs for SFT and RM, and 8x 80GB A100 GPUs for PPO. To train on fewer GPUs, you can try to:

  1. Set higher model_parallel_size and pipeline_parallel_size and lower data_parallel_size.
  2. Reduce training batch size and max sequence length.

In future patches, we will support micro-batches and gradient accumulation, which will allow training on fewer GPUs without reducing the batch size.

garrett4wade commented 5 days ago

As a closing remark, ReaL now supports mini-batched execution. You can set the n_mbs attribute in each MFCConfig; increasing the number of mini-batches reduces peak GPU memory usage, so training can run on fewer GPUs. Check this script for a concrete example.
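For instance, an override might look like the following hypothetical CLI fragment. The quickstart entry point and the MFC name "actor_train" are assumptions for illustration; substitute the names from your own experiment config.

```shell
# Hypothetical CLI fragment: raise n_mbs on a memory-heavy model function
# call to split each batch into 4 mini-batches. "actor_train" and the
# entry point shown here are assumed names, not verified against the repo;
# use the MFC names from your own config.
python3 -m realhf.apps.quickstart ppo \
    experiment_name=my-ppo trial_name=run0 \
    actor_train.n_mbs=4
```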

Besides, ReaL will support applying one identical parallelization strategy to all model function calls by passing a pattern string to allocation_mode. For example, allocation_mode=d4m1p2 will set DP=4, TP=1, PP=2 for all model function calls. See this PR.
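A minimal sketch of how such a dXmYpZ string decomposes into parallel degrees. The regex here is inferred from the example above, not taken from ReaL's actual parser.

```shell
#!/usr/bin/env bash
# Sketch: decode an allocation_mode string like "d4m1p2" into
# data/model/pipeline parallel degrees. The pattern is inferred from the
# example above; ReaL's real parser may differ.
MODE="d4m1p2"
if [[ $MODE =~ ^d([0-9]+)m([0-9]+)p([0-9]+)$ ]]; then
  DP=${BASH_REMATCH[1]}   # data parallel degree
  MP=${BASH_REMATCH[2]}   # model (tensor) parallel degree
  PP=${BASH_REMATCH[3]}   # pipeline parallel degree
  echo "DP=$DP TP=$MP PP=$PP"   # prints: DP=4 TP=1 PP=2
else
  echo "unrecognized allocation_mode: $MODE" >&2
  exit 1
fi
```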