tlc4418 / llm_optimization

A repo for RLHF training and best-of-n (BoN) sampling over LLMs, with support for reward model ensembles.
https://arxiv.org/abs/2310.02743
MIT License

How many GPUs (and VRAM) to use per stage? #4

Closed RylanSchaeffer closed 1 month ago

RylanSchaeffer commented 2 months ago

For each of the different stages of the project (SFT, reward model training, PPO), how many GPUs should be used, and with how much VRAM apiece?

tlc4418 commented 1 month ago

So I was working with 80 GB A100s. If I remember correctly, I was using 1 GPU for SFT and reward model training, and 4 GPUs for the PPO training. But I know some people have used this code and made it work with smaller/fewer GPUs (for example on 32 GB V100s) by reducing the batch/chunk size.
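As a rough sanity check on whether a given GPU can fit a stage (this is a generic back-of-envelope estimate, not something from the repo), the dominant per-GPU memory for full fine-tuning with Adam in mixed precision is weights + gradients + optimizer states; the function below and the parameter counts in the example are illustrative assumptions:

```python
def estimate_train_vram_gb(n_params_billion: float, dtype_bytes: int = 2) -> float:
    """Rough VRAM estimate (GB) for full fine-tuning one dense model with Adam.

    Assumes mixed precision: fp16/bf16 weights and gradients (dtype_bytes each),
    plus fp32 Adam state (master weights + two moment buffers = 12 bytes/param).
    Ignores activations, KV caches, and framework overhead, which can be large.
    """
    weights = n_params_billion * dtype_bytes        # e.g. bf16 weights
    grads = n_params_billion * dtype_bytes          # one gradient per weight
    adam_states = n_params_billion * 12             # fp32 master copy + 2 moments
    return weights + grads + adam_states

# Example: a hypothetical 1.4B-parameter policy model
print(f"{estimate_train_vram_gb(1.4):.1f} GB")  # well under one 80 GB A100
```

PPO is heavier than this suggests because it also holds a reference model and reward model(s) in memory alongside the trained policy, which is consistent with it needing more GPUs than SFT or RM training; reducing batch/chunk size lowers only the activation memory this estimate omits.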

RylanSchaeffer commented 1 month ago

Perfect, sounds good! Thank you :)