princeton-nlp / MeZO

[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
MIT License

Any benchmark on (MeZO) vs. (ZeRO + CPU offload + grad checkpointing)? #1

Closed xingchensong closed 1 year ago

xingchensong commented 1 year ago

Appreciate your excellent work!

Out of curiosity, have you ever compared MeZO with other GPU-memory-efficient training techniques such as ZeRO stage 1/2/3? I would be delighted to see metrics on training speed and on the largest model that can be trained on a single A100 80GB.

Furthermore, it would be intriguing to see a comparison between MeZO and (ZeRO + CPU offload + grad checkpointing), since the latter also trades extra forward computation for memory (gradient checkpointing re-runs forward passes during the backward pass).
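For readers comparing the two approaches: MeZO avoids backward passes entirely by estimating the gradient from two forward evaluations (an SPSA-style zeroth-order estimator), and regenerates the random perturbation from a saved seed instead of storing it, which is where the memory savings come from. A minimal numpy sketch of this idea, where `params`, `loss_fn`, and the hyperparameters are illustrative placeholders rather than the repo's actual API:

```python
import numpy as np

def mezo_step(params, loss_fn, lr=1e-3, eps=1e-3, seed=0):
    """One zeroth-order (SPSA-style) step: two forward passes, no backward.

    params  -- dict of numpy arrays, updated in place
    loss_fn -- callable: params -> scalar loss (a "forward pass")
    The perturbation z is regenerated from `seed` each time, so it is
    never materialized alongside the parameters.
    """
    def perturb(scale):
        rng = np.random.default_rng(seed)  # same seed -> same z every call
        for p in params.values():
            z = rng.standard_normal(p.shape)
            p += scale * eps * z

    perturb(+1)                      # theta + eps * z
    loss_plus = loss_fn(params)
    perturb(-2)                      # theta - eps * z
    loss_minus = loss_fn(params)
    perturb(+1)                      # restore theta

    grad_est = (loss_plus - loss_minus) / (2 * eps)  # projected gradient

    rng = np.random.default_rng(seed)
    for p in params.values():
        z = rng.standard_normal(p.shape)
        p -= lr * grad_est * z       # in-place SGD-style update
    return grad_est
```

In contrast, ZeRO + CPU offload + gradient checkpointing still computes a true backward pass; it saves memory by partitioning/offloading optimizer state and recomputing activations, at the cost of extra forward recomputation and host-device traffic.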

gaotianyu1350 commented 1 year ago

Hi,

Thanks for your interest in our work! As noted in Section 3.4, we did not benchmark against those variants. To compare intuitively with the methods you mentioned:

xingchensong commented 1 year ago

Great, thanks~ I'm happy to stay tuned for your future work!