microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

ppo zero stage 3 generate too slow #718

Open LSX-Sneakerprogrammer opened 1 year ago

LSX-Sneakerprogrammer commented 1 year ago

Hi, when I ran PPO with bloomz-7b1-mt and bloom-560m (prompt_len = answer_len = 256) using ZeRO stage 3 on 8×A100-40G, generation was very slow (about 72 s on average). When I set ZeRO stage = 2, generation takes 0.8 s with the same hyperparameters, but training hits OOM after one step (I used 24×A100-40G in total and it still ran out of memory). I suspect ZeRO stage 3 is the cause of the slow generation, but without it the resource consumption is too high. Is there any way to solve this problem? Thanks a lot!
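
A rough back-of-the-envelope sketch of why stage 3 could be this slow for generation: ZeRO stage 3 partitions the parameters across ranks, so every forward pass has to all-gather the weights, and autoregressive decoding runs roughly one forward pass per generated token. The parameter count (about 7.1e9 for bloomz-7b1-mt), fp16 weights (2 bytes each), and the function name `zero3_generation_traffic` are assumptions for illustration. The estimate ignores the bloom-560m model, prefetch overlap, and any caching of gathered weights.

```python
def zero3_generation_traffic(num_params, bytes_per_param, world_size, new_tokens):
    """Estimate bytes each rank receives to generate `new_tokens` tokens under ZeRO stage 3.

    Illustrative model only: assumes one full parameter all-gather per forward pass.
    """
    full_model_bytes = num_params * bytes_per_param
    # Each rank already holds 1/world_size of the weights; the rest arrives via all-gather.
    received_per_forward = full_model_bytes * (world_size - 1) / world_size
    # Autoregressive decoding runs about one forward pass per generated token.
    return received_per_forward * new_tokens


# Assumed values: ~7.1e9 parameters, fp16, 8 GPUs, answer_len = 256.
traffic = zero3_generation_traffic(
    num_params=7.1e9, bytes_per_param=2, world_size=8, new_tokens=256
)
print(f"{traffic / 1e12:.2f} TB received per rank")
```

Under these assumptions each rank pulls on the order of a few terabytes of weights for a single 256-token generation. With stage 2 the weights are not partitioned, so this gather traffic does not occur, which would be consistent with the large gap between the two timings.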

BaenRH commented 11 months ago

Have you solved this problem? When I use PPO with ZeRO stage 3 and CPU offload on 2×8 A800-80G to generate sequences of length 2000, it takes about 1500 seconds. Is this normal?
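
One thing that may be worth trying is DeepSpeed's hybrid engine, which is meant to speed up the generation phase of RLHF training while keeping ZeRO for the training phase. Below is a minimal config sketch, not a confirmed fix for this issue. The key names under `hybrid_engine` are written as I recall them from DeepSpeed's config and should be checked against the installed version. The values are illustrative assumptions (`max_out_tokens` is set to prompt_len + answer_len = 512 from the original post), and I have not verified that the hybrid engine supports BLOOM models together with stage 3.

```python
import json

# Hypothetical starting point; verify every key against your DeepSpeed version.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # keep parameter partitioning for the training phase
    },
    "hybrid_engine": {
        "enabled": True,                   # switch to an inference-oriented path for generate()
        "max_out_tokens": 512,             # assumed: prompt_len (256) + answer_len (256)
        "inference_tp_size": 1,            # tensor-parallel degree used during generation
        "release_inference_cache": False,  # True trades generation speed for memory
        "pin_parameters": True,
        "tp_gather_partition_size": 8,
    },
}

print(json.dumps(ds_config, indent=2))
```

If memory is tight, setting `release_inference_cache` to `True` is the first knob I would look at, at the cost of some generation speed.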