Deepspeed and training code print different throughput

lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

MIT License


Deepspeed and training code print different throughput #411

Open wintersurvival opened 2 years ago

wintersurvival commented 2 years ago

When training on 8 GPUs, the throughput printed by DeepSpeed is much lower than the throughput calculated by the training code:

DeepSpeed: SamplesPerSec=505
training code: sample_per_sec: 50120

It seems that the throughput calculated by the training code = the throughput printed by DeepSpeed * gradient_steps. Which number is accurate? @lucidrains @janEbert
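
A quick way to check this guess is to divide the two reported numbers and compare the result with the gradient_steps value used for the run. The sketch below is only illustrative arithmetic: the helper names and the tolerance are hypothetical, it does not reproduce how DeepSpeed or the training script actually measure throughput, and the gradient_steps value of the run is not stated in this thread, so it has to be supplied by whoever runs the check.

```python
def throughput_ratio(training_code_sps: float, deepspeed_sps: float) -> float:
    """Return how many times larger the training-code figure is."""
    if deepspeed_sps <= 0:
        raise ValueError("deepspeed_sps must be positive")
    return training_code_sps / deepspeed_sps


def matches_gradient_steps(ratio: float, gradient_steps: int, rel_tol: float = 0.05) -> bool:
    """Check whether the ratio is within rel_tol of gradient_steps.

    rel_tol is an arbitrary tolerance chosen for this sketch; the two
    counters are sampled at different times, so an exact match is unlikely.
    """
    return abs(ratio - gradient_steps) <= rel_tol * gradient_steps


# Values quoted in the report above.
deepspeed_sps = 505
training_code_sps = 50120

ratio = throughput_ratio(training_code_sps, deepspeed_sps)
print(f"training code / DeepSpeed = {ratio:.2f}")
```

If the printed ratio is close to the configured gradient_steps, that supports the idea that one of the two counters is scaled by the accumulation factor; it does not by itself say which of the two numbers is the accurate one.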

janEbert commented 2 years ago

Hey! @rom1504 implemented that calculation. :)