openai / guided-diffusion


Any explanation for the low GPU utilization #39

Open ShoufaChen opened 2 years ago

ShoufaChen commented 2 years ago

Hi, @unixpickle @prafullasd

Thanks for your wonderful work.

I'd like to know whether there is any explanation for the low GPU utilization.

![GPU utilization table](https://user-images.githubusercontent.com/28682908/173365363-2e1fbfb0-44c8-4e99-9ae8-155ec8af8aff.png)
unixpickle commented 2 years ago

Hi Shoufa,

It's quite hard to actually utilize every FLOP available on the GPU. When you run a command like nvidia-smi and it claims you are at 100%, that does not actually mean you are at the maximum FLOP throughput of the GPU. In fact, if you ever compute the theoretical speed your training job should be going at given the number of FLOPs in the model, you will likely find the same thing: the model theoretically should be running faster on your GPU than it is.
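To make that distinction concrete, here is a minimal sketch (not from this repo) of how one might compare the FLOPs/s a training loop actually achieves against the GPU's theoretical peak. The toy model, the 6 × params × batch FLOP rule of thumb, and the peak-throughput number are illustrative assumptions, not measurements of guided-diffusion itself:

```python
# Sketch: estimate "true" FLOP utilization of a training step,
# as opposed to the busy-time percentage that nvidia-smi reports.
import time
import torch


def measure_flop_utilization(model, batch, flops_per_step, peak_flops_per_sec, n_steps=20):
    """Return achieved FLOPs/s divided by the GPU's theoretical peak.

    flops_per_step:     analytic FLOP count for one forward+backward pass
                        (counted layer-by-layer, or via a rule of thumb).
    peak_flops_per_sec: the advertised peak for the dtype you train in.
    """
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_steps):
        loss = model(batch).square().mean()  # placeholder loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    torch.cuda.synchronize()
    elapsed = time.time() - start

    achieved = flops_per_step * n_steps / elapsed
    return achieved / peak_flops_per_sec


if __name__ == "__main__":
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
    ).cuda()
    batch = torch.randn(256, 1024, device="cuda")

    # Rough rule of thumb: ~2 FLOPs per parameter per sample forward,
    # ~3x that for forward + backward.
    params = sum(p.numel() for p in model.parameters())
    flops_per_step = 6 * params * batch.shape[0]

    peak = 312e12  # assumed example value (e.g. A100 BF16 tensor-core peak)
    util = measure_flop_utilization(model, batch, flops_per_step, peak)
    print(f"Estimated FLOP utilization: {util:.1%}")
```

A number like this is typically well below 100% even while nvidia-smi shows the GPU as fully busy, because nvidia-smi's "utilization" only reflects the fraction of time some kernel was running, not how close those kernels get to peak throughput.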


ShoufaChen commented 2 years ago

Thanks for your reply.

So the utilization in the above table is calculated as a percentage of the theoretical peak FLOPs, rather than taken from nvidia-smi.

Is that right?