kakaobrain / torchgpipe

A GPipe implementation in PyTorch
https://torchgpipe.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

[Question] Inference time speed up or not? #31

Closed tuanmanh1410 closed 3 years ago

tuanmanh1410 commented 3 years ago

Thanks for sharing this project and paper. I'm using GPipe in PyTorch to measure inference time on the same test dataset, comparing against a single GPU as the baseline.

1/ Inference with GPipe seems slower than on a single GPU. Does that mean GPipe is suitable for training large models but not effective for speeding up inference? Please correct me if I'm wrong.

2/ I'm also curious: does the GPipe library measure the communication latency among GPUs when intermediate data is transmitted between two GPUs in a row?

Thank you
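
For a comparison like the one described in 1/, host-side timers only give meaningful numbers if every GPU involved is synchronized around the timed region, since work may still be in flight on later pipeline stages when the forward call returns. A minimal timing sketch (the `bench` helper, model, and iteration count are illustrative, not part of torchgpipe):

```python
import time

import torch


def bench(model, x, iters=100, devices=None):
    """Average forward-pass time in seconds, syncing every GPU involved."""
    devices = devices if devices is not None else [x.device]
    with torch.no_grad():
        model(x)  # warm-up pass (CUDA init, caching allocator, etc.)
        for d in devices:
            torch.cuda.synchronize(d)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        for d in devices:  # wait until all pipeline stages finish
            torch.cuda.synchronize(d)
    return (time.perf_counter() - start) / iters
```

For the single-GPU baseline, `bench(model, x)` with the plain model suffices; for the GPipe-wrapped model, passing `devices=model.devices` makes the timer wait for every partition.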

sublee commented 3 years ago

The purpose of GPipe is to train a model that is too large to fit in the memory of a single GPU, by partitioning it across several GPUs and pipelining micro-batches through the partitions. It is not designed to speed up inference: the micro-batch scheduling and the copies between partitions add overhead, so a model that already fits on one GPU will usually run inference faster there.

If your model isn't large enough, you don't need GPipe.
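
As a concrete illustration of that intended use, here is a minimal sketch of wrapping a model with torchgpipe (the toy model, layer sizes, and balance are made up; GPipe takes an nn.Sequential and slices it into partitions):

```python
import torch
import torch.nn as nn
from torchgpipe import GPipe

# GPipe requires an nn.Sequential so that it can cut the layer
# list into partitions (a hypothetical toy model here).
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1024),
)

# First 3 modules on one GPU, last 2 on another; each mini-batch
# is split into 8 micro-batches that flow through the pipeline.
model = GPipe(model, balance=[3, 2], chunks=8)

x = torch.randn(64, 1024, device=model.devices[0])  # input on the first partition
y = model(x)                                        # output on the last partition
```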

For more details, see the torchgpipe documentation (https://torchgpipe.readthedocs.io/) and the GPipe paper.

Also, GPipe itself does not provide latency measurement. We used NVIDIA Nsight Systems to profile and optimize its communication cost.
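
If a rough in-code number is enough instead of a full profiler trace, one approach (a sketch, not something torchgpipe provides) is to time a single device-to-device copy of an activation-sized tensor directly in PyTorch:

```python
import time

import torch

src, dst = torch.device('cuda:0'), torch.device('cuda:1')
x = torch.randn(8, 1024, 1024, device=src)  # ~32 MB, a stand-in for activations

x.to(dst)  # warm-up copy: triggers peer-access setup the first time
torch.cuda.synchronize(src)
torch.cuda.synchronize(dst)

t0 = time.perf_counter()
y = x.to(dst)
torch.cuda.synchronize(dst)  # wait until the copy has landed
ms = (time.perf_counter() - t0) * 1000
print(f'device-to-device copy: {ms:.3f} ms')
```

For the real per-partition copy costs inside a training or inference run, a profiler trace (e.g. `nsys profile python your_script.py`) remains the more reliable tool.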