soumith / imagenet-multiGPU.torch

An ImageNet example in Torch.
BSD 2-Clause "Simplified" License

Train Alexnet with 4 GPUs seems slower than one #59

Closed — yiheng closed this issue 8 years ago

yiheng commented 8 years ago

Hi.

I have 4 GPUs on my machine, connected over PCIe x16. When I train the AlexNet-OWT model with nGPU=4, one batch takes 2.3s; with nGPU=1, one batch takes only 0.48s. Does this look correct?

The backend is cudnn, and nDonkeys=32 since the machine has 32 CPU cores; all other parameters are the defaults.

Thanks
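For reference, the comparison above would correspond to invocations roughly like the following. This is a sketch based on the flags this repo documents (`-nGPU`, `-nDonkeys`, `-netType`, `-backend`); the dataset path is illustrative, and the exact model name should be checked against the files in the repo's `models/` directory:

```shell
# Single-GPU baseline (~0.48s per batch in the report above)
th main.lua -data /path/to/imagenet -backend cudnn \
   -netType alexnetowt -nGPU 1 -nDonkeys 32

# Same training run split across 4 GPUs (~2.3s per batch reported,
# i.e. slower than a single GPU -- the symptom in this issue)
th main.lua -data /path/to/imagenet -backend cudnn \
   -netType alexnetowt -nGPU 4 -nDonkeys 32
```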

soumith commented 8 years ago

I've verified that this is fixed with the latest install of the cunn package: `luarocks install cunn`

yiheng commented 8 years ago

Updating this thread so that people who hit a similar issue have a reference.

The root cause is that PCIe ACS (Access Control Services) was enabled on the motherboard, which limited peer-to-peer PCIe transfer speed between the GPUs. After disabling it, performance looks good.

See this post for details: https://www.supermicro.com/support/faqs/faq.cfm?faq=20732
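To help diagnose cases like this, a few standard commands can reveal whether GPU-to-GPU traffic is being throttled. This is a hedged sketch: device addresses and CUDA samples paths vary by machine, and disabling ACS is typically done in the motherboard's BIOS/UEFI rather than from the OS:

```shell
# Show how the GPUs are interconnected (PIX/PXB/PHB/SYS); paths that
# route through the CPU, or ACS on an intermediate switch, can slow
# peer-to-peer transfers dramatically
nvidia-smi topo -m

# Inspect PCIe bridges for ACS -- non-zero ACSCtl fields indicate
# ACS is enabled on that bridge (requires root)
sudo lspci -vvv | grep -i acsctl

# Measure actual peer-to-peer bandwidth/latency between GPU pairs
# using the CUDA samples (build location varies by install)
# ./p2pBandwidthLatencyTest
```

If the measured peer-to-peer bandwidth is far below what the PCIe link width suggests, ACS on the motherboard is a likely suspect, as it was here.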