yiheng closed this issue 8 years ago
I've verified that this is fixed with the latest cunn package. To update, run: luarocks install cunn
Updating this thread so that people who hit a similar issue have a reference.
The root cause was that PCIe ACS (Access Control Services) was enabled on the motherboard, which limited the PCIe speed. After disabling it, the performance looks good.
See this post: https://www.supermicro.com/support/faqs/faq.cfm?faq=20732
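For anyone who wants to check whether ACS is active before going into the BIOS, here is a minimal Python sketch. It assumes the usual `lspci -vvv` layout on Linux: unindented device lines that start with a bus address, and indented `ACSCtl:` lines whose flags end in `+` (enabled) or `-` (disabled). The function name `acs_enabled_devices` and the sample text are made up for illustration, and the exact output may differ on your system.

```python
import re

def acs_enabled_devices(lspci_text):
    """Return bus addresses of devices whose ACSCtl line has any flag set ('+').

    Assumes `lspci -vvv`-style text: device header lines start at column 0
    with a bus address, detail lines are indented.
    """
    enabled = []
    current = None
    for line in lspci_text.splitlines():
        # Header lines look like "00:01.0 PCI bridge: ..." (no leading whitespace).
        header = re.match(r"^([0-9a-fA-F:.]+)\s", line)
        if header:
            current = header.group(1)
        elif "ACSCtl:" in line:
            flags = line.split("ACSCtl:", 1)[1]
            if "+" in flags and current is not None:
                enabled.append(current)
    return enabled

# Hypothetical sample in the assumed format, not output from a real machine.
SAMPLE = (
    "00:01.0 PCI bridge: Example bridge\n"
    "\tCapabilities: [148 v1] Access Control Services\n"
    "\t\tACSCap:\tSrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+\n"
    "\t\tACSCtl:\tSrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+\n"
    "00:02.0 PCI bridge: Example bridge\n"
    "\t\tACSCtl:\tSrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-\n"
)

print(acs_enabled_devices(SAMPLE))
```

On a real machine you would feed it the text from `lspci -vvv` (typically run as root so the capability lines are visible). Any address it returns is a bridge worth checking against the BIOS ACS setting.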
Hi.
I have 4 GPUs in my machine, connected via PCIe x16. When I train the AlexnetOWT model with nGPU=4, one batch takes 2.3s, but with nGPU=1, one batch takes only 0.48s. Does this look correct?
The backend is cudnn and nDonkeys=32, since there are 32 CPU cores; all other parameters are left at their defaults.
Thanks
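To put the reported timings in perspective, here is a small Python sketch of the arithmetic. It assumes the total batch size is the same in both runs (the post does not say), so ideal scaling would make the 4-GPU run about 4x faster per batch. The variable names are illustrative only.

```python
# Timings reported in the question above.
t_1gpu = 0.48   # seconds per batch with nGPU=1
t_4gpu = 2.3    # seconds per batch with nGPU=4
n_gpus = 4

# How much slower the 4-GPU run is per batch.
slowdown = t_4gpu / t_1gpu

# Assumption: same total batch size in both runs, so the ideal 4-GPU
# time would be t_1gpu / n_gpus. Efficiency is ideal time / observed time.
ideal_t_4gpu = t_1gpu / n_gpus
efficiency = ideal_t_4gpu / t_4gpu

print(f"slowdown: {slowdown:.2f}x")
print(f"ideal 4-GPU time: {ideal_t_4gpu:.2f}s")
print(f"scaling efficiency: {efficiency:.1%}")
```

Under that assumption the multi-GPU run is nearly 5x slower rather than faster, which points to a communication bottleneck between the GPUs rather than a compute limit.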