weiliu89 / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

multi-gpus do not accelerate training #476

Open 1292765944 opened 7 years ago

1292765944 commented 7 years ago

In my experiment, this version of Caffe does not seem to benefit from multi-GPU training. Training on two GPUs (batch size 16 per GPU) does not halve the training time relative to a single GPU (batch size 32). Has anyone else encountered this problem?

weiliu89 commented 7 years ago

It might be that the preprocessing part is slow. The multi-GPU support is the same as Caffe's previous implementation (no NCCL).
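One way to sanity-check the preprocessing hypothesis (not from this thread; the input file and iteration count below are placeholders) is to measure how many images per second a single CPU thread can decode. If that rate is close to the training throughput, the CPU-side decode path is a plausible bottleneck:

```cpp
// Standalone sketch: time repeated JPEG decoding to estimate how many
// images per second the CPU-side decode path can sustain.
#include <opencv2/opencv.hpp>
#include <chrono>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

int main(int argc, char** argv) {
  if (argc < 2) {
    std::cerr << "usage: " << argv[0] << " image.jpg" << std::endl;
    return 1;
  }
  // Load the encoded bytes once, the way an LMDB record stores them.
  std::ifstream file(argv[1], std::ios::binary);
  std::vector<uchar> encoded((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());

  const int iters = 200;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    cv::Mat img = cv::imdecode(encoded, cv::IMREAD_COLOR);  // decode every iteration
  }
  auto stop = std::chrono::steady_clock::now();
  double seconds = std::chrono::duration<double>(stop - start).count();
  std::cout << "single-thread decode rate: " << iters / seconds
            << " images/s" << std::endl;
  return 0;
}
```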

dtmoodie commented 7 years ago

Hello,

I believe one of the reasons for the slowdown with the most recent release is that some of the preprocessing code encodes and decodes images multiple times. I've modified the code so that once an image is decoded, it stays decoded, which gives roughly a 2x speedup. Unfortunately I was working on a different branch when I found and fixed this, but the commit can be found here: https://github.com/dtmoodie/caffe/tree/sanghoon-dev_pvanet (commit 5d34a32d15423d73490e103eed4eff7d8c8399da).
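For reference, a minimal sketch of the decode-once idea (an illustration only, not the actual commit; the resize and flip steps are stand-ins for the real augmentation pipeline):

```cpp
// Illustration only: contrast re-decoding the sample between augmentation
// steps with decoding it once and keeping the cv::Mat.
#include <opencv2/opencv.hpp>
#include <vector>

// Wasteful pattern: every step re-encodes and re-decodes the image.
cv::Mat AugmentWithReencode(const std::vector<uchar>& encoded) {
  cv::Mat img = cv::imdecode(encoded, cv::IMREAD_COLOR);   // decode #1
  cv::resize(img, img, cv::Size(300, 300));
  std::vector<uchar> buf;
  cv::imencode(".jpg", img, buf);                          // encode again
  cv::Mat img2 = cv::imdecode(buf, cv::IMREAD_COLOR);      // decode #2
  cv::flip(img2, img2, 1);
  return img2;
}

// Decode-once pattern: decode the raw bytes a single time and pass the
// decoded cv::Mat through every subsequent step.
cv::Mat AugmentDecodedOnce(const std::vector<uchar>& encoded) {
  cv::Mat img = cv::imdecode(encoded, cv::IMREAD_COLOR);   // the only decode
  cv::resize(img, img, cv::Size(300, 300));
  cv::flip(img, img, 1);
  return img;
}
```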

Furthermore, this branch is a merge with nvidia/caffe, which includes better multi-GPU scaling: https://github.com/dtmoodie/caffe/tree/test_ssd_merge

With the https://github.com/dtmoodie/caffe/tree/sanghoon-dev_pvanet branch I can reach roughly 50% GPU load on a machine with 8 Titan X (Pascal) cards at a batch size of 8 images per GPU. I get about 1.5-3 iterations per second, which works out to roughly 160 frames per second in training (8 GPUs x 8 images = 64 images per iteration, so ~2.5 it/s gives ~160 images/s).