ijkguo / mx-rcnn

Parallel Faster R-CNN implementation with MXNet.

Training with multiple GPUs is not faster #73

Closed: zdwong closed this issue 6 years ago

zdwong commented 7 years ago

Thanks for your great work porting py-faster-rcnn from Caffe to MXNet. After installing and running mx-rcnn, I found that training with two GPUs is not nearly twice as fast as training with one GPU. Platform: Ubuntu 16.04; GPU: Tesla M60, 8 GB.

bash script/vgg_voc07.sh 0

INFO:root:Epoch[0] Batch [20] Speed: 2.04 samples/sec Train-RPNAcc=0.894159, RPNLogLoss=0.361955, RPNL1Loss=1.139758, RCNNAcc=0.712054, RCNNLogLoss=1.508607, RCNNL1Loss=2.551116,
INFO:root:Epoch[0] Batch [40] Speed: 1.89 samples/sec Train-RPNAcc=0.927401, RPNLogLoss=0.283141, RPNL1Loss=1.018088, RCNNAcc=0.743521, RCNNLogLoss=1.378231, RCNNL1Loss=2.585749,
INFO:root:Epoch[0] Batch [60] Speed: 1.99 samples/sec Train-RPNAcc=0.941726, RPNLogLoss=0.229789, RPNL1Loss=0.936680, RCNNAcc=0.758965, RCNNLogLoss=1.284314, RCNNL1Loss=2.618034,
INFO:root:Epoch[0] Batch [80] Speed: 2.08 samples/sec Train-RPNAcc=0.945939, RPNLogLoss=0.203962, RPNL1Loss=0.934596, RCNNAcc=0.763503, RCNNLogLoss=1.227046, RCNNL1Loss=2.619250,
INFO:root:Epoch[0] Batch [100] Speed: 1.89 samples/sec Train-RPNAcc=0.942644, RPNLogLoss=0.211725, RPNL1Loss=0.920782, RCNNAcc=0.769183, RCNNLogLoss=1.197012, RCNNL1Loss=2.589773,

bash script/vgg_voc07.sh 0,1

INFO:root:Epoch[0] Batch [40] Speed: 2.10 samples/sec Train-RPNAcc=0.934642, RPNLogLoss=0.237217, RPNL1Loss=1.014563, RCNNAcc=0.766673, RCNNLogLoss=1.192775, RCNNL1Loss=2.580673,
INFO:root:Epoch[0] Batch [60] Speed: 2.15 samples/sec Train-RPNAcc=0.942495, RPNLogLoss=0.202506, RPNL1Loss=0.930434, RCNNAcc=0.777600, RCNNLogLoss=1.104864, RCNNL1Loss=2.590131,
INFO:root:Epoch[0] Batch [80] Speed: 2.26 samples/sec Train-RPNAcc=0.948712, RPNLogLoss=0.180862, RPNL1Loss=0.889647, RCNNAcc=0.792101, RCNNLogLoss=1.011266, RCNNL1Loss=2.562042,
INFO:root:Epoch[0] Batch [100] Speed: 2.17 samples/sec Train-RPNAcc=0.955039, RPNLogLoss=0.160886, RPNL1Loss=0.852715, RCNNAcc=0.793162, RCNNLogLoss=0.972027, RCNNL1Loss=2.572651

I suspect this problem is caused by the data parallelization, but you said this version has already implemented it. So how does this problem happen? Thanks for your reply.
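For context, here is a minimal sketch (not the actual mx-rcnn training code) of the data parallelism referred to above, as MXNet's Module API handles it: when a Module is bound to a list of GPU contexts, each input batch is split along the first dimension and one slice is processed per device, so the per-GPU workload stays the same while the batch scales with the number of GPUs. The toy symbol below is only a placeholder for the real network.

import mxnet as mx

# Toy network standing in for the Faster R-CNN symbol (placeholder only).
data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=data, num_hidden=10, name='fc')
net = mx.sym.SoftmaxOutput(data=net, name='softmax')

# Binding to two contexts makes Module split each batch along dim 0,
# so batch size 2 here means one sample per GPU, processed in parallel.
ctx = [mx.gpu(0), mx.gpu(1)]
mod = mx.mod.Module(symbol=net, context=ctx)
mod.bind(data_shapes=[('data', (2, 100))],
         label_shapes=[('softmax_label', (2,))])
mod.init_params()
mod.init_optimizer(optimizer='sgd', optimizer_params={'learning_rate': 0.01})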

zdwong commented 6 years ago

I checked it carefully, and I can confirm that multi-GPU training is generally faster than single-GPU training, depending on the hardware and platform.

315386775 commented 6 years ago

I also noticed this issue. But the README reports going from 3.8 img/s to 6 img/s with 2 GPUs.

ijkguo commented 6 years ago

Most of the time the bottleneck is the custom proposal_target layer or data loading. Check dmlc/gluon-cv for a Gluon implementation.
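To check where the time actually goes, one option is MXNet's built-in profiler (assuming MXNet >= 1.0; the API differs in older releases). A rough sketch, wrapped around a few training batches:

import mxnet as mx

# Record operator, imperative, and API timings for a few batches.
mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                       filename='rcnn_profile.json')
mx.profiler.set_state('run')

# ... run a handful of forward/backward/update calls here ...

mx.profiler.set_state('stop')
# Aggregated per-operator summary; large entries for the custom
# proposal_target op or the data iterator point to the bottleneck above.
print(mx.profiler.dumps())

The generated rcnn_profile.json can also be opened in chrome://tracing for a timeline view of GPU and CPU activity.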