https://github.com/precedenceguo/mx-rcnn/blob/master/rcnn/symbol/proposal.py
The code for infer_shape does not output such an error message. Please check again.
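For reference, the quoted error is the kind raised by a per-device batch-size check in a proposal layer's shape inference. A minimal, hypothetical sketch of such a check as an MXNet CustomOp (not the repo's actual proposal.py; the argument names and post-NMS count of 2000 are assumptions, and the forward logic is omitted):

import mxnet as mx

# Hypothetical sketch, not mx-rcnn's actual code: a proposal operator
# whose shape inference rejects multi-image per-device batches.
@mx.operator.register('proposal_sketch')
class ProposalSketchProp(mx.operator.CustomOpProp):
    def __init__(self, rpn_post_nms_top_n='2000'):
        # CustomOp keyword arguments arrive as strings
        super(ProposalSketchProp, self).__init__(need_top_grad=False)
        self._rpn_post_nms_top_n = int(rpn_post_nms_top_n)

    def list_arguments(self):
        return ['cls_prob', 'bbox_pred', 'im_info']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        cls_prob_shape = in_shape[0]
        # proposals are decoded one image at a time, so the per-device
        # batch must contain a single image
        if cls_prob_shape[0] > 1:
            raise ValueError('Only single item batches are supported')
        # rois: (post_nms_top_n, 5) = batch index + 4 box coordinates
        output_shape = (self._rpn_post_nms_top_n, 5)
        return in_shape, [output_shape], []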
@precedenceguo Thanks. You have made an update.
Yes, updated before this issue.
@precedenceguo But I tried your MXNet version and RBG's Caffe version with the same hyperparameters on the same dataset, and the Caffe version is nearly 3 times faster than the MXNet version. I'm new to MXNet; is there something that can be optimized in your code?
That may be true. You did not use the same hyperparameters; some of them are not included here :). Why don't you elaborate on the speed comparison and see if we can make it faster while keeping it parallelizable (Caffe cannot :))?
@precedenceguo I tried them with one image per batch. I noticed that your NMS does not use the GPU; I will add gpu_nms. And what do you mean by some parameters not being included here? I changed rcnn/config.py and the anchor setup in the related functions.
gpu_nms is faster than the Python NMS. Looking forward to a comparison with gpu_nms on both sides.
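For reference, the Python NMS being compared is the classic greedy algorithm. A minimal NumPy sketch, following the common (N, 5) box layout of [x1, y1, x2, y2, score] (mx-rcnn's own implementation may differ in details):

import numpy as np

# Minimal sketch of greedy CPU NMS; thresh is the IoU suppression cutoff.
def py_nms(boxes, thresh):
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    scores = boxes[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # process highest-scoring boxes first

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the kept box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # suppress boxes whose overlap with the kept box exceeds the threshold
        order = order[np.where(iou <= thresh)[0] + 1]
    return keep

The O(N^2) pairwise IoU work inside this loop is what gpu_nms moves onto the device.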
@precedenceguo I added gpu_nms. With only one GPU the speed is almost the same, but GPU load is about 30% higher than the Caffe version. Two GPUs do not help the speed.
Thanks for this information.
Taking another look, two GPUs are useful in the training phase (1.5x-1.8x speedup). So did you mean speeding up the testing phase before?
@precedenceguo Because I am new to MXNet, I know little about how MXNet parallelization works. Can you give some clues for further improvements? Training on one GPU and on two GPUs is almost the same speed.
@precedenceguo Multi-GPU looks more like a sequential run. The speed with multiple GPUs is the same as with a single GPU.
I noticed you use the Module class for multi-GPU. But the code in DataParallelExecutorGroup is:

for exec_ in self.execs:
    exec_.forward(is_train=is_train)

I think this is not parallelization.
Try alternate training; there is a speedup. As to your question, try example/image-classification/train_cifar10.py, which also uses DataParallelExecutorGroup to execute.
With gpu_nms, I observed a 1.4x speedup with VGG on 2 GPUs. DataParallelExecutorGroup may look like a sequential run, but that is how the dependency engine works: each forward call only enqueues work asynchronously, so the devices still compute in parallel. The cost of synchronizing VGG is high, so stay tuned for ResNet.
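To illustrate the point (a minimal sketch, not code from this repo, assuming two GPUs are available): MXNet operations return as soon as they are queued with the dependency engine, so a sequential Python loop over devices still runs them concurrently.

import mxnet as mx
import time

# Sketch of asynchronous execution; assumes two GPUs are present.
ctxs = [mx.gpu(0), mx.gpu(1)]
mats = [mx.nd.ones((4096, 4096), ctx=c) for c in ctxs]

start = time.time()
# The Python loop is sequential, but each dot() only enqueues work on its
# device and returns immediately, so the two GPUs compute concurrently.
outs = [mx.nd.dot(m, m) for m in mats]
for out in outs:
    out.wait_to_read()  # block until that device finishes
print('elapsed: %.3f s' % (time.time() - start))

Timed against a single-GPU run of the same workload, the wall time should be close to halved, minus synchronization overhead.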
When I run a command like python train_end2end.py --gpu 0,1, an error occurs: ('Error in proposal.infer_shape: ', 'Only single item batches are supported'). I think it should support multi-GPU; what should I do?