sfzhang15 / RefineDet

Single-Shot Refinement Neural Network for Object Detection, CVPR, 2018

Cannot achieve high inference speed #123

Closed pnfatnani closed 6 years ago

pnfatnani commented 6 years ago

Hi, I tried training RefineDet512 on my custom dataset, but I can only achieve ~10 FPS on a GTX 1080Ti. Could you suggest any optimizations that would speed up inference?

By the way, I measured the inference time by placing the caffe.forward call (in test/refinedet_demo.py) between start_time = time.time() and end_time = time.time().
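A minimal sketch of this kind of wall-clock measurement, with warm-up iterations and averaging over several runs so that one-off start-up costs do not skew the number. The helper name `measure_fps` and its parameters are hypothetical and not part of the RefineDet repository; the forward pass is passed in as a plain callable, so the pycaffe usage in the trailing comment is an assumption about how the demo script holds its network.

```python
import time


def measure_fps(forward, n_warmup=5, n_runs=50):
    """Time repeated calls of `forward()`; return (seconds_per_call, fps).

    `forward` is any zero-argument callable that runs one inference and
    blocks until the result is available on the host.
    """
    # Warm-up calls are excluded from the timing (memory allocation,
    # library initialisation, and similar one-time costs).
    for _ in range(n_warmup):
        forward()

    start = time.perf_counter()
    for _ in range(n_runs):
        forward()
    elapsed = time.perf_counter() - start

    seconds_per_call = elapsed / n_runs
    return seconds_per_call, 1.0 / seconds_per_call


# Hypothetical usage, assuming `net` is a loaded caffe.Net with its input set:
#   seconds, fps = measure_fps(lambda: net.forward())
#   print("%.1f ms per image, %.1f FPS" % (seconds * 1000.0, fps))
```

A number measured this way includes Python-side overhead in addition to the network itself, so it may differ somewhat from a layer-level benchmark.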

pnfatnani commented 6 years ago

@sfzhang15, just to add: I have not run the finetune script. Is that OK?

sfzhang15 commented 6 years ago

@pnfatnani We measure the speed by caffe time.

pnfatnani commented 6 years ago

@sfzhang15 I tried caffe time, but I get the following error:

    F1010 11:50:24.706682 32291 insert_splits.cpp:29] Unknown bottom blob 'data' (layer 'conv1_1', bottom index 0)

Command I used:

    ~/Downloads/refine_det/build/tools/caffe time -model ~/Downloads/refine_det/models/VGGNet/refinedet_vgg16_512x512/test.prototxt -gpu 0

My first two layers are:

    name: "refinedet_vgg16_512x512_test"
    layer {
      name: "data"
      type: "AnnotatedData"
      top: "data"
      top: "label"
      include {
        phase: TEST
      }
      transform_param {
        mean_value: 104.0
        mean_value: 117.0
        mean_value: 123.0
        resize_param {
          prob: 1.0
          resize_mode: WARP
          height: 512
          width: 512
          interp_mode: LINEAR
        }
      }
      data_param {
        source: "/home/administrator/Downloads/caffe-ssd/data/VOCdevkit/TL/lmdb/TL_test_lmdb"
        batch_size: 1
        backend: LMDB
      }
      annotated_data_param {
        batch_sampler {
        }
        label_map_file: "/home/administrator/Downloads/caffe-ssd/data/VOCdevkit/TL/labelmap_tl.prototxt"
      }
    }
    layer {
      name: "conv1_1"
      type: "Convolution"
      bottom: "data"
      top: "conv1_1"
      param {
        lr_mult: 1.0
        decay_mult: 1.0
      }
      param {
        lr_mult: 2.0
        decay_mult: 0.0
      }
      convolution_param {
        num_output: 64
        pad: 1
        kernel_size: 3
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
          value: 0.0
        }
      }
    }

sfzhang15 commented 6 years ago

@pnfatnani You should use deploy.prototxt, not test.prototxt.

pnfatnani commented 6 years ago

@sfzhang15 Thanks a lot! Actually, the method I used was also correct. The low FPS I saw was caused by a training job running in parallel with inference on my single-GPU machine. With the GPU free, I get the same numbers as the ones you posted, both with my method and with caffe time. Thanks again, and sorry for the trouble.
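Since the cause here was another job sharing the GPU, it can help to check for other compute processes before benchmarking. The following is a rough sketch, assuming `nvidia-smi` is on the PATH and supports the `--query-compute-apps` CSV query. The function names are hypothetical, and the sample text in the usage comment is illustrative, not output from this thread.

```python
import subprocess

# nvidia-smi query that lists the processes currently using the GPU for compute.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' lines into a list of dicts."""
    apps = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        apps.append({
            "pid": int(parts[0]),
            # Re-join the middle fields in case the process name contains commas.
            "name": ", ".join(parts[1:-1]),
            "used_memory": parts[-1],
        })
    return apps


def other_gpu_processes(own_pid):
    """Return compute processes on the GPU other than `own_pid`."""
    output = subprocess.check_output(QUERY).decode()
    return [app for app in parse_compute_apps(output) if app["pid"] != own_pid]


# Hypothetical usage inside a benchmark script:
#   import os
#   busy = other_gpu_processes(os.getpid())
#   if busy:
#       print("GPU is shared with:", busy)
```

If the returned list is not empty, the measured FPS probably reflects contention rather than the speed of the model.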