tensorflow / models

Models and examples built with TensorFlow

SSD-Mobilenet is slower than RFCN on the object_detection_tutorial.ipynb #1715

Closed GBJim closed 7 years ago

GBJim commented 7 years ago

Hi:

I've been testing the processing time of object detection models on a GTX 1080. The results look abnormal to me:

I tested SSD-Mobilenet-V1 and RFCN-ResNet-101 on a hundred 1280*720 images. The average processing time per image for each model is:

SSD-Mobilenet-V1: 0.38 sec
RFCN-ResNet-101: 0.27 sec

This result is confusing to me. SSD-Mobilenet should be much faster based on my understanding. Any suggestions?

System information

sungsulim commented 7 years ago

I'm having similar issues. RFCN-ResNet-101 and SSD-Inception-v2 seem to run much slower than the benchmarked times. I'm using a Pascal Titan Xp with CUDA 8.0.

ghost commented 7 years ago

I am having this problem as well.

jch1 commented 7 years ago

Thanks all - I'll look into this. If you can provide logs and system information, that would be helpful too.

YanLiang0813 commented 7 years ago

@GBJim How can I test on my own data? I have successfully trained on my own data, and I wonder how I can run detection on it and get a txt file containing [image_name, score, x, y, w, h]. Can you give me some suggestions? Thanks.

GBJim commented 7 years ago

@YanLiang0813 Please refer to this Python detection script. You can use this function in your object_detection_tutorial.ipynb.

Some loops could be optimized with NumPy operations for efficiency.

YanLiang0813 commented 7 years ago

@GBJim Thank you very much! It really helps me a lot.

GBJim commented 7 years ago

Hi @jch1 Are there any updates about this issue? Please let me know if you need any extra information. Thanks!

yurymalkov commented 7 years ago

I am having a similar issue on Windows 10 with pip-installed tensorflow 1.2.1. On a 4-core CPU I get 400 ms per frame with SSD-Mobilenet-V1, but on a GPU (GTX 960) it only drops to 360 ms, which is strange.

dcsds1 commented 7 years ago

I am having a similar issue on Ubuntu 16.04 with pip-installed tensorflow-gpu 1.2.1. I have a GTX 760 with 2 GB of memory. ssd_mobilenet takes around 1600 ms per frame, and faster_rcnn_inception takes about 4000 ms per frame.

ffrige commented 7 years ago

Wow, how do you guys run that fast? I get at best 4-5 seconds per image on my 4-core i5 CPU with Windows 7 and TensorFlow 1.1.0. Is there anything I can optimize to get near real-time behavior? Or is it slow because I am running it from the Jupyter notebook?

ffrige commented 7 years ago

Ok, sorry, my bad. I was including the Tensorflow double "with" lines in my loop. Taking those lines out I now get a more reasonable frame rate (<1sec). Still, is there any way to optimize that, even at a cost of accuracy? Is it worth retraining the model with only the 2-3 classes I am interested in detecting?

GBJim commented 7 years ago

Hi all: My issue has been solved. In my experiment, the detection time was measured end to end, so the image IO time was included as well. I found that the network forward time of each model is actually normal; the bottleneck is the image IO.
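To separate the two costs described above, loading and inference can be timed independently. The following is only a minimal sketch, not the script used in this thread: `time_stages`, `load_fn`, and `infer_fn` are hypothetical names, and the caller is assumed to pass in whatever loading and inference functions are being compared.

```python
import time


def time_stages(load_fn, infer_fn, paths):
    """Return the average (load_seconds, infer_seconds) per item.

    load_fn and infer_fn are hypothetical callables supplied by the caller:
    load_fn(path) returns the prepared input, infer_fn(data) runs the model.
    """
    if not paths:
        raise ValueError("paths must not be empty")

    load_total = 0.0
    infer_total = 0.0
    for path in paths:
        t0 = time.perf_counter()
        data = load_fn(path)   # image IO and preprocessing only
        t1 = time.perf_counter()
        infer_fn(data)         # network forward pass only
        t2 = time.perf_counter()
        load_total += t1 - t0
        infer_total += t2 - t1

    n = len(paths)
    return load_total / n, infer_total / n
```

If the first number dominates, the model itself is probably not the bottleneck, which matches the observation above.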

In object_detection_tutorial.ipynb, image IO is implemented with the PIL library as follows:

image = Image.open(image_path)
image_np = load_image_into_numpy_array(image)
image_np_expanded = np.expand_dims(image_np, axis=0)

This implementation is actually quite slow in my environment. Processing speed improved drastically after replacing this code with the following cv2 functions:

import cv2
import numpy as np
img = cv2.imread(img_path)
# cv2.imread returns BGR; convert to the RGB order the model expects
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_np_expanded = np.expand_dims(img, axis=0)

The average image loading time for 1280*720 images in my environment is:

| IO method | Time per image (1280*720) |
| --- | --- |
| PIL | 0.49 sec |
| CV2 | 0.004 sec |

As for the detection speed of each model, these are my test results:

| Model name | FPS (1280*720) |
| --- | --- |
| ssd_mobilenet_v1_coco | 15.55 |
| ssd_inception_v2_coco | 14.07 |
| rfcn_resnet101_coco | 4.93 |
| faster_rcnn_resnet101_coco | 3.79 |
| faster_rcnn_inception_resnet_v2_atrous_coco | 0.97 |

ghost commented 7 years ago

great,thanks

bailvwangzi commented 7 years ago

@GBJim What's your network forward time for ssd_mobilenet_v1_coco? My result is about 70 ms, which is much slower than the original SSD.

GBJim commented 7 years ago

Hi @bailvwangzi, check out my speed testing table. I got 15.55 FPS, which is quite close to your number.
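For readers comparing the two figures, frames per second and milliseconds per frame are reciprocals scaled by 1000. A small sketch of the arithmetic (the helper names `fps_to_ms` and `ms_to_fps` are made up for illustration):

```python
def fps_to_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def ms_to_fps(ms):
    """Convert milliseconds per frame to frames per second."""
    return 1000.0 / ms


print(round(fps_to_ms(15.55), 1))  # 64.3
print(round(ms_to_fps(70.0), 1))   # 14.3
```

So 15.55 FPS is roughly 64 ms per frame, and 70 ms per frame is roughly 14.3 FPS.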

bailvwangzi commented 7 years ago

@GBJim I know 15.55 FPS is the fastest in your test. However, SSD+VGGNet can reach 40 FPS, so I want to know why SSD+MobileNet is slower than SSD in TF.

GBJim commented 7 years ago

@bailvwangzi: Wow! I didn't know that. Which implementation do you use?

bailvwangzi commented 7 years ago

@GBJim Just the original SSD, https://github.com/balancap/SSD-Tensorflow. I think SSD+MobileNet should be faster than the original SSD, and #1771 says the model should be fast enough to detect objects in real time, but the test results do not bear that out. The same goes for RFCN: 4.93 FPS is much slower than https://github.com/xdever/RFCN-tensorflow.

jtn-ms commented 7 years ago

I tested https://github.com/balancap/SSD-Tensorflow with a GTX 970, and the result disappointed me: 0.7-0.8 s for GPU inference and 3-4 s for CPU inference. The original Caffe code runs at more than 20 FPS. Is TensorFlow really this much worse than Caffe, or did I make a mistake?

xfause commented 7 years ago

@ffrige I have the same problem: SSD runs very slowly on TF. My environment is the same as yours. Can you tell me how to change the code to make it run faster? I also found the two 'with' statements inside the loop in the demo code, but I don't know how to change them. Can you help me, please? Thank you very much.

ffrige commented 7 years ago

@xfause Sorry for the late reply! All I meant in my previous post is that there is no need to re-create the TF graph context and session for every frame of your stream. It is enough to do it once at the beginning and then loop through the frames.

Here is how I do it with OpenCV:

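import cv2
import tensorflow as tf

cap = cv2.VideoCapture(0)
assert cap.isOpened(), "Camera not found!"

# detection_graph is the graph already loaded earlier in the notebook.
# Enter the graph context and open the session once, outside the loop.
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:
            ok, img = cap.read()
            if not ok:
                break
            # put object detection here...

lovekesh-thakur commented 7 years ago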

I am getting 75 FPS with two 1080 Ti GPUs. I expected much better performance than this. Tiny-YOLO on a single GPU gives us 200 FPS. Can anyone confirm the same observations? Thanks.

Walid-Ahmed commented 7 years ago

Hi all,

I tested SSD-Mobilenet and got about double the speed of SSD_vgg16. I am now trying to test R-FCN, but I can't figure out the value for feature_extractor { type: 'ssd_mobilenet_v1' ......}. What type should I use for R-FCN? Thanks.

bw4sz commented 7 years ago

@GBJim Can you confirm whether you have similar information or intuition on CPU times? I know GPU is often 10x faster, but I'm seeing CPU times around 100x slower than your table.

https://stackoverflow.com/questions/46839073/tensorflow-object-detection-api-rcnn-is-extremely-slow-on-cpu-1-frame-per-min

khanh96le commented 6 years ago

@GBJim Do you know why in the load_image_into_numpy_array function, it has to reshape the image to (height, width) instead of keeping the original size (width, height)?

PythonImageDeveloper commented 6 years ago

@GBJim, @lovekesh-thakur, @bailvwangzi Hi, I have trained ssd_mobile_coco on my own dataset and I have some questions. Q1: How do I measure speed? Q2: How do I measure mAP? Q3: Is it possible to view mAP on the validation (test) set while training is running, and how? Q4: How do I convert ms to FPS? OS: Ubuntu 16.04, GPU: GTX 1080, TensorFlow: binary 1.5, CUDA: 9, cuDNN: 7

bidai541 commented 6 years ago

@GBJim I ran into the same problem and found that the PIL conversion "np.array(image.getdata()).reshape( (im_height, im_width, 3)).astype(np.uint8)" costs a lot of time. OpenCV is faster because the image is read directly as an ndarray. Using "np.asarray()" instead avoids the reshape operation, and then there is no difference in efficiency.
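The two conversions mentioned above can be compared on a small synthetic image. This is a sketch assuming Pillow and NumPy are installed; the 4x3 test image and the names `slow` and `fast` are made up for illustration and nothing here is a benchmark.

```python
import numpy as np
from PIL import Image

# Small synthetic RGB image. PIL reports size as (width, height).
image = Image.new("RGB", (4, 3), color=(10, 20, 30))
im_width, im_height = image.size

# Tutorial-style conversion: builds a Python sequence of pixel tuples,
# then reshapes it into NumPy's (rows, columns, channels) layout.
slow = np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)

# Direct conversion: NumPy reads the image data without the Python-level detour.
fast = np.asarray(image)

print(slow.shape, fast.shape)       # (3, 4, 3) (3, 4, 3)
print(np.array_equal(slow, fast))   # True
```

Both arrays have shape (height, width, 3) because NumPy indexes rows first, while PIL's `size` is (width, height); that is also why the tutorial helper reshapes to (im_height, im_width, 3).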

dexception commented 5 years ago

Stop wasting your time on this trash. Move to YOLO.