Closed GBJim closed 7 years ago
I'm having similar issues too. RFCN-ResNet-101 and SSD-Inception-v2 seem to run much slower than the benchmarked times. I'm using a Pascal Titan Xp with CUDA 8.0.
I am having this problem as well.
Thanks all - I'll look into this. If you can provide logs and system information, that would be helpful too.
@GBJim How can I test on my own data? I have successfully trained on my own data, and I just wonder how to run the test and get a txt file containing [image_name, score, x, y, w, h]. Can you give me some suggestions? Thanks
@YanLiang0813 Please refer to this Python detection script. You can use this function in your object_detection_tutorial.ipynb.
Some loops could be replaced with NumPy operations for efficiency.
@GBJim Thank you very much! It really helps me a lot.
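For the output format asked about above, here is a minimal sketch, not the linked script itself. It assumes the detection boxes come back as normalized [ymin, xmin, ymax, xmax] values, as the Object Detection API outputs them; the function names, the `min_score` threshold, and the space-separated line layout are hypothetical choices for illustration.

```python
def to_xywh(box, im_width, im_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (x, y, w, h)."""
    ymin, xmin, ymax, xmax = box
    x = xmin * im_width
    y = ymin * im_height
    w = (xmax - xmin) * im_width
    h = (ymax - ymin) * im_height
    return x, y, w, h


def format_detections(image_name, boxes, scores, im_width, im_height, min_score=0.5):
    """Return one 'image_name score x y w h' line per detection at or above min_score."""
    lines = []
    for box, score in zip(boxes, scores):
        if score < min_score:
            continue
        x, y, w, h = to_xywh(box, im_width, im_height)
        lines.append("%s %.4f %.1f %.1f %.1f %.1f" % (image_name, score, x, y, w, h))
    return lines


# Made-up values standing in for the detections of one 1280*720 image.
example_lines = format_detections(
    "img_0001.jpg",
    boxes=[[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.1, 0.1]],
    scores=[0.9, 0.2],
    im_width=1280,
    im_height=720,
)
```

The resulting lines can then be written out with something like `open("detections.txt", "w").write("\n".join(example_lines))`, where the file name is again only an example.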
Hi @jch1 Are there any updates about this issue? Please let me know if you need any extra information. Thanks!
I am having a similar issue on Windows 10 with pip-installed tensorflow 1.2.1. On a 4-core CPU I get 400 ms per frame with SSD-Mobilenet-V1, but on a GPU (GTX 960) it only drops to 360 ms, which is strange.
I am having a similar issue on Ubuntu 16.04 with pip installed tensorflow-gpu 1.2.1. I have a GTX 760 with 2G memory. It took ssd_mobilenet around 1600ms per frame. For faster_rcnn_inception, it took about 4000ms per frame.
Wow, how do you guys run that fast? I get at best 4-5 seconds per image on my 4-core i5 CPU with Windows 7 and TensorFlow 1.1.0. Is there anything I can optimize to get near real-time behavior? Or is it slow because I am running from a Jupyter notebook?
Ok, sorry, my bad. I was including the Tensorflow double "with" lines in my loop. Taking those lines out I now get a more reasonable frame rate (<1sec). Still, is there any way to optimize that, even at a cost of accuracy? Is it worth retraining the model with only the 2-3 classes I am interested in detecting?
Hi all: My issue has been solved. In my experiment, the detection time was measured end to end, so the image IO time was included as well. I found that the network forward time of each model is actually normal; the bottleneck is the image IO.
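One way to separate the two costs is to time the loading step and the forward pass independently. The sketch below is only an illustration under stated assumptions: `load_image` and `run_inference` are hypothetical placeholders for whatever loader (PIL or cv2) and `sess.run` wrapper you use, and `time_stages` is not part of any library.

```python
import time


def time_stages(image_paths, load_image, run_inference):
    """Return (average IO seconds, average forward seconds) per image.

    load_image and run_inference are placeholder callables supplied by the
    caller, for example a cv2-based loader and a wrapper around sess.run.
    """
    io_seconds = 0.0
    forward_seconds = 0.0
    for path in image_paths:
        t0 = time.perf_counter()
        image = load_image(path)      # image IO and decoding
        t1 = time.perf_counter()
        run_inference(image)          # network forward pass
        t2 = time.perf_counter()
        io_seconds += t1 - t0
        forward_seconds += t2 - t1
    n = max(len(image_paths), 1)      # avoid dividing by zero on an empty list
    return io_seconds / n, forward_seconds / n
```

On a GPU the first forward pass is usually much slower than later ones, so it may be worth running one untimed warm-up image before collecting averages.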
In object_detection_tutorial.ipynb, image IO is implemented in PIL library as the following:
    image = Image.open(image_path)
    image_np = load_image_into_numpy_array(image)
    image_np_expanded = np.expand_dims(image_np, axis=0)
This implementation is actually quite slow in my environment. Processing speed improved drastically after replacing this script with the following cv2 functions:
    img = cv2.imread(img_path)  # cv2 loads images in BGR channel order
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB to match what PIL produces
    img_np_expanded = np.expand_dims(img, axis=0)
The average loading time for 1280*720 images in my environment is:
IO Method | Time per Image (1280*720) |
---|---|
PIL | 0.49 sec |
CV2 | 0.004 sec |
As for the detection speed of each model, this is my testing result:
Model name | FPS(1280*720) |
---|---|
ssd_mobilenet_v1_coco | 15.55 |
ssd_inception_v2_coco | 14.07 |
rfcn_resnet101_coco | 4.93 |
faster_rcnn_resnet101_coco | 3.79 |
faster_rcnn_inception_resnet_v2_atrous_coco | 0.97 |
great,thanks
@GBJim What's your network forward time for ssd_mobilenet_v1_coco? My result is about 70 ms, which is much slower than the original SSD.
Hi @bailvwangzi Check out my speed testing table, I got 15.55 FPS, which is quite close to your number.
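The two figures can be compared by converting between milliseconds per frame and frames per second. This is just the arithmetic spelled out; the helper names are made up for illustration.

```python
def ms_to_fps(ms_per_frame):
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / ms_per_frame


def fps_to_ms(fps):
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


print(round(fps_to_ms(15.55), 1))  # 64.3 ms per frame
print(round(ms_to_fps(70), 1))     # 14.3 FPS
```

So 15.55 FPS corresponds to roughly 64 ms per frame, which is in the same range as the 70 ms reported above.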
@GBJim I know 15.55 FPS is the fastest in your test. However, SSD+VGGNet can reach 40 FPS, so I want to know why SSD+MobileNet is slower than SSD in TF.
@bailvwangzi: Wow! I didn't know that. Which implementation do you use?
@GBJim Just the original SSD, https://github.com/balancap/SSD-Tensorflow. I think SSD+MobileNet should be faster than the original SSD, and #1771 answered that the model should be fast enough to detect objects in real time, but the test results so far do not show that. The same goes for RFCN: 4.93 FPS is much slower than https://github.com/xdever/RFCN-tensorflow.
I tested https://github.com/balancap/SSD-Tensorflow with a GTX 970, and the result disappointed me: 0.7-0.8 s for GPU inference and 3-4 s for CPU inference. The original Caffe code runs at more than 20 FPS. Is TensorFlow really this much worse than Caffe, or did I make a mistake?
@ffrige I have the same problem: SSD on TF runs very slowly. My environment is the same as yours. Can you tell me how to change the code to make it run faster? I also found the two 'with' statements inside one loop in the demo code, but I don't know how to change them. Can you help me, please? Thank you very much.
@xfause Sorry for the late reply! All I meant with my previous post is that there is no need to re-create the TF graph context and session for every frame of your stream. It is enough to do it once at the beginning and then loop through the frames.
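Here is how I do it with OpenCV:

    import cv2
    import tensorflow as tf

    # detection_graph is the frozen graph loaded earlier, as in object_detection_tutorial.ipynb
    cap = cv2.VideoCapture(0)
    assert cap.isOpened(), "Camera not found!"
    # Enter the graph context and create the session once, outside the frame loop
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            while True:
                _, img = cap.read()
                # put object detection here...
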
I am getting 75 FPS with two 1080 Ti GPUs. I expected much better performance than this. Tiny-YOLO on a single GPU gives us 200 FPS. Can anyone confirm the same observations? Thanks
Hi all,
I tested SSD-MobileNet and got about double the speed of SSD_vgg16. I am trying to test RFCN, but I can't figure out the value for feature_extractor { type: 'ssd_mobilenet_v1' ......}. What type should I use with RFCN? Thanks
@GBJim Can you confirm whether you have similar information or intuition on CPU times? I know a GPU is often 10x faster, but I'm seeing CPU times about 100x slower than your table.
@GBJim Do you know why the load_image_into_numpy_array function has to reshape the image to (height, width) instead of keeping the original size (width, height)?
@GBJim, @lovekesh-thakur, @bailvwangzi Hi, I have trained ssd_mobile_coco on my own dataset, and I have some questions. Q1: How do I measure speed? Q2: How do I measure mAP? Q3: Is it possible to view mAP on the validation (test) set while training is running, and how? Q4: How do I convert ms to FPS? OS: Ubuntu 16.04, GPU: GTX 1080, TensorFlow: binary 1.5, CUDA: 9, cuDNN: 7
@GBJim
I met the same problem and found that the PIL-based call "np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)" costs a lot of time. OpenCV is faster because the format it reads in is already an ndarray. Using this function "np.asarray(
Stop wasting your time on this trash. Move to YOLO.
Hi:
I've been testing the processing time of object detection models on a GTX 1080. The results look abnormal to me:
I tested SSD-Mobilenet-V1 and RFCN-ResNet-101 on a hundred 1280*720 images. The average processing times per image are:
SSD-Mobilenet-V1: 0.38 sec
RFCN-ResNet-101: 0.27 sec
This result is confusing to me. SSD-Mobilenet should be much faster based on my understanding. Any suggestions?
System information
What is the top-level directory of the model you are using: object_detection
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04 --> nvidia-docker container
TensorFlow installed from (source or binary): Docker image from the official TensorFlow image
TensorFlow version (use command below): 1.2.0
CUDA/cuDNN version: CUDA8.0 / cuDNN unknown since TF 1.2 does not reveal cuDNN usage
GPU model and memory: GTX 1080 8G