tensorflow / models

Models and examples built with TensorFlow

TensorRT graph optimization for object detection model graphs #4229

Closed fuatka closed 5 years ago

fuatka commented 6 years ago

System information

Describe the problem

I can successfully run the create_inference_graph function for the TensorFlow Slim image classification models: https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models

But create_inference_graph gives an error for the TensorFlow object detection models (like ssd_mobilenet_v1, faster_rcnn_inception_v2): https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models
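
For context, the call looks roughly like this (a sketch with illustrative paths and values, not the exact script I use):

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load a frozen graph and ask TF-TRT to replace supported subgraphs
# with TensorRT engines (illustrative values).
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    frozen_graph_def = tf.GraphDef()
    frozen_graph_def.ParseFromString(f.read())

trt_graph_def = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=['output_node_name'],        # model-specific output node(s)
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16')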

Are TensorRT graph optimization functions supported for object detection model graphs?

Source code / logs

"/home/fuat/.local/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 115, in create_inference_graph
    int(msg[0]))
tensorflow.python.framework.errors_impl.NotFoundError: No attr named 'index_type' in NodeDef:
tensorflowbutler commented 6 years ago

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

Bazel version

karmel commented 6 years ago

@fuatka , can you provide the full error trace and command you used to run the script?

CC @aaroey @samikama for thoughts on which models should be expected to work here.

samikama commented 6 years ago

@fuatka,

Support for object detection networks is not in an advanced state yet, and users need to do some tricks. TensorRT 4.0 and updates to TF-TRT will improve things in upcoming releases.

fuatka commented 6 years ago

@karmel I modified the run_trt_graph_for_mode function in the tensorrt.py file and commented out the time_and_log_graph function. Actually, I don't know the output_node name for the ssd_mobilenetv1 graph. But with supported graphs and wrong output_node names, the code doesn't give an error and creates a 4-byte output graph file :)
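
For reference, one way to hunt for candidate output nodes is to list the nodes that no other node consumes; a sketch, assuming the frozen graph loads:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('ssd_mobilenetv1_frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Collect every node name that appears as an input somewhere;
# nodes that never appear as an input are likely graph outputs.
consumed = set()
for node in graph_def.node:
    for inp in node.input:
        consumed.add(inp.split(':')[0].lstrip('^'))

print([n.name for n in graph_def.node if n.name not in consumed])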

Full command and full error trace:

fuat@fuat-DNN:~/tensorflow/models/research/tensorrt$ python3 tensorrt.py --frozen_graph=ssd_mobilenetv1_frozen_graph.pb --output_node=FeatureExtractor --image_file=image.jpg --fp16 --output_dir=output
2018-05-14 21:09:39.628300: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-14 21:09:39.748843: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-14 21:09:39.749170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.455 pciBusID: 0000:01:00.0 totalMemory: 3.94GiB freeMemory: 3.53GiB
2018-05-14 21:09:39.749186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-14 21:09:39.986434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-14 21:09:39.986464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-14 21:09:39.986469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-14 21:09:39.986603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2019 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Running FP16 graph
2018-05-14 21:09:40.162749: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
Traceback (most recent call last):
  File "tensorrt.py", line 615, in <module>
    main(argv=sys.argv)
  File "tensorrt.py", line 474, in main
    graph_name, frozen_graph_def, mode, data, log_buffer, flags)
  File "tensorrt.py", line 372, in run_trt_graph_for_mode
    flags.batch_size, flags.workspace_size)
  File "tensorrt.py", line 238, in get_trt_graph
    precision_mode=precision_mode)
  File "/home/fuat/.local/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 115, in create_inference_graph
    int(msg[0]))
tensorflow.python.framework.errors_impl.NotFoundError: No attr named 'index_type' in NodeDef:
    [[Node: Postprocessor/BatchMultiClassNonMaxSuppression/ones = Fill[T=DT_INT32](Postprocessor/BatchMultiClassNonMaxSuppression/ones/shape, Postprocessor/BatchMultiClassNonMaxSuppression/ones/Const)]]
    [[Node: Postprocessor/BatchMultiClassNonMaxSuppression/ones = Fill[T=DT_INT32](Postprocessor/BatchMultiClassNonMaxSuppression/ones/shape, Postprocessor/BatchMultiClassNonMaxSuppression/ones/Const)]]

fuatka commented 6 years ago

@samikama Thank you for the answer.

Can you explain a little bit about these tricks? :) I can see all the node names in the model graph.pbtxt file. But how can we choose the right output_node name for the graphs?

lukashruby commented 6 years ago

Are there any updates yet?

samikama commented 6 years ago

The most common cause of failure is shape deduction. If you can get the shapes fixed, it should convert better. But even then, most of the time is spent on CPU layers for NMS, so you don't see much benefit. We are working on a new PR in TF which should remove the shape restriction. But other improvements will require changes to TRT, which will be available in coming months.

bmount commented 6 years ago

@samikama Thank you! Your suggestion does indeed work. A data point that might be helpful to others on this thread: TensorRT graph optimization (i.e., converting this repo's object detection graphs to TensorFlow models with one or more subgraphs executed in TensorRT) works with older frozen models from the model zoo, I think those that predate variable batch size during inference. My attempt to create these subgraphs from a more recent frozen model, one created with research/object_detection/export_inference_graph.py at the time of writing, simply preserved the input graph and failed to generate any TensorRT subgraphs. The absolute minimum diff for a workaround (which doesn't generalize to anything other than the exact input shape of the particular model) is:

diff --git a/research/object_detection/exporter.py b/research/object_detection/exporter.py
index 05b09b1..7d1b94d 100644
--- a/research/object_detection/exporter.py
+++ b/research/object_detection/exporter.py
@@ -126,7 +126,7 @@ def replace_variable_values_with_moving_averages(graph,
 def _image_tensor_input_placeholder(input_shape=None):
   """Returns input placeholder and a 4-D uint8 image tensor."""
   if input_shape is None:
-    input_shape = (None, None, None, 3)
+    input_shape = (4, 300, 300, 3)
   input_tensor = tf.placeholder(
       dtype=tf.uint8, shape=input_shape, name='image_tensor')
   return input_tensor, input_tensor

Is there a cleaner way to do that?
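
At the time of writing, export_inference_graph.py also appears to accept an --input_shape flag, which might achieve the same thing without patching the source (I haven't verified that TF-TRT then segments the result); a hypothetical invocation:

python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path path/to/pipeline.config \
    --trained_checkpoint_prefix path/to/model.ckpt \
    --output_directory exported_model \
    --input_shape 4,300,300,3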

(Related to your final comment, in case it helps: I've found it's true that NMS is an unexpectedly costly step, but for many applications' latency budgets, 1) setting relatively high, application-specific confidence thresholds before grouping helps, and 2) class-oblivious NMS also helps and has the upside of picking a winner between similar classes; a rough sketch follows.)
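
A rough TF-side sketch of what I mean, with illustrative thresholds (not code from this repo): filter detections by a high confidence threshold first, then run a single NMS pass over all classes.

import tensorflow as tf

def class_agnostic_nms(boxes, scores, max_detections=100,
                       score_threshold=0.5, iou_threshold=0.6):
    # Keep only reasonably confident boxes before grouping.
    keep = tf.where(scores >= score_threshold)[:, 0]
    boxes = tf.gather(boxes, keep)
    scores = tf.gather(scores, keep)
    # One NMS pass over all classes, so overlapping boxes from similar
    # classes compete and a single winner is kept.
    selected = tf.image.non_max_suppression(
        boxes, scores, max_output_size=max_detections,
        iou_threshold=iou_threshold)
    return tf.gather(boxes, selected), tf.gather(scores, selected)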

samikama commented 6 years ago

When TF PR 19871 is merged, we will be constructing engines on the fly, at the expense of some compute time, so as long as your input shapes don't change (other than batch), you will be able to run. A further PR will remove the non-batch rank restriction as well. If too many engines need to be constructed, this will negatively impact performance.
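
Once that lands, enabling it should look roughly like this (a sketch assuming the is_dynamic_op argument exposed by later builds; values illustrative):

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    frozen_graph_def = tf.GraphDef()
    frozen_graph_def.ParseFromString(f.read())

# Engines are built lazily at run time, once actual input shapes are
# known, instead of at conversion time.
trt_graph_def = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=['detection_boxes', 'detection_scores',
             'detection_classes', 'num_detections'],  # typical OD API outputs
    max_batch_size=4,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16',
    is_dynamic_op=True)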

rsandler00 commented 5 years ago

Hi @samikama is there any update on this?

I am receiving the same error with TensorRT 4.0.2.0-1+cuda9.0 and tensorflow-gpu==1.9.0+nv18.8.

I receive the error for both the SSDLite graph from here and the MobileNetv2 graph from here.

aaroey commented 5 years ago

@rsandler00 would you try the latest TF v1.12? Since v1.9 many issues have been fixed.

rsandler00 commented 5 years ago

@aaroey as per the instructions here, I installed Tensorflow-gpu on my Jetson TX2 directly from NVIDIA using:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp33 tensorflow-gpu

This version comes with the tf.contrib.tensorrt module, and it happens to be TensorFlow v1.9. How would I get TF v1.12 with the tensorrt module? My understanding is that it does not come included by default; is that correct?

aaroey commented 5 years ago

@rsandler00 I'm not sure why only v1.9 is included in NVIDIA's doc; @pooyadavoodi may know more. But to install TF 1.12, can you just run pip install tensorflow-gpu? For more installation options see https://www.tensorflow.org/install/.

rsandler00 commented 5 years ago

@aaroey thank you! Does the default install of TF include TF-TRT?

aaroey commented 5 years ago

@rsandler00 yes it does.

pooyadavoodi commented 5 years ago

We released the wheel for the TX2 only once, and that was when TF 1.9 was out. We have newer wheels for Xavier which include a more recent TF.

There are problems with building TF 1.12 on aarch64. We have resolved most of the issues and will release a wheel for Xavier hopefully soon. We are also considering open sourcing the build scripts we use for Xavier. That should help the community.

rsandler00 commented 5 years ago

@pooyadavoodi thank you, this is exactly what I was looking for.

So to confirm: (1) as of now, I should not try to install TF 1.12 on a Jetson TX2, and (2) TF 1.9 cannot yet convert MobileNet/SSDLite-type architectures, so there is currently no way of converting them to TensorRT on the Jetson TX2.

ymodak commented 5 years ago

Closing this issue since it's resolved. Feel free to reopen if the issue still persists. Thanks!