tensorflow / tensorrt

TensorFlow/TensorRT integration

No speed improvements after TF-TRT optimizing #89

Closed EsmeYi closed 4 years ago

EsmeYi commented 5 years ago

A small tip which may be useful

FP16 or INT8 does improve inference speed, but not all hardware supports these precision modes. This matrix lists which precision modes each NVIDIA GPU supports: https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#hardware-precision-matrix
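
For reference, a minimal sketch of how the precision mode is chosen at conversion time with the TF 1.x `TrtGraphConverter` API (here `frozen_graph_def` and `output_node_names` are placeholders for your own model, not from the original post):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Sketch only: frozen_graph_def and output_node_names are placeholders.
converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph_def,    # frozen GraphDef of the model
    nodes_blacklist=output_node_names,   # output nodes are kept in TensorFlow
    precision_mode="FP16",               # "FP32", "FP16", or "INT8" (hardware permitting)
    max_batch_size=32,
    max_workspace_size_bytes=1 << 30)
trt_graph = converter.convert()          # GraphDef containing TRTEngineOp segments
```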

System information

Result

Dataset: 4952 images, 300×300 resolution

| Model | Model size | Num Nodes | Batch size | mAP | Latency (ms) | img/sec |
|---|---|---|---|---|---|---|
| FasterRCNN | 69M | 6931 | 64 | 0.7021 | 342316 | 15.28 |
| FasterRCNN (TF-TRT) | 53M | 6456 | 64 | 0.7019 | 334819 | 15.17 |
| MaskRCNN | 78M | 7096 | 32 | 0.6977 | 426658 | 11.67 |
| MaskRCNN (TF-TRT) | 53M | 6622 | 32 | 0.6974 | 406786 | 12.17 |
Programmerwyl commented 4 years ago

Generate the optimized graph on PC and run the graph on TX2:

trt new mobilenet_v2 trt 13.124145984649658
mobilenet_v2 trt 0.02857375144958496
mobilenet_v2 trt 0.025561094284057617
mobilenet_v2 trt 0.024762630462646484
mobilenet_v2 trt 0.024967432022094727
mobilenet_v2 trt 0.025426864624023438
mobilenet_v2 trt 0.027881860733032227
mobilenet_v2 trt 0.022449254989624023
mobilenet_v2 trt 0.02154541015625
mobilenet_v2 trt 0.021519184112548828
average(sec): 0.034324301613701716, fps: 29.1338775440901

Generate the optimized graph on TX2 and run the graph on TX2:

mobilenet_v2 trt 4.066771030426025
mobilenet_v2 trt 0.01324772834777832
mobilenet_v2 trt 0.010189056396484375
mobilenet_v2 trt 0.011507987976074219
mobilenet_v2 trt 0.012037277221679688
mobilenet_v2 trt 0.009507417678833008
mobilenet_v2 trt 0.01143336296081543
mobilenet_v2 trt 0.010309219360351562
mobilenet_v2 trt 0.01093292236328125
mobilenet_v2 trt 0.012867927551269531
average(sec): 0.01874608463711209, fps: 53.344472691661444
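
A rough sketch of the kind of timing loop behind numbers like these (not the poster's actual script; `sess`, `input_tensor`, `output_tensor`, and `batch` are placeholders). The first iteration is slow because the TensorRT engines are built on first use:

```python
import time

times = []
for i in range(10):
    start = time.time()
    sess.run(output_tensor, feed_dict={input_tensor: batch})  # one inference step
    elapsed = time.time() - start
    print("mobilenet_v2 trt", elapsed)
    times.append(elapsed)

# Dropping the first (engine-build) run gives the steady-state latency.
steady = times[1:]
avg = sum(steady) / len(steady)
print("average(sec):{},fps:{}".format(avg, 1.0 / avg))
```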

Programmerwyl commented 4 years ago

But there is still a problem when generating the optimized graph on TX2 and running it on TX2:


    import tensorflow as tf

    graph_def = tf.GraphDef()
    with tf.gfile.FastGFile(graph_path, "rb") as f:
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')
It is too slow: it took 290.40938997268677 seconds, especially this line: `tf.import_graph_def(graph_def, name='')`.
Mythos-Rudy commented 4 years ago

@Eloring Hello, I tried your method with TensorFlow 1.14 and it works, thank you very much! But I am really confused: why is the trt_model bigger than the original_model, especially when I convert the model from FP32 to FP16 or INT8?

graph_size(MB)(**native_tf**): **27.4**
graph_size(MB)(**trt**): **53.2**
num_nodes(native_tf): 6127
num_nodes(tftrt_total): 2894
num_nodes(trt_only): 3
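
For what it's worth, a small sketch of how counts like these can be pulled from the converted GraphDef (assuming `trt_graph` is the GraphDef returned by the converter). One common reason for the size growth is that each pre-built engine is serialized into the graph alongside the original segment kept for fallback:

```python
# Sketch: trt_graph is assumed to be the GraphDef produced by TrtGraphConverter.convert().
num_total = len(trt_graph.node)
num_trt = len([n for n in trt_graph.node if n.op == "TRTEngineOp"])
print("num_nodes(tftrt_total):", num_total)
print("num_nodes(trt_only):", num_trt)
print("graph_size(MB):", trt_graph.ByteSize() / (1024.0 * 1024.0))
```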

Programmerwyl commented 4 years ago

@Mythos-Rudy
Hi, how did you install TensorFlow 1.14? Where can I get the installation package? Could you give me a link? Thanks.

EsmeYi commented 4 years ago

@Programmerwyl I met the same problem, i.e. tf.import_graph_def is too slow. This is because importing a *.pb model calls ParseFromString(), which is provided by protobuf. I solved it by compiling a C++-implemented protobuf from source, as I have recorded here. Hope this is helpful.
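
If it helps anyone else, a quick way to check which protobuf backend Python is actually using (assuming the C++ implementation built from source is what makes ParseFromString fast):

```python
import os
# Must be set before protobuf is imported; only effective if the C++ extension is installed.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "cpp"

from google.protobuf.internal import api_implementation
print(api_implementation.Type())  # 'cpp' -> fast parsing, 'python' -> slow pure-Python parsing
```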

EsmeYi commented 4 years ago

@anuar12 I am not sure whether the 2080 Ti supports FP16; see https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#hardware-precision-matrix

Mythos-Rudy commented 4 years ago

@Programmerwyl Sorry, I don't know what you mean. I installed TensorFlow in a Linux environment, so I just ran `pip install tensorflow==1.14.0`.

Programmerwyl commented 4 years ago

@Eloring Thank you very much for your reply. I have solved the problem of slow graph loading. Thanks again for your solution.

Programmerwyl commented 4 years ago

@Mythos-Rudy I installed TensorRT through conda and then ran `sudo pip3 install tensorflow-gpu==1.14.0`, but it does not work. The development environment is Ubuntu 18.04, the graphics card is a GTX 1060, and tftrt_total is zero.

anuar12 commented 4 years ago

@Eloring Yes, the 2080 Ti supports FP16: it has compute capability 7.5 and Tensor Cores.

hudengjunai commented 4 years ago

I solved the problem by using nvidia-docker with the TensorFlow 19.04 container. Referring to the TF-TRT user guide, I find that there are only two ways to install TF-TRT: using a container or compiling TensorFlow with TensorRT integration from source.

I have compiled TensorFlow with TensorRT, but there is still no speedup. Did I compile it wrong? How did you compile TensorFlow with TensorRT, and with which TF and TRT versions? Could you please give me some tips?

Ekta246 commented 4 years ago

> I solved the problem by using nvidia-docker with the TensorFlow 19.04 container. Referring to the TF-TRT user guide, I find that there are only two ways to install TF-TRT: using a container or compiling TensorFlow with TensorRT integration from source.

Are you sure there is no binary support for the TF-TRT integration? I think the TF-TRT GitHub describes a binary installation too. How about using the tensorflow.python.compiler module for the binary installation method, if you want to avoid the bulky Bazel build configuration when building TensorFlow from source?
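
A minimal check of whether a pip-installed binary exposes the TF-TRT converter (a sketch only; the TensorRT runtime libraries still have to be installed separately for conversion to actually work):

```python
# If this import succeeds, the TensorFlow binary ships the TF-TRT Python API.
from tensorflow.python.compiler.tensorrt import trt_convert as trt
print(trt.TrtGraphConverter)
```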