tensorflow / tensorrt

TensorFlow/TensorRT integration

No improvement using TensorRT5 #256

Open IwakuraRein opened 3 years ago

IwakuraRein commented 3 years ago

At first, I installed TensorRT 7, but I got an error while it was looking for 'libnvinfer.so.5'. My system had no 'libnvinfer.so.5', only 'libnvinfer.so.7'. I then installed TensorRT 5 and followed the instructions here. This time the optimized graph was created successfully.
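
The error above suggests that the installed TensorFlow build looks for one specific major version of the TensorRT runtime library. As a rough sketch, the snippet below asks the dynamic loader which `libnvinfer` sonames it can open. The function name `loadable_nvinfer_versions` and the list of major versions are illustrative assumptions, and the check only makes sense on Linux.

```python
import ctypes


def loadable_nvinfer_versions(major_versions=(5, 6, 7)):
    """Return {soname: bool} telling which libnvinfer sonames the loader can open.

    The set of major versions is an assumption for illustration only.
    """
    result = {}
    for major in major_versions:
        soname = "libnvinfer.so.%d" % major
        try:
            ctypes.CDLL(soname)  # raises OSError if the library cannot be found or loaded
            result[soname] = True
        except OSError:
            result[soname] = False
    return result


if __name__ == "__main__":
    for soname, ok in loadable_nvinfer_versions().items():
        print(soname, "found" if ok else "not found")
```

If the soname that TensorFlow asks for is reported as not found, the library is either not installed or not on the loader's search path.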

My code:

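    # Imports the snippet relies on (TF 1.x). The original post omits them,
    # so the trt_convert import path is an assumption.
    import numpy as np
    import tensorflow as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # disable Grappler's own optimizations (L0) and let GPU memory grow on demand
    config = tf.ConfigProto(allow_soft_placement=True, graph_options=tf.GraphOptions(
        optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))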
    config.gpu_options.allow_growth = True

    # load saved model
    with tf.gfile.GFile(SAVE_PATH+'_classroom/model/best_model.pb', 'rb') as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())

    # create optimized graph
    trt_graph = trt.TrtGraphConverter(
        input_graph_def=frozen_graph,
        session_config=config,
        nodes_blacklist=return_elements_list,
        is_dynamic_op=True,
        precision_mode=precision,
        minimum_segment_size=segment).convert()

    sess = tf.Session(config=config)
    tf.import_graph_def(trt_graph,{'source':model.source},return_elements=return_elements_list)
    run_metadata = tf.RunMetadata()
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    sess.run(tf.global_variables_initializer())
    while True:
        try:
            src_hdr_in, tgt_hdr_in = sess.run(next_element_large,
                feed_dict={handle_large: test_handle})
            src_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, INPUT_CHANNEL))
            tgt_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, TARGET_CHANNEL))            
            src_hdr[:,0:IMAGE_HEIGHT,:,:] = src_hdr_in
            tgt_hdr[:,0:IMAGE_HEIGHT,:,:] = tgt_hdr_in
            feed_dict = {model.source: src_hdr}
            output_tensor = sess.graph.get_tensor_by_name(output_tensor_name)
            denoised_1_bd = sess.run(output_tensor, feed_dict, options=run_options, run_metadata=run_metadata)
    # ...
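
One way to check whether the conversion above actually replaced part of the graph is to count the `TRTEngineOp` nodes in the converted `GraphDef`. The sketch below is a minimal, hedged example: the helper names `count_ops` and `summarize_conversion` are made up for illustration, and the functions accept any object that exposes a `.node` list whose items have an `.op` attribute, which is how a TensorFlow `GraphDef` behaves.

```python
from collections import Counter
from types import SimpleNamespace


def count_ops(graph_def):
    """Count nodes per op type in a GraphDef-like object."""
    return Counter(node.op for node in graph_def.node)


def summarize_conversion(graph_def, engine_op="TRTEngineOp"):
    """Report the total node count and how many nodes are TensorRT engine ops."""
    counts = count_ops(graph_def)
    return {
        "total_nodes": sum(counts.values()),
        "engine_nodes": counts.get(engine_op, 0),
    }


if __name__ == "__main__":
    # Stand-in for a converted graph; the op list here is invented for the demo.
    fake_graph = SimpleNamespace(node=[
        SimpleNamespace(op="Placeholder"),
        SimpleNamespace(op="TRTEngineOp"),
        SimpleNamespace(op="Identity"),
    ])
    print(summarize_conversion(fake_graph))
```

With the code from the post, this would be called as `summarize_conversion(trt_graph)`. An `engine_nodes` value of 0 would mean that no segment was converted, in which case identical timings are expected.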

I used TensorFlow's profiler to generate a timeline.json. The timeline table contains several names I had never seen before, such as 'volta_scudnn_128x32_relu_small_nn_v1', so I believe the profiler is describing the optimized graph rather than the vanilla one.
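
Names such as 'volta_scudnn_...' look like cuDNN kernel names, so on their own they may not settle which execution path ran. A more direct check is to search the trace for `TRTEngineOp` and to sum durations per event name. The sketch below assumes that timeline.json follows the Chrome trace event format, with a top-level `traceEvents` list and complete events (`"ph": "X"`) that carry a `dur` field in microseconds. The helper names and the sample events are hypothetical.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Load a Chrome-trace-style JSON file such as timeline.json."""
    with open(path) as f:
        return json.load(f)


def total_duration_by_name(trace, phase="X"):
    """Sum the 'dur' field (microseconds) of complete events, grouped by event name."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == phase and "dur" in event:
            totals[event.get("name", "<unnamed>")] += event["dur"]
    return dict(totals)


def events_mentioning(trace, needle):
    """Return events whose name or string-valued args contain `needle` (case-insensitive)."""
    needle = needle.lower()
    hits = []
    for event in trace.get("traceEvents", []):
        fields = [event.get("name", "")]
        args = event.get("args") or {}
        fields.extend(v for v in args.values() if isinstance(v, str))
        if any(needle in str(field).lower() for field in fields):
            hits.append(event)
    return hits


if __name__ == "__main__":
    # Invented sample data; a real trace would come from load_trace("timeline.json").
    sample = {"traceEvents": [
        {"ph": "M", "name": "process_name", "pid": 1, "args": {"name": "/device:GPU:0"}},
        {"ph": "X", "name": "Conv2D", "ts": 0, "dur": 120, "args": {"name": "unet/conv1", "op": "Conv2D"}},
        {"ph": "X", "name": "Relu", "ts": 130, "dur": 10, "args": {"name": "unet/relu1", "op": "Relu"}},
    ]}
    print(total_duration_by_name(sample))
    print(len(events_mentioning(sample, "TRTEngineOp")), "events mention TRTEngineOp")
```

Running the same two functions on traces recorded with and without conversion gives a per-name comparison instead of a visual one.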

However, the timeline.json shows no improvement: the inference times are nearly the same. My network is a pure CNN with a structure similar to U-Net, and I expected a speedup of nearly 2x.
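
A single traced run can be misleading. With `is_dynamic_op=True` the TensorRT engines are built at run time, so the first calls include build cost, and full tracing adds overhead of its own. A hedged way to compare the two graphs is to time many untraced runs after a warm-up. The `benchmark` helper below is a generic sketch, not part of TensorFlow, and its default counts are arbitrary.

```python
import statistics
import time


def benchmark(run_once, warmup=10, iterations=50):
    """Time `run_once()` after discarding `warmup` initial calls.

    Returns wall-clock statistics in seconds over `iterations` timed calls.
    """
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed (engine build, autotuning, caches)
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return {
        "runs": len(samples),
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "min_s": min(samples),
    }


if __name__ == "__main__":
    # Dummy workload standing in for one inference call.
    stats = benchmark(lambda: sum(range(10000)), warmup=2, iterations=5)
    print(stats)
```

With the session from the post, the call could look like `benchmark(lambda: sess.run(output_tensor, feed_dict))`, run once for the original graph and once for the converted one, without `options` and `run_metadata` so that tracing does not affect the numbers.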