Open dhinkris opened 4 years ago
Hi, I used TF-TRT to optimize a 3D U-Net segmentation model, but I couldn't find any significant improvement in speed. On a P100 GPU, with a batch size of 4, averaged over 20 runs: i) Keras model = 0.78 s ii) TF-TRT FP32 = 0.74 s iii) TF-TRT FP16 = 0.74 s
Can somebody help me figure out whether this can be optimized further? Thank you
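For anyone reproducing these numbers, here is a minimal timing sketch. It is not the harness used in the post; `benchmark`, `predict`, and `warmup` are hypothetical names. It assumes `predict(batch)` blocks until the result is on the host, and it discards a few warm-up calls so one-time costs (graph setup, TensorRT engine build) do not end up in the average.

```python
import statistics
import time


def benchmark(predict, batch, warmup=5, runs=20):
    """Time predict(batch) and return (mean, stdev) latency in seconds.

    `predict` is any callable that runs one inference and blocks until
    the result is available (hypothetical stand-in for the real model).
    """
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        predict(batch)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(batch)
        timings.append(time.perf_counter() - start)

    return statistics.mean(timings), statistics.stdev(timings)
```

Reporting the standard deviation alongside the mean makes it easier to tell whether a 0.78 s vs 0.74 s gap is a real difference or run-to-run noise.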
Hi @dhinkris,
Can I ask how you converted the TensorFlow model to a TensorRT engine?
I ran into problems with the Conv3D operator, so I had to build the network in PyTorch and then export it to ONNX format.
Then I hit another problem with the ONNX parser that ships with TensorRT, which complained about paddings having size == 8.
Thanks. Zheng
@dhinkris Could you run TF-TRT with verbose logging and attach the log here?
Which versions of TF and TRT did you use?
@pooyadavoodi below are the logs for FP32:
2019-12-24 10:55:50.344930: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:716] Optimization results for grappler item: tf_graph
2019-12-24 10:55:50.344979: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (-126), 326 edges (-140), time = 2886.54199ms.
2019-12-24 10:55:50.344985: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] layout: Graph size after: 298 nodes (0), 326 edges (0), time = 698.473ms.
2019-12-24 10:55:50.344990: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (0), 326 edges (0), time = 1178.32605ms.
For FP16:
2019-12-24 10:56:25.241831: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:716] Optimization results for grappler item: tf_graph
2019-12-24 10:56:25.241877: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (-126), 326 edges (-140), time = 2877.59204ms.
2019-12-24 10:56:25.241883: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] layout: Graph size after: 298 nodes (0), 326 edges (0), time = 724.761ms.
2019-12-24 10:56:25.241889: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (0), 326 edges (0), time = 1268.29602ms.
I am using TensorFlow 1.14 and am not sure which version of TRT it uses.
Thank you.
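The grappler summary lines above can be pulled into a small table to see which passes ran and what each one changed. The sketch below is only a log-reading aid; `parse_grappler_passes` and `has_pass` are hypothetical helper names. The assumption that a successful TF-TRT conversion reports a pass named `TensorRTOptimizer` in this summary should be checked against your own build.

```python
import re

# Matches summary lines such as:
#   "...meta_optimizer.cc:718] layout: Graph size after: 298 nodes (0), 326 edges (0), time = 698.473ms."
_PASS_RE = re.compile(
    r"\] (?P<name>[\w ]+): Graph size after: "
    r"(?P<nodes>\d+) nodes \((?P<dnodes>-?\d+)\), "
    r"(?P<edges>\d+) edges \((?P<dedges>-?\d+)\), "
    r"time = (?P<ms>[\d.]+)ms"
)


def parse_grappler_passes(log_text):
    """Return one dict per grappler pass found in log_text, in log order."""
    passes = []
    for m in _PASS_RE.finditer(log_text):
        passes.append({
            "name": m.group("name").strip(),
            "nodes": int(m.group("nodes")),
            "delta_nodes": int(m.group("dnodes")),
            "edges": int(m.group("edges")),
            "delta_edges": int(m.group("dedges")),
            "time_ms": float(m.group("ms")),
        })
    return passes


def has_pass(passes, name):
    """True if a pass with this exact name appears in the parsed summary."""
    return any(p["name"] == name for p in passes)
```

For example, `has_pass(parse_grappler_passes(text), "TensorRTOptimizer")` on the logs above returns `False`: only `constant folding` and `layout` are listed.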
@zhen-xing I used a Keras model and converted it to a TensorFlow model. You can take a look at the code below.
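# Imports are assumed; the original post did not show them.
import tensorflow as tf
from keras import backend as K
from keras.models import load_model

class FrozenGraph(object):
    def __init__(self, model, shape):
        # Leave the batch dimension dynamic; the rest comes from the 4-element input shape.
        shape = (None, shape[0], shape[1], shape[2], shape[3])
        x_name = 'image_tensor_x'
        with K.get_session() as sess:
            x_tensor = tf.placeholder(tf.float32, shape, x_name)
            K.set_learning_phase(0)  # inference mode
            y_tensor = model(x_tensor)
            y_name = y_tensor.name[:-2]  # strip the trailing ':0'
            graph = sess.graph.as_graph_def()
            # Freeze variables into constants, then drop training-only nodes.
            graph0 = tf.graph_util.convert_variables_to_constants(sess, graph, [y_name])
            graph1 = tf.graph_util.remove_training_nodes(graph0)
        self.x_name = [x_name]
        self.y_name = [y_name]
        self.frozen = graph1

# modelname, shape, and TfEngine are defined elsewhere (not shown in this post).
model = load_model(modelname)
frozen_graph = FrozenGraph(model, (shape))
tf_engine = TfEngine(frozen_graph)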
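One quick check after conversion is to count how many nodes in the resulting GraphDef are `TRTEngineOp`, the op type TF-TRT uses for converted segments. The helpers below are a sketch with hypothetical names (`op_histogram`, `trt_engine_count`). They only assume the object has a `.node` sequence whose items expose `.op`, as a TensorFlow `GraphDef` does. Note that `frozen_graph.frozen` above is the graph before conversion; the converted graph lives wherever `TfEngine` (not shown) stores it.

```python
from collections import Counter


def op_histogram(graph_def):
    """Count nodes per op type in a GraphDef-like object."""
    return Counter(node.op for node in graph_def.node)


def trt_engine_count(graph_def):
    """Number of TRTEngineOp nodes; 0 means nothing was converted to TensorRT."""
    return op_histogram(graph_def)["TRTEngineOp"]
```

If the count is zero, or if ops such as `Conv3D` still dominate the histogram outside any engine, the graph is mostly running in native TensorFlow, which would be consistent with timings that barely change.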
Could you attach the full log? This one doesn't have the information I am looking for. Here is how to get verbose logging: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#verbose
Hi @pooyadavoodi, I get the same log if I use these flags: TF_CPP_VMODULE=segment=2,convert_graph=2,convert_nodes=2,trt_engine=1,trt_logger=2
But if I use this flag instead, I get a log of about 4 GB: TF_CPP_MIN_VLOG_LEVEL=2 python
I have uploaded that file here. https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3Ad03518c5-cc26-4685-9c4c-6b32713b4b48
Please let me know if this is helpful. Thank you.
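A 4 GB log is hard to share or read, so one option is to keep only the lines emitted by the source files named in the `TF_CPP_VMODULE` flags above. This is a sketch, not an official tool: `filter_tftrt_lines` and `filter_log_file` are hypothetical names, and it assumes each log line carries a `path/file.cc:LINE]` prefix like the lines already posted in this thread.

```python
import re

# Module names taken from the TF_CPP_VMODULE flags quoted in this thread.
TFTRT_MODULES = ("segment", "convert_graph", "convert_nodes", "trt_engine", "trt_logger")

# Match ".../<module>.cc:123]" or ".../<module>.h:123]" with an exact basename.
_SOURCE_RE = re.compile(r"/(?:%s)\.(?:cc|h):\d+\]" % "|".join(TFTRT_MODULES))


def filter_tftrt_lines(lines):
    """Yield only the lines logged from the TF-TRT source files listed above."""
    for line in lines:
        if _SOURCE_RE.search(line):
            yield line


def filter_log_file(src_path, dst_path):
    """Stream src_path line by line into dst_path; return the number of lines kept."""
    kept = 0
    with open(src_path, "r", errors="replace") as src, open(dst_path, "w") as dst:
        for line in filter_tftrt_lines(src):
            dst.write(line)
            kept += 1
    return kept
```

Because the file is streamed line by line, memory use stays flat regardless of log size. If `filter_log_file` keeps zero lines, that by itself suggests the TF-TRT conversion code never logged anything during the run.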