Closed vilmara closed 6 years ago
@vilmara, sorry, but I am having trouble following the issue above. Can you clarify for me what works and what doesn't in the above scenario? It will be difficult to help debug given that you are using a custom model; can you attempt to create an example that isolates the issue you are experiencing without relying on a separate, complex model?
Hi @karmel, I think the problem is the way the custom model was exported or frozen (it was exported by somebody else). The script tensorrt.py works fine with the frozen graph resnetv2_imagenet_frozen_graph.pb, but it doesn't work with the model I am using (also a pb file). I will probably need to retrain and export the model myself. I have some questions: 1- Could you please give some guidelines on the parameters/instructions I have to use when exporting the model so it can be converted to TensorRT4 format using the script tensorrt.py? 2- I think the frozen graph I am using was created using Keras. If so, does it need to be converted in some way to TensorFlow? I received 2 files with the extensions .pbtxt and .pb; I am assuming that the one with the pb extension is the frozen graph file ready to be used with the script tensorrt.py.
The primary determinant seems to be whether the ops are supported-- see https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#support_op
It's hard to tell what you have from the information provided. Was it a tf.keras model? Is there a directory, or just the protobuf? What does the pbtxt contain? (That is typically the human-readable text-format equivalent of the pb itself.)
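One way to act on the supported-ops list linked above is to collect the op types used in the frozen graph and compare them with that list. The following is a minimal sketch, not an official tool: `SUPPORTED_OPS` is an illustrative placeholder rather than the actual TensorRT 4 list, and `find_unsupported_ops` is a hypothetical helper. It accepts any objects with `name` and `op` attributes, which is the shape of the entries in a TensorFlow `GraphDef.node` list.

```python
from collections import defaultdict
from types import SimpleNamespace

# Placeholder only: fill this in from the supported-ops table in the
# TensorRT developer guide for the version you are targeting.
SUPPORTED_OPS = {"Const", "Placeholder", "Conv2D", "Relu", "BiasAdd", "MatMul"}


def find_unsupported_ops(nodes, supported_ops=SUPPORTED_OPS):
    """Group node names by op type for every op that is not in supported_ops."""
    unsupported = defaultdict(list)
    for node in nodes:
        if node.op not in supported_ops:
            unsupported[node.op].append(node.name)
    return dict(unsupported)


# Stand-in nodes for illustration; with TensorFlow you would pass
# graph_def.node from the loaded frozen graph instead.
example_nodes = [
    SimpleNamespace(name="input", op="Placeholder"),
    SimpleNamespace(name="conv1/Conv2D", op="Conv2D"),
    SimpleNamespace(name="import/IteratorGetNext", op="IteratorGetNext"),
]
print(find_unsupported_ops(example_nodes))
```

Note that an op type being supported does not guarantee conversion; the warnings in this thread concern the type of a constant, not an unsupported op name.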
hi @karmel, I decided to re-train the network. I am using transfer learning with a pre-trained ResNet50 model in Keras, with a Dense layer attached at the end for the binary classification problem (14 labels). Here is the code I am using to define the network:
from keras.models import Model
from keras.applications.resnet50 import ResNet50
from keras.layers import GlobalAveragePooling2D, Dense
base_model = ResNet50(include_top=False, weights='imagenet', input_shape=(256, 256, 3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(14, activation='sigmoid', bias_initializer='ones')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
Question 1: Do I need to configure and convert the Keras model to an Estimator before freezing it?
Also, here is the code I am using to freeze the keras model:
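from keras import backend as K
from keras.models import load_model
from tensorflow.core.protobuf import saver_pb2
from tensorflow.python.framework import graph_io
from tensorflow.python.tools import freeze_graph
from tensorflow.python.training import saver as saver_lib

def convert_keras_to_pb(keras_model, out_names, models_dir, model_filename):
    # Set inference mode before loading the model (affects batch normalization).
    K.set_learning_phase(0)
    model = load_model(keras_model)
    sess = K.get_session()
    saver = saver_lib.Saver(write_version=saver_pb2.SaverDef.V2)
    checkpoint_path = saver.save(sess, './saved_ckpt', global_step=0, latest_filename='checkpoint_state')
    # Write the graph in text format, then freeze it with the checkpoint weights.
    graph_io.write_graph(sess.graph, '.', 'tmp.pb')
    freeze_graph.freeze_graph('./tmp.pb', '',
                              False, checkpoint_path, out_names,
                              "save/restore_all", "save/Const:0",
                              models_dir+model_filename, False, "")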
Here is the file with the keras model's summary before it was frozen: keras_model_summary.txt
Even though I re-trained the network, the system throws the errors below when converting the model to a TensorRT engine:
2018-09-19 22:24:01.766055: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3161] Max batch size= 8 max workspace size= 31293022
2018-09-19 22:24:01.766072: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3167] starting build engine
2018-09-19 22:24:01.870861: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3172] Built network
2018-09-19 22:24:01.872366: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3177] Serialized engine
2018-09-19 22:24:01.873043: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3185] finished engine my_trt_op47 containing 8 nodes
2018-09-19 22:24:01.873085: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3192] Finished op preparation
2018-09-19 22:24:01.873118: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3200] OK finished op building
**2018-09-19 22:24:01.878332: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:454] subgraph conversion error for subgraph_index:48 due to: "Unimplemented: Not supported constant type, at bn_conv1/Const_5" SKIPPING......( 9 nodes)
2018-09-19 22:24:01.883471: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:454] subgraph conversion error for subgraph_index:49 due to: "Unimplemented: Not supported constant type, at bn4d_branch2a/Const_5" SKIPPING......( 11 nodes)**
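When the conversion log is long, it can help to pull out only the skipped subgraphs. The sketch below assumes the warning format shown in the two lines above; `skipped_subgraphs` is a hypothetical helper name, and the regular expression is written against these sample lines only, so other TensorFlow versions may word the message differently.

```python
import re

# Matches the "subgraph conversion error" warnings as they appear in this thread.
SKIP_PATTERN = re.compile(
    r'subgraph conversion error for subgraph_index:(\d+) '
    r'due to: "([^"]*)" SKIPPING\.+\(\s*(\d+) nodes\)'
)


def skipped_subgraphs(log_text):
    """Return (subgraph_index, reason, node_count) for each skipped subgraph."""
    return [
        (int(index), reason, int(count))
        for index, reason, count in SKIP_PATTERN.findall(log_text)
    ]


# The two warning lines quoted above.
sample_log = '''2018-09-19 22:24:01.878332: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:454] subgraph conversion error for subgraph_index:48 due to: "Unimplemented: Not supported constant type, at bn_conv1/Const_5" SKIPPING......( 9 nodes)
2018-09-19 22:24:01.883471: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:454] subgraph conversion error for subgraph_index:49 due to: "Unimplemented: Not supported constant type, at bn4d_branch2a/Const_5" SKIPPING......( 11 nodes)'''

for index, reason, count in skipped_subgraphs(sample_log):
    print(index, count, reason)
```

Summing the node counts gives a quick measure of how much of the graph fell back to native TensorFlow execution.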
hi @karmel, I solved the issue by implementing the model directly in TensorFlow (I couldn't solve the problem with either the standalone Keras API on the TensorFlow backend or the Keras implementation inside TensorFlow).
After training the model, I exported the inference graph as a SavedModel with the function export_savedmodel. I read that to serve a model with TensorFlow Serving, we must export the trained model with the export_savedmodel method.
Here are the links with more info if somebody faces the same issue:
TensorFlow Estimator https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator#export_savedmodel
Serving Pre-Modeled and Custom Tensorflow Estimator with Tensorflow Serving https://medium.com/@yuu.ishikawa/serving-pre-modeled-and-custom-tensorflow-estimator-with-tensorflow-serving-12833b4be421
@vilmara I face the same issue when using my own fast-rcnn model. Could you share the details of how you solved this issue?
hi @cheneyoung, I implemented the model in TensorFlow directly and solved it as explained in my message above
System information
What is the top-level directory of the model you are using: models/tree/master/research/tensorrt
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): GNU/Linux 4.4.0-128-generic x86_64
TensorFlow installed from (source or binary): via docker image nvcr.io/nvidia/tensorflow:18.07-py3
TensorFlow version (use command below): 1.18
TensorRT version: 4.0.1
Bazel version (if compiling from source): n/a
CUDA/cuDNN version: 9.0.176 / 7.1.4
GPU model and memory: Tesla V100-SXM2-16GB
Exact command to reproduce: python3 tensorrt.py --frozen_graph=/workspace/native_resnet_chest_xray.pb --image_file=image.jpg --native --fp32 --fp16 --output_node=probs --input_node=IteratorGetNext --batch_size=8 --output_dir=/workspace/trt_outputs/
Describe the problem
I am running the example tensorrt.py via the container image using my own protobuf frozen graph file, but it throws the error "subgraph conversion error for subgraph_index:1" when running the FP32 graph and the FP16 graph. It seems the ResNet hasn't been converted.
The test worked fine when running native graph and without the flag --int8 (already reported with issue #5093)
Source code / logs
root@c636b093c2d3:/workspace/tensorflow_models/research/tensorrt# python3 tensorrt.py --frozen_graph=/workspace/resnet_xray_model_frozengraph/resnet_chest_xray.pb --image_file=image.jpg --native --fp32 --fp16 --output_node=probs --input_node=IteratorGetNext --batch_size=8 --output_dir=/workspace/trt_outputs/
2018-08-17 16:12:46.508294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1a:00.0
totalMemory: 15.78GiB freeMemory: 15.37GiB
2018-08-17 16:12:46.844860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1c:00.0
totalMemory: 15.78GiB freeMemory: 15.37GiB
2018-08-17 16:12:47.168526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 2 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1d:00.0
totalMemory: 15.78GiB freeMemory: 15.37GiB
2018-08-17 16:12:47.504795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 3 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1e:00.0
totalMemory: 15.78GiB freeMemory: 15.37GiB
2018-08-17 16:12:47.504956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-08-17 16:12:49.452941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-17 16:12:49.452989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1 2 3
2018-08-17 16:12:49.453014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N Y Y Y
2018-08-17 16:12:49.453020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: Y N Y Y
2018-08-17 16:12:49.453025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2: Y Y N Y
2018-08-17 16:12:49.453031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3: Y Y Y N
2018-08-17 16:12:49.454023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8080 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1a:00.0, compute capability: 7.0)
2018-08-17 16:12:49.586349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8080 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1c:00.0, compute capability: 7.0)
2018-08-17 16:12:49.721802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 8080 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1d:00.0, compute capability: 7.0)
2018-08-17 16:12:49.864456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 8080 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1e:00.0, compute capability: 7.0)
Running native graph
INFO:tensorflow:Starting execution
2018-08-17 16:12:50.972171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-08-17 16:12:50.972290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-17 16:12:50.972308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1 2 3
2018-08-17 16:12:50.972321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N Y Y Y
2018-08-17 16:12:50.972332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: Y N Y Y
2018-08-17 16:12:50.972343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2: Y Y N Y
2018-08-17 16:12:50.972354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3: Y Y Y N
2018-08-17 16:12:50.973292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8080 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1a:00.0, compute capability: 7.0)
2018-08-17 16:12:50.973538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8080 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1c:00.0, compute capability: 7.0)
2018-08-17 16:12:50.973655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 8080 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1d:00.0, compute capability: 7.0)
2018-08-17 16:12:50.973764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 8080 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1e:00.0, compute capability: 7.0)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Starting timing.
INFO:tensorflow:Timing loop done!
Running FP32 graph
2018-08-17 16:13:00.195862: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 4
2018-08-17 16:13:00.322100: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:419] MULTIPLE tensorrt candidate conversion: 2
2018-08-17 16:13:00.328255: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3161] Max batch size= 8 max workspace size= 19046418
2018-08-17 16:13:00.328273: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3167] starting build engine
2018-08-17 16:13:00.403277: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3172] Built network
2018-08-17 16:13:00.404848: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3177] Serialized engine
2018-08-17 16:13:00.405600: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3185] finished engine dense/my_trt_op0 containing 4 nodes
2018-08-17 16:13:00.405641: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3192] Finished op preparation
2018-08-17 16:13:00.405692: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3200] OK finished op building
2018-08-17 16:13:00.412386: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:454] subgraph conversion error for subgraph_index:1 due to: "Unimplemented: Not supported constant type, at batch_normalization_41/Const_1" SKIPPING......( 447 nodes)
INFO:tensorflow:Starting execution
2018-08-17 16:13:01.412522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-08-17 16:13:01.412630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-17 16:13:01.412646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1 2 3
2018-08-17 16:13:01.412670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N Y Y Y
2018-08-17 16:13:01.412679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: Y N Y Y
2018-08-17 16:13:01.412689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2: Y Y N Y
2018-08-17 16:13:01.412700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3: Y Y Y N
2018-08-17 16:13:01.413558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8080 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1a:00.0, compute capability: 7.0)
2018-08-17 16:13:01.413728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8080 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1c:00.0, compute capability: 7.0)
2018-08-17 16:13:01.413868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 8080 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1d:00.0, compute capability: 7.0)
2018-08-17 16:13:01.413974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 8080 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1e:00.0, compute capability: 7.0)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Starting timing.
INFO:tensorflow:Timing loop done!
Running FP16 graph
2018-08-17 16:13:04.973382: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 4
2018-08-17 16:13:05.092119: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:419] MULTIPLE tensorrt candidate conversion: 2
2018-08-17 16:13:05.102842: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3161] Max batch size= 8 max workspace size= 19046418
2018-08-17 16:13:05.102904: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3165] Using FP16 precision mode
2018-08-17 16:13:05.102925: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3167] starting build engine
2018-08-17 16:13:05.344161: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3172] Built network
2018-08-17 16:13:05.345806: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3177] Serialized engine
2018-08-17 16:13:05.345936: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3185] finished engine dense/my_trt_op2 containing 4 nodes
2018-08-17 16:13:05.345968: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3192] Finished op preparation
2018-08-17 16:13:05.346002: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3200] OK finished op building
2018-08-17 16:13:05.352164: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:454] subgraph conversion error for subgraph_index:1 due to: "Unimplemented: Not supported constant type, at batch_normalization_52/Const_1" SKIPPING......( 447 nodes)
INFO:tensorflow:Starting execution
2018-08-17 16:13:06.391039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-08-17 16:13:06.391138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-17 16:13:06.391153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1 2 3
2018-08-17 16:13:06.391166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N Y Y Y
2018-08-17 16:13:06.391177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: Y N Y Y
2018-08-17 16:13:06.391188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2: Y Y N Y
2018-08-17 16:13:06.391199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3: Y Y Y N
2018-08-17 16:13:06.392053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8080 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1a:00.0, compute capability: 7.0)
2018-08-17 16:13:06.392213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8080 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1c:00.0, compute capability: 7.0)
2018-08-17 16:13:06.392389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 8080 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1d:00.0, compute capability: 7.0)
2018-08-17 16:13:06.392503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 8080 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1e:00.0, compute capability: 7.0)
INFO:tensorflow:Starting Warmup cycle
INFO:tensorflow:Starting timing.
INFO:tensorflow:Timing loop done!
Predictions:
Precision: native ['brambling, Fringilla montifringilla', 'electric ray, crampfish, numbfish, torpedo', 'tiger shark, Galeocerdo cuvieri', 'goldfish, Carassius auratus', 'stingray']
Precision: FP32 ['brambling, Fringilla montifringilla', 'electric ray, crampfish, numbfish, torpedo', 'tiger shark, Galeocerdo cuvieri', 'goldfish, Carassius auratus', 'stingray']
Precision: FP16 ['brambling, Fringilla montifringilla', 'electric ray, crampfish, numbfish, torpedo', 'tiger shark, Galeocerdo cuvieri', 'goldfish, Carassius auratus', 'stingray']
Log.txt:
network: native_resnet_chest_xray.pb, batchsize 8, steps 100 fps median: 545.5, mean: 547.0, uncertainty: 1.7, jitter: 17.3 latency median: 0.01467, mean: 0.01464, 99th_p: 0.01617, 99th_uncertainty: 0.00065
network: tftrt_fp32_resnet_chest_xray.pb, batchsize 8, steps 100 fps median: 531.7, mean: 533.0, uncertainty: 1.1, jitter: 4.7 latency median: 0.01505, mean: 0.01502, 99th_p: 0.01627, 99th_uncertainty: 0.00005
network: tftrt_fp16_resnet_chest_xray.pb, batchsize 8, steps 100 fps median: 548.9, mean: 550.8, uncertainty: 1.7, jitter: 16.2 latency median: 0.01458, mean: 0.01454, 99th_p: 0.01552, 99th_uncertainty: 0.00060
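The three Log.txt entries can be compared directly. The short sketch below uses only the median fps figures reported above; the helper name `relative_throughput` is made up for illustration. It suggests that the converted graphs run within a few percent of the native graph, which would be consistent with the 447-node subgraph being skipped rather than converted.

```python
# Median fps values copied from the Log.txt entries above.
median_fps = {"native": 545.5, "FP32": 531.7, "FP16": 548.9}


def relative_throughput(fps_by_precision, baseline="native"):
    """Express each throughput as a multiple of the baseline throughput."""
    base = fps_by_precision[baseline]
    return {name: fps / base for name, fps in fps_by_precision.items()}


for name, ratio in relative_throughput(median_fps).items():
    print(f"{name}: {ratio:.3f}x of native")
```

A successful TensorRT conversion of most of the network would normally be expected to move these ratios well away from 1.0, so ratios this close to 1.0 are a quick sanity signal that little was converted.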
Protobuf frozen graph file visualized in TensorBoard
Here is the info in the input node:
import/IteratorGetNext
Operation: IteratorGetNext
Attributes (2)
output_shapes {"list":{"shape":[{"dim":[{"size":8},{"size":224},{"size":224},{"size":3}]},{"dim":[{"size":8},{"size":-1}]}]}}
output_types {"list":{"type":["DT_FLOAT","DT_INT64"]}}
Inputs (1)
import/OneShotIterator scalar
Outputs (1)
import/Pad 8×224×224×3
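The `output_shapes` attribute is easier to read once it is flattened into plain shape lists. A minimal sketch, assuming the attribute is valid JSON exactly as TensorBoard displays it above (the helper name `shapes_from_attr` is hypothetical):

```python
import json

# The output_shapes attribute of import/IteratorGetNext as shown above.
output_shapes_attr = (
    '{"list":{"shape":[{"dim":[{"size":8},{"size":224},{"size":224},{"size":3}]},'
    '{"dim":[{"size":8},{"size":-1}]}]}}'
)


def shapes_from_attr(attr_json):
    """Convert the JSON attribute into a list of plain shape lists."""
    attr = json.loads(attr_json)
    return [[dim["size"] for dim in shape["dim"]] for shape in attr["list"]["shape"]]


print(shapes_from_attr(output_shapes_attr))  # [[8, 224, 224, 3], [8, -1]]
```

The first output is the image batch, with the batch dimension fixed at 8, matching `--batch_size=8` in the command; a size of -1 marks a dimension whose size is unknown.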
When I inspected my model in TensorBoard, it showed 186 subgraph nodes. Curiously, when I ran the script tensorrt.py, it reported errors converting 447 nodes.
Also, I ran the script tensorrt.py with the ResNet-v2-ImageNet frozen graph, and its output showed the message "finished engine dense/my_trt_op", whereas the output with my own protobuf frozen graph showed the message "finished engine my_trt_op". So, is it possible that "dense/my_trt_op" is related to the error "Unimplemented: Not supported constant type"?