jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

Unsupported operation Cast in the ssd models #27

Closed PythonImageDeveloper closed 4 years ago

PythonImageDeveloper commented 4 years ago

Hi, your code runs very well, but when I convert ssd_mobilenet_v1/v2 from the TensorFlow model zoo, I get the error below. Because this layer is unsupported in TensorRT, I want to convert the .pb model to .onnx and then convert the .onnx model to .uff and .bin, but I get some errors when converting to the ONNX model.

```
NOTE: UFF has been tested with TensorFlow 1.12.0. Other versions are not guaranteed to work
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
UFF Version 0.6.3
=== Automatically deduced input nodes ===
[name: "Input"
op: "Placeholder"
input: "Cast"
attr { key: "dtype" value { type: DT_FLOAT } }
attr { key: "shape" value { shape {
  dim { size: 1 } dim { size: 3 } dim { size: 300 } dim { size: 300 } } } }
]
Using output node NMS
Converting to UFF graph
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
WARNING:tensorflow:From /usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:179: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_conf as custom op: FlattenConcat_TRT
Warning: No conversion function registered for layer: Cast yet.
Converting Cast as custom op: Cast
Warning: No conversion function registered for layer: GridAnchor_TRT yet.
Converting GridAnchor as custom op: GridAnchor_TRT
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_loc as custom op: FlattenConcat_TRT
No. nodes: 451
UFF Output written to tmp.uff
[TensorRT] ERROR: UffParser: Validator error: Cast: Unsupported operation _Cast
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "main.py", line 44, in
    buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'
```
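(The usual workaround for this Cast validator error, used by TensorRT's sampleUffSSD and similar SSD conversion scripts, is to graft the unsupported preprocessing nodes onto the input placeholder with NVIDIA's graphsurgeon before UFF conversion. Below is a minimal sketch of such a preprocessing config file; the node names and the 1x3x300x300 shape are assumptions for ssd_mobilenet_v1/v2, and the file would be passed to `convert-to-uff` with `-p`.)

```python
import graphsurgeon as gs
import tensorflow as tf

# Single input placeholder that replaces the whole preprocessing subgraph.
Input = gs.create_plugin_node(name="Input", op="Placeholder",
                              dtype=tf.float32, shape=[1, 3, 300, 300])

# Collapse the unsupported nodes (including Cast) into the Input node.
namespace_plugin_map = {
    "image_tensor": Input,
    "Cast": Input,        # the op TensorRT's UffParser rejects
    "ToFloat": Input,
    "Preprocessor": Input,
}

def preprocess(dynamic_graph):
    # convert-to-uff calls this hook before emitting the .uff file.
    dynamic_graph.collapse_namespaces(namespace_plugin_map)
```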

jkjung-avt commented 4 years ago

Please provide more information:

PythonImageDeveloper commented 4 years ago
jkjung-avt commented 4 years ago

As far as I remember, UFF 0.6.3 should be part of TensorRT 5, so I guess you are using JetPack-4.2.x. I did test my code with Nano + JetPack-4.2.x + tensorflow-1.12.2 before. It should work.

Otherwise, I have the following comments about the SSD models:

PythonImageDeveloper commented 4 years ago

Thanks.

1. If I want to work with ssd_mobilenet_v2_coco, which TensorFlow version do you recommend for training and testing, and which JetPack version on Jetson Nano, so that I avoid problems and get good inference time?

2. In trt_ssd.py and trt_ssd_async.py, you calculate FPS with the equation `fps = curr_fps if fps == 0.0 else (fps*0.95 + curr_fps*0.05)`. With this I gradually reach 70 FPS on Jetson Nano using ssd_mobile_v2_hand, but only 32 FPS when I report curr_fps directly. In your opinion, which measurement is correct to report for my project?

3. When I convert ssdlite_mobilenet_v2 (trained on my own single class) using TensorRT, the converted graph does not contain any TensorRT engine node. Why?

jkjung-avt commented 4 years ago
  1. For training, I recommend tensorflow-1.12.x with "6518c1c" version of object detection API. Check out my hand-detection-tutorial for how the egohands models were trained. For inferencing on Nano, use: (a) JetPack-4.2.2 with tensorflow-1.12.x, or (b) JetPack-4.3 with tensorflow-1.14.x or 1.15.0.

  2. fps should be an "exponentially decaying average" of curr_fps. The two numbers should be very close.
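(For reference, the decaying average from trt_ssd.py can be isolated as a tiny helper; this is a sketch, with the 0.95/0.05 weights quoted in the question above.)

```python
def update_fps(fps, curr_fps, decay=0.95):
    """Exponentially decaying average of the instantaneous FPS.

    Returns curr_fps unchanged on the first frame (fps == 0.0),
    then blends 95% of the old estimate with 5% of the new measurement.
    """
    return curr_fps if fps == 0.0 else (fps * decay + curr_fps * (1.0 - decay))
```

In steady state the estimate converges to the true frame rate, so the smoothed value and curr_fps should indeed end up very close.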

  3. I'm not clear about question 3. Are you able to convert your ssdlite_mobilenet_v2 model to UFF, and then build a TensorRT engine from it?

PythonImageDeveloper commented 4 years ago

1 - How do I get the "6518c1c" version of the object detection API, and is this version compatible with newer SSD models such as ssd_mobilenet_v3_small? 3 - What I mean is that using the API below to convert the pure TensorFlow graph to a TF-TRT graph produces no TensorRT engine node.

```python
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.framework import graph_io

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50,
)
graph_io.write_graph(trt_graph, "./model/", "trt_graph.pb", as_text=False)
```
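(If TF-TRT converted anything, the returned GraphDef will contain TRTEngineOp nodes; counting them is a quick way to check whether the conversion above actually did something. A sketch, where `count_trt_engines` is a hypothetical helper, not part of the TF-TRT API.)

```python
def count_trt_engines(graph_def):
    """Count TF-TRT engine segments in a GraphDef-like object.

    0 means no subgraph qualified for conversion (or conversion failed),
    which matches the "no TensorRT engine node" symptom described above.
    """
    return sum(1 for node in graph_def.node if node.op == 'TRTEngineOp')
```

Note that with `minimum_segment_size=50`, only subgraphs of at least 50 supported ops get wrapped into an engine; a small model can easily end up with zero qualifying segments, so lowering that value is a common first experiment.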

jkjung-avt commented 4 years ago
  1. The "6518c1c" (git hash) version of the object detection API is an older snapshot of the tensorflow/models repository. You could clone that repository and do a `git checkout 6518c1c` to get that version. Alternatively, you could just use my jkjung-avt/tf_trt_models code, which checks out that particular version of the object detection API for you.

  2. For TF-TRT, I recommend JetPack-4.2.2 with tensorflow-1.12.x for now. (New API has been introduced in TensorRT 6, I think.) You could reference my code and blog post in this case.

PythonImageDeveloper commented 4 years ago

Thanks. Why do you recommend the "6518c1c" version of the object detection API? That version is older and doesn't support newer models such as ssd_mobilenet_v3_small/large; its config folder doesn't contain the ssd_mobilenet_v3_small/large config files. In your opinion, if I copy those config files into this version, will the models be trainable?

jkjung-avt commented 4 years ago

I used the same object detection API as in NVIDIA's original tf_trt_models repository. Based on my own testing, it worked pretty OK with tensorflow-1.12 and 1.11.

Note that different snapshots of the object detection API code do have dependencies on specific versions of tensorflow. It might not be a good idea to just use the latest code.

PythonImageDeveloper commented 4 years ago

Hi, I installed JetPack-4.2.2 with tensorflow-1.12.2 built from source with your scripts, but when I run ssd_mobilenet_v2_coco I get the error below. When I install TensorFlow 1.13.1 instead, the model runs correctly. In my opinion, you did not test TensorFlow 1.12.2 with JetPack 4.2.2; you probably tested it with JetPack 4.2. If possible, could you upload a .img from your Jetson Nano and the tensorflow-1.12 .whl?

```
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node FeatureExtractor/MobilenetV2/Conv/Conv2D (defined at predict_detection_ssd.py:97) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV2/Conv/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, FeatureExtractor/MobilenetV2/Conv/weights)]]
	 [[{{node Postprocessor/Slice/_47}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1220_Postprocessor/Slice", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
```
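(This "Failed to get convolution algorithm / cuDNN failed to initialize" error on Nano is often an out-of-memory problem rather than a version mismatch. A commonly suggested session-configuration fragment for the TF 1.x API, offered here only as a sketch to try, is:)

```python
import tensorflow as tf  # TensorFlow 1.x

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # don't grab all GPU memory up front
# Pass this config when creating the session that runs the SSD model:
sess = tf.Session(config=config)
```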

jkjung-avt commented 4 years ago

I think I have tested self-built tensorflow-1.12.2 with both JetPack-4.2.1 and 4.2.2 on Jetson Nano. And currently, I'm running the code using tensorflow-1.15.0 with JetPack-4.3. I never encountered such an issue... Strange...

PythonImageDeveloper commented 4 years ago

Do you mean you installed the official TensorFlow .whl file?

jkjung-avt commented 4 years ago

No. In all 3 cases mentioned above, I used tensorflow built from source by myself.

PythonImageDeveloper commented 4 years ago

Can you upload your tensorflow-gpu-1.12.2 .whl build? I really need that file.

jkjung-avt commented 4 years ago

I think I only keep the tensorflow-1.12.2 whl for JetPack-4.2.2. Please provide a place (e.g. GoogleDrive) for me to upload the file. You could send the link to my email: jkjung13@gmail.com