nwesem / mtcnn_facenet_cpp_tensorRT

Face Recognition on NVIDIA Jetson (Nano) using TensorRT
GNU General Public License v3.0

Failed to parse UFF - custom UFF file #8

Open tongvantruong opened 4 years ago

tongvantruong commented 4 years ago

I created a new uff file named new.uff by the following steps:

  1. Train the TensorFlow model: https://github.com/davidsandberg/facenet/wiki/Classifier-training-of-inception-resnet-v1. The result of this step is a directory containing 4 files (checkpoint, model....index, model...data,...)

  2. Freeze the model with https://github.com/davidsandberg/facenet/blob/master/src/freeze_graph.py by running: python freeze_graph.py path_to_directory_step_1 new.pb (new.pb is my new output model)

  3. Modify the code and convert new.pb to new.uff

  4. Copy new.uff to the facenetModels folder, modify the code, and run it

The error: UffParser: Validator error: InceptionResnetV1/Repeat_2/block8_5/Branch_0/Conv2d_1x1/BatchNorm/cond/Switch: Unsupported operation _Switch ... Failed to parse UFF

Did I train the model incorrectly? The provided facenet.pb runs without problems.

nwesem commented 4 years ago

The given facenet.pb tensorflow graph was pruned. That means the switch node(s) were removed as stated here. There are multiple ways to do this. I would recommend the following:

  1. Visualize the TensorFlow graph using TensorBoard. You will be able to identify all nodes that use the switch layer. The switch node (part of batch normalization) is only used during training of a model, if I remember correctly. That means it is safe to remove it if you want to use the model only for inference (which is TensorRT's purpose).
  2. Remove all switch nodes from the graph using TensorFlow's graph transform tool. In particular, you should be able to do so using the remove_nodes transform.
  3. When all switch layers have been pruned, you will be able to convert to .uff

There was another tool for this, but I can't quite remember which one. I will look it up if this solution does not work for you. Let me know if you need more help.
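As an alternative to inspecting the graph visually, the op types in a frozen graph can be counted directly. The following is a minimal sketch, not the method used for this repository: `load_graph_def`, `op_histogram` and `nodes_of_type` are hypothetical helper names, and the loader assumes a TensorFlow install that exposes `tf.compat.v1.GraphDef`. The counting helpers only rely on each node having `name` and `op` attributes, as a `NodeDef` does.

```python
from collections import Counter


def load_graph_def(pb_path):
    """Read a frozen .pb file into a GraphDef (TensorFlow is imported lazily)."""
    import tensorflow as tf  # assumption: TensorFlow is installed

    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def


def op_histogram(nodes):
    """Count how many nodes use each op type."""
    return Counter(node.op for node in nodes)


def nodes_of_type(nodes, op_types=("Switch", "Merge")):
    """Return the names of all nodes whose op is one of op_types."""
    return [node.name for node in nodes if node.op in op_types]
```

Assuming the frozen model is called new.pb, `op_histogram(load_graph_def("new.pb").node).most_common()` would list the op types by frequency, and `nodes_of_type(...)` would name every Switch and Merge node that still has to be dealt with before the UFF conversion.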

tongvantruong commented 4 years ago

Hi, thanks for your response.

When I read the section "2. Prune and freeze TensorFlow model or get frozen model" in the link from the documentation, I thought I could easily obtain the frozen model by using freeze_graph, as I did in step 2.

I will follow your recommendation and get back to you later.

Thanks again for the great work on this project.

tongvantruong commented 4 years ago

Hi @nwesem Can you help me with how to "identify all nodes that use the switch layer" in TensorBoard? I could not find them in my TensorBoard. Please check my images: https://drive.google.com/file/d/1rTDqtZes4XjGDscVbJP2XLti9VCNQuBq/view?usp=sharing
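If the Switch nodes are hard to spot in TensorBoard, the nodes that read from them can also be listed programmatically. This is a small sketch under assumptions: `producer_name` and `switch_consumers` are hypothetical helpers, and `nodes` is expected to be a sequence of objects with `name`, `op` and `input` attributes, such as `graph_def.node` from a parsed frozen graph. Input references in a GraphDef may carry an output port (`name:1`) or a control-dependency marker (`^name`), which the first helper strips.

```python
def producer_name(input_ref):
    """Strip the control marker (^) and the output port (:N) from an input reference."""
    return input_ref.lstrip("^").split(":")[0]


def switch_consumers(nodes):
    """Map each Switch node name to the names of the nodes that read from it."""
    switch_names = {node.name for node in nodes if node.op == "Switch"}
    consumers = {name: [] for name in switch_names}
    for node in nodes:
        for ref in node.input:
            src = producer_name(ref)
            if src in switch_names:
                consumers[src].append(node.name)
    return consumers
```

Printing the result for a frozen graph shows, for every Switch node, which downstream nodes would need rewiring if the Switch were removed.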

tongvantruong commented 4 years ago

Hi @nwesem I am trying to optimize the model using TensorFlow's graph transform tool, but it doesn't seem to work. The Switch nodes are still there, and the same error occurs: Unsupported operation _Switch

My script for removing Switch nodes:

bazel build tensorflow/tools/graph_transforms:transform_graph

bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=../m.pb --out_graph=../new.pb --inputs='Mul:0' --outputs='softmax:0' --transforms='remove_nodes(op=Switch)'

So the steps I took to optimize the model are:

  1. Freeze the model using freeze_graph.py
  2. Remove Switch nodes from the output of 1.

Can you give me advice?
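One possible explanation, offered with some caution: as far as I recall, the Graph Transform Tool documentation describes remove_nodes as aimed at single-input, single-output ops, whereas Switch takes two inputs (the data and the predicate) and has two outputs. The sketch below illustrates the kind of rewiring that would be needed instead. It works on a simplified stand-in graph (a list of dicts), not on a real GraphDef, and `bypass_cond` and `keep_merge_input` are hypothetical names. Which Merge input corresponds to the inference branch is an assumption that has to be verified on the actual graph.

```python
def producer_name(ref):
    """Strip the control marker (^) and the output port (:N) from an input reference."""
    return ref.lstrip("^").split(":")[0]


def bypass_cond(nodes, keep_merge_input=0):
    """Drop Switch and Merge nodes from a toy graph and rewire their consumers.

    nodes: list of {"name": str, "op": str, "inputs": [str, ...]} dicts.
    keep_merge_input: index of the Merge input to keep (assumption: this is
    the branch that should run at inference time; check it on your graph).
    """
    by_name = {n["name"]: n for n in nodes}

    def resolve(ref):
        # Follow Switch/Merge redirects until a surviving node is reached.
        is_control = ref.startswith("^")
        while True:
            src = by_name.get(producer_name(ref))
            if src is None or src["op"] not in ("Switch", "Merge"):
                break
            if src["op"] == "Switch":
                ref = src["inputs"][0]  # forward the data input
            else:
                ref = src["inputs"][keep_merge_input]
        return "^" + producer_name(ref) if is_control else ref

    kept = []
    for n in nodes:
        if n["op"] in ("Switch", "Merge"):
            continue
        kept.append({**n, "inputs": [resolve(r) for r in n["inputs"]]})
    return kept
```

After such a rewrite, the nodes of the discarded branch are still present but no longer feed the output, so a pruning pass (for example strip_unused_nodes) would still be needed. The sketch also assumes there are no loops built from Merge nodes, which holds for a plain conditional but not for while loops.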

tongvantruong commented 4 years ago

Hi @nwesem Can you share how you created the model https://github.com/apollo-time/facenet/raw/master/model/resnet/facenet.pb ? I am trying to create a pruned and frozen model like it that can be read by TensorRT.

I did the following steps:

1. Download the pre-trained facenet model from https://github.com/davidsandberg/facenet#pre-trained-models. I downloaded the model "20180408-102900".

2. Create the frozen model using the freeze_graph code from facenet (https://github.com/davidsandberg/facenet/blob/master/src/freeze_graph.py): python freeze_graph.py /path_to_20180408-102900_folder/ output_model.pb

3. Check the graph of "output_model.pb" using the TensorFlow source from https://github.com/tensorflow/tensorflow. From the root folder, I ran:

bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=/path_to_output_model.pb

=> The graph result is:

Found 2 possible inputs: (name=batch_size, type=int32(3), shape=<unknown>) (name=phase_train, type=bool(10), shape=<unknown>) 
No variables spotted.
Found 2 possible outputs: (name=label_batch, op=Identity) (name=embeddings, op=Mul) 
Found 23512505 (23.51M) const parameters, 0 (0) variable parameters, and 675 control_edges
Op types used: 2019 Switch, 1104 Const, 1056 Identity, 449 Sub, 449 Merge, 248 Mul, 224 FusedBatchNorm, 132 Conv2D, 131 Relu, 23 Add, 23 ConcatV2, 21 BiasAdd, 3 MaxPool, 3 Reshape, 3 Shape, 2 Placeholder, 1 Pack, 1 FIFOQueueV2, 1 RandomUniform, 1 RealDiv, 1 Maximum, 1 MatMul, 1 Rsqrt, 1 Floor, 1 Square, 1 StridedSlice, 1 AvgPool, 1 Sum, 1 QueueDequeueUpToV2
To use with tensorflow/tools/benchmark:benchmark_model try these arguments:
bazel run tensorflow/tools/benchmark:benchmark_model -- --graph=../output_model.pb --show_flops --input_layer=batch_size,phase_train --input_layer_type=int32,bool --input_layer_shape=: --output_layer=label_batch,embeddings

4. Remove Switch nodes and train nodes by running:

bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=../output_model.pb --out_graph=../fn.pb --inputs='batch_size,phase_train' --outputs='embeddings' --transforms='remove_nodes(op=Switch, op=Identity) fold_old_batch_norms strip_unused_nodes fold_constants(ignore_errors=true)'

=> This step should remove all the Switch nodes from the model. However, when I checked the graph of the new model (fn.pb), I saw 1235 Switch nodes, down from 2019 in the old model. So only about 39% of the Switch nodes were removed.

Can you please check what went wrong in these steps? I really don't know the reason. :( Or can you share the detailed steps you used to generate the correct model for this project?

Thanks a lot. Truong
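To make the before/after comparison concrete, the op counts reported by summarize_graph can be compared with a few lines of Python. This is only an illustrative sketch: `op_count_change` is a hypothetical helper, and the two dictionaries contain just the Switch counts quoted above (2019 before the transform, 1235 after).

```python
def op_count_change(before, after):
    """Return {op: (removed, fraction_removed)} for ops whose count dropped."""
    change = {}
    for op, n_before in before.items():
        removed = n_before - after.get(op, 0)
        if removed > 0:
            change[op] = (removed, removed / n_before)
    return change


before = {"Switch": 2019}  # output_model.pb
after = {"Switch": 1235}   # fn.pb
removed, fraction = op_count_change(before, after)["Switch"]
print(f"Switch: {removed} removed ({fraction:.1%})")  # Switch: 784 removed (38.8%)
```

Extending the two dictionaries with the full "Op types used" lists of both graphs would show which other op types the transform chain affected.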

nwesem commented 4 years ago

I am not sure how it works at the moment, but I know for sure that you also have to remove the phase_train input tensor from all layers. I will give this a try as soon as I find some spare time. Sorry that I cannot help you right now @tongvantruong

nwesem commented 4 years ago

This user pruned the TensorFlow graph that I am using for this implementation. His repository also contains the freeze_graph.py and freeze_graph_resnet.py scripts. Maybe that's a good starting point for figuring out how to do it @tongvantruong. Here is the link to the repo. Check out the src folder.