microsoft / vs-tools-for-ai

Visual Studio Tools for AI is a free Visual Studio extension to build, test, and deploy deep learning / AI solutions. It seamlessly integrates with Azure Machine Learning for robust experimentation capabilities, including but not limited to submitting data preparation and model training jobs transparently to different compute targets. Additionally, it provides support for custom metrics and run history tracking, enabling data science reproducibility and auditing. Enterprise-ready collaboration lets you work securely on projects with other people.
http://aka.ms/vstoolsforai

Can't convert/import tensorflow .pb model #16

Open rgsousa88 opened 6 years ago

rgsousa88 commented 6 years ago

Hi everyone,

I'm facing some errors when I try to convert or import a TensorFlow .pb model using the AI Tools converter or Import Model. The message I get is this:

KeyError: "The name 'input:0' refers to a Tensor which does not exist. The operation, 'input', does not exist in the graph."

I'm passing the correct internal names for input and output tensors (I'm able to load and use the same model inside a Python script). I've visualized my model using Netron and 'input' exists in the graph. The model is available here.

Thanks in advance.

shishaochen commented 6 years ago

Could you provide more details? I downloaded the pretrained model 20180402-114759, but I cannot find a tensor named "input:0" in the file named "20180402-114759.pb". Anyway, I suggest you export it to a SavedModel file first. Then, when you import the model using the Model Inference Library, VS Tools for AI will extract the input/output nodes automatically.

rgsousa88 commented 6 years ago

Hi @shishaochen,

I'll try to export it to a SavedModel (I don't know how to do that yet, but I'll search) and then I'll post the results here. Thanks for your time!

shishaochen commented 6 years ago

@rgsousa88 There is an example at https://github.com/Microsoft/samples-for-ai/blob/master/projects/StyleTransfer/StyleTransferTraining/src/train.py#L218 or you can simply call tf.saved_model.simple_save.
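For reference, a minimal simple_save sketch (TF 1.x; the tensors and paths below are placeholders, not the FaceNet graph):

import tensorflow as tf

# Minimal sketch of tf.saved_model.simple_save (TF 1.x API).
# In practice the tensors come from your loaded graph; these are stand-ins.
with tf.Session() as sess:
    x = tf.placeholder(tf.float32, shape=[None, 160, 160, 3], name="input")
    y = tf.identity(x, name="output")  # stand-in for the real model output
    tf.saved_model.simple_save(sess, "export_dir",
                               inputs={"input": x},
                               outputs={"output": y})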

rgsousa88 commented 6 years ago

Hi @shishaochen,

First off, thanks for your suggestion. I've exported the .pb file to a SavedModel as you suggested, using the following script.

import tensorflow as tf

# filename holds the path to the frozen FaceNet .pb (defined earlier)
with tf.Session() as sess:
    # Load the frozen graph definition into the default graph
    with tf.gfile.FastGFile(filename, "rb") as file:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(file.read())
        tf.import_graph_def(graph_def, input_map=None, name='')
    # Look up the serving tensors by their internal names
    input = tf.get_default_graph().get_tensor_by_name("input:0")
    emb = tf.get_default_graph().get_tensor_by_name("embeddings:0")
    phase = tf.get_default_graph().get_tensor_by_name("phase_train:0")
    # Export with an explicit input/output signature
    tf.saved_model.simple_save(sess, "..\\teste_model_2\\",
                               inputs={"input": input, "phase": phase},
                               outputs={"output": emb})

When I tried to import the new SavedModel using the Import tool, I got the following exception:

mlscoring.exporter.exception.ExportException: 204:'Tensor input:0 with fully unknown shape not supported for serving in APIs'

I'm not sure why this is happening with this model... What do you think? Would generating the SavedModel from the metagraph and checkpoint files be the right way to do this?
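For reference, that metagraph/checkpoint route would look roughly like the sketch below (placeholder paths; tensor names follow the FaceNet graph above). Note that, per the reply that follows, it would still hit the same unknown-shape limitation on input:0.

import tensorflow as tf

# Sketch: rebuild the session from .meta + checkpoint, then export.
# Paths are placeholders; tensor names follow the FaceNet graph above.
with tf.Session() as sess:
    saver = tf.train.import_meta_graph("model.meta")
    saver.restore(sess, "model.ckpt")
    g = tf.get_default_graph()
    tf.saved_model.simple_save(
        sess, "export_from_ckpt",
        inputs={"input": g.get_tensor_by_name("input:0"),
                "phase": g.get_tensor_by_name("phase_train:0")},
        outputs={"output": g.get_tensor_by_name("embeddings:0")})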

Thanks for your time.

shishaochen commented 6 years ago

@rgsousa88 Currently, models with input nodes of unknown shape are not supported. You can work around the limitation by reconstructing the inference graph with input nodes of explicit shape. Please use the following code to export the pretrained model into the SavedModel format:

import facenet  # Source file in the FaceNet Git repository
import os
import shutil
import tensorflow as tf

if __name__ == '__main__':
    image_batch = tf.placeholder(tf.float32, shape=(None, 160, 160, 3))  # Explicitly set the image shape
    is_training = tf.constant(False)
    facenet.load_model('20180402-114759.pb', input_map={'input': image_batch, 'phase_train': is_training})
    embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")

    export_dir = 'export'
    if os.path.isdir(export_dir):
        shutil.rmtree(export_dir)
    with tf.Session() as sess:
        # Write TensorBoard logs to a separate directory so export_dir stays
        # empty; simple_save fails if the export directory already exists.
        tf.summary.FileWriter('logs', sess.graph)  # Visualize the graph using TensorBoard
        tf.saved_model.simple_save(sess, export_dir, inputs={'image_batch': image_batch}, outputs={'embeddings': embeddings})

Then you can create the Model Inference Library project by importing the exported SavedModel file.

rgsousa88 commented 6 years ago

@shishaochen Thanks a lot for your help and advice. I'm able to load the model and create the Model Inference Library. I'll look into how to include this library in a UWP project and test the model's behavior inside my test application. Thank you one more time!

shishaochen commented 6 years ago

@rgsousa88 I have to say that the Microsoft.ML.Scoring library the project consumes has native DLLs, so it is hard to build it as a UWP library before a new release.

However, there are two ways to work around it:

UWP support of the Model Inference Library is on our roadmap, and you may be able to try it in the future.

rgsousa88 commented 6 years ago

@shishaochen Thanks for your hints. I've tried to convert the SavedModel (created using your script above) to an ONNX model using Convert Model in the AI Tools menu, but it failed. I thought that if I was able to import a model and create a Model Inference Library, I would be able to convert it too, but I was wrong... I'll look into the other two options, but the best case would be converting the SavedModel to an ONNX model... Anyway, I'm very grateful for your support and suggestions. If you're curious about the error I got when trying to convert, I'm posting it below.

System.Exception: "path_to_saved_model"\saved_model.pb is not a valid TensorFlow model.
Traceback (most recent call last):
  File "C:\Users\myuser\AppData\Local\Microsoft\VisualStudio\15.0_692e112c\Extensions\yyr25epz.rmr\RuntimeSDK\model\model_converter_cli.py", line 379, in check_node_valid
    graph.ParseFromString(file_content)
google.protobuf.message.DecodeError: Error parsing message

One more time, thanks for your time and help.

rgsousa88 commented 6 years ago

@shishaochen, I've tried to use the .meta and checkpoint files to convert the model. It failed too... In this case, I got a different error, which I'm posting below. Are there any limitations on the TensorFlow models supported by the AI Tools Convert option? I mean, is converting TensorFlow models limited to certain operations or architectures? If so, wouldn't it be better to document these limitations or constraints? I'd appreciate your thoughts... Thanks in advance.

Error that I've mentioned:

(Traceback infos...)

ValueError: graph_def is invalid at node 'InceptionResnetV1/Conv2d_1a_3x3/BatchNorm/cond_1/AssignMovingAvg/Switch': Input tensor 'InceptionResnetV1/Conv2d_1a_3x3/BatchNorm/moving_mean:0' Cannot convert a tensor of type float32 to an input of type float32_ref. using tensorflow=1.5.0, onnx=1.1.2

shishaochen commented 6 years ago

@rgsousa88 The model converter backend is a Python package named tf2onnx. It is still in preview and may not cover all models. I have asked an expert on this feature to investigate the FaceNet conversion to ONNX. Please try the other two workarounds first.
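For what it's worth, tf2onnx can also be invoked directly on an exported SavedModel; a sketch of the standalone CLI (the directory, output path, and opset here are placeholders):

python -m tf2onnx.convert --saved-model export --output model.onnx --opset 10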

rgsousa88 commented 6 years ago

Hi @shishaochen, I tried to convert using the tf2onnx scripts a few weeks ago, but I didn't have any success... I'll check what I can do with the other two options... Anyway, I'm grateful for your attention and aid. If I have any success with the conversion, I'll let you know.

JiahaoYao commented 6 years ago

Hi @rgsousa88, I hope this might help you.

ValueError: graph_def is invalid at node 
'InceptionResnetV1/Conv2d_1a_3x3/BatchNorm/cond_1/AssignMovingAvg/Switch': Input tensor 

The error you mentioned before might be due to tf.cond in the pretrained model. It is a control flow operator, and the converter does not support it.
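To see why, here is a tiny illustration (TF 1.x): tf.cond lowers to Switch/Merge control-flow ops in the GraphDef, and those are what the converter rejects.

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[], name="x")
training = tf.placeholder(tf.bool, name="training")
# tf.cond lowers to Switch/Merge control-flow ops in the GraphDef
y = tf.cond(training, lambda: x * 2.0, lambda: x + 1.0)
print(sorted({op.type for op in tf.get_default_graph().get_operations()}))
# The printed set includes 'Switch' and 'Merge'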

One practical approach

First, download the inception resnet v1 code as well as the pretrained model from the project page.

Second, write the script below to reload the model and transform the graph.

📄 bottleneck_layer_size=512 is consistent with the pretrained model

📄 phase_train=False simplifies the batchnorm layers during conversion

⚠️ We only convert the main body of FaceNet, i.e., the inception-resnet-v1 backbone. One might need to add the remaining pieces to reproduce the full behavior of FaceNet.

import tensorflow as tf
import inception_resnet_v1  # From the FaceNet project page

# Rebuild the inference graph with an explicit input shape
data_input = tf.placeholder(name='input', dtype=tf.float32, shape=[None, 299, 299, 3])
output, _ = inception_resnet_v1.inference(data_input, keep_probability=0.8, phase_train=False, bottleneck_layer_size=512)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    # Restore the pretrained weights into the reconstructed graph
    saver = tf.train.Saver()
    saver.restore(sess, '/Users/kit/Downloads/20180408-102900/model-20180408-102900.ckpt-90')
    path = '/Users/kit/Downloads/'
    # Write the graph definition and a fresh checkpoint for freezing
    tf.train.write_graph(sess.graph, '/Users/kit/Downloads', 'imagenet_facenet.pb', as_text=False)
    save_path = saver.save(sess, path + "imagenet_facenet.ckpt")
    print("Model saved in file: %s" % save_path)

Then you can use tfonnx_freeze_graph.py to freeze the graph.

python -m tfonnx_freeze_graph --input_graph=/Users/kit/Downloads/imagenet_facenet.pb  --input_binary=true --input_names=input:0 --output_node_names=InceptionResnetV1/Bottleneck/BatchNorm/batchnorm/add_1 --input_checkpoint=/Users/kit/Downloads/imagenet_facenet.ckpt --output_graph=frozen.pb
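If tfonnx_freeze_graph.py is not at hand, the stock TensorFlow tool should do roughly the same job (a sketch; flags per TF 1.x, paths as above):

python -m tensorflow.python.tools.freeze_graph --input_graph=/Users/kit/Downloads/imagenet_facenet.pb --input_binary=true --input_checkpoint=/Users/kit/Downloads/imagenet_facenet.ckpt --output_node_names=InceptionResnetV1/Bottleneck/BatchNorm/batchnorm/add_1 --output_graph=frozen.pb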

In this reloaded model, the tf.cond operators are removed from the graph. I am currently working on MMdnn to make the conversion possible.

JiahaoYao commented 6 years ago

Hi, @rgsousa88 You might try the newest version of MMdnn by running:

pip install -U git+https://github.com/Microsoft/MMdnn.git@master

To convert the above model to ONNX, you can run the command below, changing the paths for your own files.

mmconvert -sf tensorflow -in /Users/kit/Downloads/imagenet_facenet.ckpt.meta  -iw /Users/kit/Downloads/imagenet_facenet.ckpt -df onnx -om /Users/kit/Downloads/facenet.onnx --dstNodeName InceptionResnetV1/Bottleneck/BatchNorm/batchnorm/add_1

I get the following result; some of the layers are skipped.

Parse file [/Users/kit/Downloads/imagenet_facenet.ckpt.meta] with binary format successfully.
Tensorflow model file [/Users/kit/Downloads/imagenet_facenet.ckpt.meta] loaded successfully.
Tensorflow checkpoint file [/Users/kit/Downloads/imagenet_facenet.ckpt] loaded successfully. [490] variables loaded.
Tensorflow has not supported operator [Slice] with name [InceptionResnetV1/Logits/Flatten/Slice].
Tensorflow has not supported operator [Slice] with name [InceptionResnetV1/Logits/Flatten/Slice_1].
Tensorflow has not supported operator [Prod] with name [InceptionResnetV1/Logits/Flatten/Prod].
Tensorflow has not supported operator [ExpandDims] with name [InceptionResnetV1/Logits/Flatten/ExpandDims].
IR network structure is saved as [22d65258880149e8b78ffc636043fb4e.json].
IR network structure is saved as [22d65258880149e8b78ffc636043fb4e.pb].
IR weights are saved as [22d65258880149e8b78ffc636043fb4e.npy].
Parse file [22d65258880149e8b78ffc636043fb4e.pb] with binary format successfully.
Warning: Graph Construct a self-loop node InceptionResnetV1/Logits/Flatten/Slice. Ignored.
Warning: Graph Construct a self-loop node InceptionResnetV1/Logits/Flatten/ExpandDims. Ignored.
OnnxEmitter has not supported operator [Shape].
InceptionResnetV1/Logits/Flatten/Shape
Target network code snippet is saved as [22d65258880149e8b78ffc636043fb4e.py].
Target weights are saved as [22d65258880149e8b78ffc636043fb4e.npy].
ONNX model file is saved as [/Users/kit/Downloads/facenet.onnx], generated by [22d65258880149e8b78ffc636043fb4e.py] and [22d65258880149e8b78ffc636043fb4e.npy].

Hoping this might help!

rgsousa88 commented 6 years ago

Hi @JiahaoYao ,

I've followed the steps you described above, but it was not possible to freeze the graph due to the error below:

AssertionError: InceptionResnetV1/Bottleneck/BatchNorm/batchnorm/add_1 is not in graph

In the command you suggested, this parameter is passed as an output node name, but inspecting the graph (using Netron) there is no output node with this name. The output (final) node is InceptionResnetV1/Bottleneck/BatchNorm/FusedBatchNorm:0.

But, if I understand right, the scripts and commands above only convert the body of FaceNet, not the whole thing. In that case, I won't be able to use the converted model (ONNX format) as the original one was designed to be used, i.e., to extract facial features and perform face recognition, will I?

Anyway, I'm grateful for your attention and help.

JiahaoYao commented 6 years ago

Hi @rgsousa88, First, I got the output name in the following way. The inception_resnet_v1 module is downloaded from the FaceNet project website.

import tensorflow as tf
import inception_resnet_v1
data_input = tf.placeholder(name='input', dtype=tf.float32, shape=[None, 299, 299, 3])
output, _ = inception_resnet_v1.inference(data_input, keep_probability=0.8, phase_train=False, bottleneck_layer_size=512)
print(output.op.name)

I get the name

'InceptionResnetV1/Bottleneck/BatchNorm/batchnorm/add_1'

For the frozen graph, it works for me (see the attached screenshot).

Finally, on the official website I only find inception resnet v1, and I believe it is the main part of FaceNet. If the full FaceNet model is available in facenet.py, I think it can be reloaded with the tf.cond removed from the batchnorm layers, and then the conversion works.

Your understanding is correct: in the example, I only convert up to the last node of the inception resnet. I think it could be converted up to the very last node in the graph, which might fulfill the face recognition use case.

sidhantls commented 6 years ago

Why isn't there a simple way to load the pretrained model for this? Something like what Keras has: load_model('model.h5').

shishaochen commented 6 years ago

@sid-sundrani If you want to import a TensorFlow SavedModel or ONNX model, you only need to select the model path in our Import Model Wizard. Otherwise, you need to provide the model serving interface, as a TensorFlow checkpoint does not have such fields for us to extract.
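For reference, you can check which serving signatures a SavedModel actually exposes with the saved_model_cli tool that ships with TensorFlow (the directory path is a placeholder):

saved_model_cli show --dir export --all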

gzchenjiajun commented 5 years ago

google.protobuf.message.DecodeError: Error parsing message

How can I solve this problem? I haven't seen the right answer yet.
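A common cause, consistent with the trace earlier in this thread, is feeding a SavedModel's saved_model.pb to a tool that expects a frozen GraphDef. A quick check (a sketch, assuming TF 1.x; the path is a placeholder):

import tensorflow as tf
from tensorflow.core.protobuf import saved_model_pb2

path = "saved_model.pb"  # placeholder: the .pb file being converted
data = tf.gfile.GFile(path, "rb").read()

sm = saved_model_pb2.SavedModel()
try:
    sm.ParseFromString(data)
    print("SavedModel proto: import/convert its directory, not the .pb file.")
except Exception:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(data)  # a DecodeError here means neither format
    print("Frozen GraphDef: pass input/output node names to the converter.")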

tienduchoang commented 5 years ago

Hi all, I got the same error when I export LPRNet (https://github.com/opencv/openvino_training_extensions/tree/develop/tensorflow_toolkit/lpr).

If I use export.py to convert ckpt -> pb and then use the .pb for inference, it returns nothing. Please help me!

FrancPS commented 4 years ago

Has anyone been able to convert the model to SavedModel or .onnx?