onnx / tensorflow-onnx

Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX
Apache License 2.0

Missing shape information for Bias for LSTM cells #1052

Closed · witsang closed this issue 3 years ago

witsang commented 4 years ago

Hi,

I am trying to run a model exported with tf2onnx in C++ with onnxruntime. When the graph optimizer tries to transform one of the nodes in the LSTM cells, it crashes at line 98 of https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/core/optimizer/matmul_add_fusion.cc because bias_shape is nullptr.

The node being transformed is:
matmul_node: default_policy/model/lstm/lstm_cell/MatMul_7:0
add_node: default_policy/model/lstm/lstm_cell/add_6

The inputs to default_policy/model/lstm/lstm_cell/MatMul_7:0 are:
default_policy/Placeholder:0
default_policy/model/lstm/lstm_cell/strided_slice_3

The inputs to default_policy/model/lstm/lstm_cell/add_6 are:
default_policy/model/lstm/lstm_cell/BiasAdd_3:0 <-- this one has no shape information
default_policy/model/lstm/lstm_cell/MatMul_7:0

I can't tell whether the problem is in onnxruntime or in the graph I exported. When I step through tf2onnx.tfonnx.process_tf_graph, the tensor still has an output shape at the point where tensorflow_to_onnx is called, but I am not sure whether the shape got lost later, during the optimize and rewrite passes of the conversion.
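To check whether the exported .onnx still carries shape information for that tensor, this is roughly what I run (a minimal sketch using ONNX shape inference; "model.onnx" is a placeholder for whatever the converter wrote):

    # sketch: does the exported model record a shape for the suspect tensor?
    import onnx
    from onnx import shape_inference

    model = onnx.load("model.onnx")
    inferred = shape_inference.infer_shapes(model)

    target = "default_policy/model/lstm/lstm_cell/BiasAdd_3:0"
    infos = list(inferred.graph.value_info) + list(inferred.graph.input) + list(inferred.graph.output)
    for vi in infos:
        if vi.name == target:
            dims = [d.dim_param or d.dim_value for d in vi.type.tensor_type.shape.dim]
            print(vi.name, dims)
            break
    else:
        print("no shape information recorded for", target)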

I have attached my saved model (saved_model.pb) and the output from tf2onnx (both the .onnx and a .txt dump). input_names: ["default_policy/observation:0"], output_names: ["default_policy/cond/Merge:0"]

model_prefreeze_input.txt lists the input tensors before freezing; model_prefreeze_output.txt lists the output tensors before freezing.

Also, this is my code for the conversion: load the saved model from the checkpoint, freeze the graph, and convert.

    import os

    import tensorflow as tf
    import tensorflow.compat.v1 as tf1
    import tf2onnx

    # input_model_dir, input_names, output_names, output_dir and model_name
    # are defined earlier in my script

    # load the saved model from the checkpoint and freeze the graph
    with tf1.Session() as sess:
        tf1.saved_model.loader.load(sess, [tf1.saved_model.tag_constants.SERVING], input_model_dir)
        frozen_graph = tf2onnx.tf_loader.freeze_session(sess, input_names=input_names, output_names=output_names)

    # convert the frozen graph to onnx
    with tf.Graph().as_default() as tf_graph:
        tf.import_graph_def(frozen_graph, name='')
    with tf2onnx.tf_loader.tf_session(graph=tf_graph):
        g = tf2onnx.tfonnx.process_tf_graph(tf_graph,
                                            continue_on_error=False,
                                            target=None,
                                            opset=None,
                                            input_names=input_names,
                                            output_names=output_names)
        onnx_graph = tf2onnx.optimizer.optimize_graph(g)
        model_proto = onnx_graph.make_model("converted from {}".format(input_model_dir))
        tf2onnx.utils.save_protobuf(os.path.join(output_dir, model_name + ".onnx"), model_proto)
        tf2onnx.utils.save_protobuf(os.path.join(output_dir, model_name + "_onnx.txt"), model_proto, as_text=True)

model.zip

Thank you! Let me know if I can provide anything else. I am pretty new to TensorFlow, so I'm not sure how to phrase the question properly.

guschmue commented 4 years ago

You could try running it with the optimizer in onnxruntime turned off (via session options). If that runs, it is an onnxruntime issue.
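For reference, disabling all graph optimizations looks like this in the Python API; the C++ SessionOptions API has the equivalent SetGraphOptimizationLevel call (a minimal sketch, with "model.onnx" as a placeholder file name):

    import onnxruntime as ort

    # disable all graph optimizations so the MatMul+Add fusion pass never runs
    opts = ort.SessionOptions()
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    sess = ort.InferenceSession("model.onnx", sess_options=opts)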

witsang commented 4 years ago

Thanks @guschmue for the suggestion. It now loads the session fine with session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_DISABLE_ALL);

But it crashes when it runs the session: Non-zero status code returned while running MatMul node. Name:'default_policy/functional_1/logits/Tensordot/matMul'

StatusMessage: Not satisfied: K_ == right_shape[right_num_dims - 1] (i.e., the inner dimension of the left input does not match the last dimension of the right input).
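The same shape-inference approach as above can be pointed at the failing MatMul to see which input dimensions disagree (a sketch; the file name is again a placeholder, and the node is matched on the Tensordot name from the error):

    import onnx
    from onnx import shape_inference

    model = onnx.load("model.onnx")
    inferred = shape_inference.infer_shapes(model)

    # collect whatever shape information exists: weights plus inferred values
    shapes = {init.name: list(init.dims) for init in model.graph.initializer}
    for vi in list(inferred.graph.value_info) + list(inferred.graph.input):
        shapes[vi.name] = [d.dim_param or d.dim_value for d in vi.type.tensor_type.shape.dim]

    for node in model.graph.node:
        if node.op_type == "MatMul" and "Tensordot" in node.name:
            print(node.name)
            for name in node.input:
                print("  ", name, shapes.get(name, "no shape info"))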

Is this more a question for the onnxruntime repo?

Also, I am using the tf2onnx from the latest source.

guschmue commented 4 years ago

With GraphOptimizationLevel::ORT_DISABLE_ALL I'd blame it on tf2onnx for now. What version of tf2onnx is this?

witsang commented 4 years ago

I tried tf2onnx 1.7.0 and tf2onnx 1.6.3; same error with both. onnxruntime is 1.3.1, git hash ccbf49e59f6bef897a94595e2213263b37d64ff3.

TomWildenhain-Microsoft commented 3 years ago

Hi @witsang, can you try again with the latest version of tf2onnx and see if this is still an issue?

guschmue commented 3 years ago

assume fixed