onnx / tutorials

Tutorials for creating and using ONNX models

onnx_tf.backend.prepare(model) in Tensorflow to ONNX tutorial error: "InvalidArgumentError: Dimensions must be equal" #41

ividal commented 6 years ago

Python version: 3.5.2, onnx==1.2.1, onnx-tf==1.1.2, tensorflow-gpu==1.8.0. Using the tutorial as of this commit.

Following the instructions in the tutorial, I used this script to train; it worked smoothly. I then froze the model with:

python3 /path/to/site-packages/tensorflow/python/tools/freeze_graph.py \
    --input_graph=/home/ividal/dev/onnx/tutorials/tutorials/graph.proto \
    --input_checkpoint=/home/ividal/dev/onnx/tutorials/tutorials/ckpt/model.ckpt \
    --output_graph=/tmp/frozen_graph.pb \
    --output_node_names=fc2/add \
    --input_binary=True

This produced the expected /tmp/frozen_graph.pb. The export code in the tutorial then produces the expected mnist.onnx file.
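For reference, this is roughly the export step I ran, taken from the tutorial notebook (treat it as a sketch; the exact onnx_tf frontend signature and opset may differ across onnx-tf versions):

import tensorflow as tf
from onnx_tf.frontend import tensorflow_graph_to_onnx_model

# Load the frozen graph written by freeze_graph.py
with tf.gfile.GFile("/tmp/frozen_graph.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Convert to ONNX; "fc2/add" is the output node used when freezing
onnx_model = tensorflow_graph_to_onnx_model(graph_def, "fc2/add", opset=6)
with open("mnist.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())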

model = onnx.load('mnist.onnx') works, but:

tf_rep = prepare(model) yields:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/.venvs/onnx/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs)
   1566   try:
-> 1567     c_op = c_api.TF_FinishOperation(op_desc)
   1568   except errors.InvalidArgumentError as e:

InvalidArgumentError: Dimensions must be equal, but are 16 and 64 for 'Add_1' (op: 'Add') with input shapes: [?,64,?,16], [1,1,1,64].

From the error message, I gather the channel dimension might be getting switched (?). However, I did not modify the tutorial code, so that shouldn't be the cause. Any ideas...?
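For completeness, the minimal snippet that triggers it (nothing beyond what the tutorial shows):

import onnx
from onnx_tf.backend import prepare

# Loading the exported model works fine
model = onnx.load("mnist.onnx")

# prepare() raises: "Dimensions must be equal, but are 16 and 64 for 'Add_1' ..."
tf_rep = prepare(model)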

Thanks!

ividal commented 6 years ago

Just in case, I repeated everything with TensorFlow 1.5.0, since it's the last version explicitly mentioned in the documentation, but the error is exactly the same.

[Edit] For the sake of completeness, I also tried freezing the graph with the bazel-built tool, as the original tutorial suggested. Same result:

bazel build tensorflow/python/tools:freeze_graph
bazel-bin/tensorflow/python/tools/freeze_graph \
    --input_graph=/home/ividal/dev/onnx/tutorials/tutorials/graph.proto \
    --input_checkpoint=/home/ividal/dev/onnx/tutorials/tutorials/ckpt/model.ckpt \
    --output_graph=/tmp/frozen_graph.pb \
    --output_node_names=fc2/add \
    --input_binary=True

Revo-Future commented 6 years ago

I'm hitting the same problem. Has anyone found a solution?

ividal commented 6 years ago

I have a feeling it's this problem: NCHW vs. NHWC data layouts at the different steps (training vs. freezing vs. exporting vs. loading in ONNX). I just don't know exactly where it happens or how to fix it.
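One way to narrow it down might be to dump what the exporter actually wrote into the ONNX file, e.g. with something like this (a diagnostic sketch only, not a fix; for the tutorial's MNIST model an NCHW input would look like [?, 1, 28, 28] and an NHWC one like [?, 28, 28, 1]):

import onnx

model = onnx.load("mnist.onnx")

# Declared graph input shapes reveal which layout the exporter assumed
for inp in model.graph.input:
    dims = [d.dim_value if d.dim_value > 0 else "?"
            for d in inp.type.tensor_type.shape.dim]
    print("input", inp.name, dims)

# Print the ops feeding the failing Add so the mismatched operands can be traced
for node in model.graph.node:
    if node.op_type in ("Conv", "Add", "Reshape", "Transpose"):
        print(node.op_type, list(node.input), "->", list(node.output))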

knandanan commented 6 years ago

@ividal Did you find a solution to this issue yet? Please let me know.

mukul74 commented 6 years ago

Did anyone find a solution to this issue?

AmitAilianiSDC commented 5 years ago

Did anyone find a solution to this issue?

ividal commented 5 years ago

@knandanan No, sorry, I opted to keep ONNX and TF separate for this (I stick to TF and a .pb if deployment has to be with TF, e.g. on an Android device).
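For anyone curious, that TF-only path is just loading the frozen .pb and running it directly, roughly like this (the tensor names and input shape below are guesses based on the tutorial's graph; list graph.get_operations() to find the real ones):

import numpy as np
import tensorflow as tf

# Load the frozen graph written by freeze_graph.py
with tf.gfile.GFile("/tmp/frozen_graph.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

# "Placeholder:0" and "fc2/add:0" are assumed names; verify them first
batch = np.zeros((1, 784), dtype=np.float32)
with tf.Session(graph=graph) as sess:
    logits = sess.run("fc2/add:0", feed_dict={"Placeholder:0": batch})
    print(logits.shape)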