onnx / onnx-tensorflow

Tensorflow Backend for ONNX

Error in OnnxTensorflowImport.ipynb #244

Open ddysher opened 6 years ago

ddysher commented 6 years ago

Describe the bug

While following the OnnxTensorflowImport tutorial, I encountered the following error:

Traceback (most recent call last):
  File "import_tensorflow.py", line 5, in <module>
    tf_rep = prepare(model)       # tensorflow representation of the model
  File "/usr/local/lib/python2.7/site-packages/onnx_tf-1.1.2-py2.7.egg/onnx_tf/backend.py", line 76, in prepare
    return cls.onnx_model_to_tensorflow_rep(model, strict)
  File "/usr/local/lib/python2.7/site-packages/onnx_tf-1.1.2-py2.7.egg/onnx_tf/backend.py", line 87, in onnx_model_to_tensorflow_rep
    return cls._onnx_graph_to_tensorflow_rep(model.graph, model.opset_import, strict)
  File "/usr/local/lib/python2.7/site-packages/onnx_tf-1.1.2-py2.7.egg/onnx_tf/backend.py", line 141, in _onnx_graph_to_tensorflow_rep
    onnx_node, tensor_dict, handlers, opset=opset, strict=strict)
  File "/usr/local/lib/python2.7/site-packages/onnx_tf-1.1.2-py2.7.egg/onnx_tf/backend.py", line 236, in _onnx_node_to_tensorflow_op
    return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
  File "/usr/local/lib/python2.7/site-packages/onnx_tf-1.1.2-py2.7.egg/onnx_tf/handlers/handler.py", line 60, in handle
    return ver_handle(node, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/onnx_tf-1.1.2-py2.7.egg/onnx_tf/handlers/backend/add.py", line 23, in version_7
    return [cls.make_tensor_from_onnx_node(node, **kwargs)]
  File "/usr/local/lib/python2.7/site-packages/onnx_tf-1.1.2-py2.7.egg/onnx_tf/handlers/backend_handler.py", line 111, in make_tensor_from_onnx_node
    return cls._run_tf_func(tf_func, inputs, attrs)
  File "/usr/local/lib/python2.7/site-packages/onnx_tf-1.1.2-py2.7.egg/onnx_tf/handlers/backend_handler.py", line 180, in _run_tf_func
    **dict([(p, attrs[p]) for p in params if p in attrs]))
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/ops/gen_math_ops.py", line 297, in add
    "Add", x=x, y=y, name=name)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/ops.py", line 1731, in __init__
    control_input_ops)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/ops.py", line 1579, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 224 and 64 for 'Add' (op: 'Add') with input shapes: [1,64,224,224], [64].

To Reproduce

Follow the tutorial to import super_resolution.onnx.
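The import script (import_tensorflow.py in the traceback above) is essentially the following sketch, assuming super_resolution.onnx has been downloaded from the tutorial assets:

import onnx
from onnx_tf.backend import prepare

model = onnx.load("super_resolution.onnx")  # model file from the tutorial assets
tf_rep = prepare(model)                     # tensorflow representation of the model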

ONNX model file

https://github.com/onnx/tutorials/tree/master/tutorials/assets

Python, ONNX, ONNX-TF, Tensorflow version

This information can be obtained by running get_version.py from the util folder.
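If get_version.py is not handy, a minimal sketch that prints roughly the same information:

import sys
import onnx
import tensorflow as tf

print("Python:", sys.version.split()[0])
print("ONNX:", onnx.__version__)
print("TensorFlow:", tf.__version__)
# onnx-tf's own version can be checked with `pip show onnx-tf`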

Additional context

There is an existing bug report here.

It seems there are some problems with the model, but since it worked before (as shown in the tutorial), I'm wondering whether this is a regression in onnx-tensorflow?

tjingrant commented 6 years ago

Hi, thanks for the bug report. We noticed it and it's being fixed here: https://github.com/onnx/onnx-tensorflow/pull/243.

ddysher commented 6 years ago

@tjingrant Thanks for the quick response!

Was this introduced by version skew between onnx and onnx-tf?

tjingrant commented 6 years ago

It's because of the newly introduced opset (7, if I remember correctly), which removed explicit broadcasting and adopted numpy-style implicit broadcasting. We relied on explicit broadcasting to do bias addition; now that explicit broadcasting is gone, we have to imitate it manually, which is what that PR is about.
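To illustrate, here is a minimal numpy sketch of the shape mismatch and one way to imitate the old channel-wise broadcast (this is only an illustration, not the literal onnx-tf patch):

import numpy as np

x = np.zeros((1, 64, 224, 224), dtype=np.float32)  # NCHW feature map
bias = np.zeros((64,), dtype=np.float32)           # per-channel bias

# numpy-style implicit broadcasting aligns trailing dimensions, so
# x + bias fails: 224 (last dim of x) vs. 64 (only dim of bias),
# exactly the ValueError in the traceback above.
# Reshaping the bias so it lines up with the channel axis imitates the
# old explicit broadcast along axis 1:
y = x + bias.reshape(64, 1, 1)                     # result shape (1, 64, 224, 224)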

ddysher commented 6 years ago

@tjingrant thanks! I'll help verify once it's merged.

tjingrant commented 6 years ago

@ddysher merged. Let me know if anything goes wrong.

ddysher commented 6 years ago

@tjingrant I'm still seeing this error.

$ python import_tensorflow.py                                                         
2018-09-11 22:49:37.197071: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "import_tensorflow.py", line 5, in <module>
    tf_rep = prepare(model)       # tensorflow representation of the model
  File "/Users/deyuandeng/code/workspace/src/github.com/onnx/onnx-tensorflow/onnx_tf/backend.py", line 76, in prepare
    return cls.onnx_model_to_tensorflow_rep(model, strict)
  File "/Users/deyuandeng/code/workspace/src/github.com/onnx/onnx-tensorflow/onnx_tf/backend.py", line 87, in onnx_model_to_tensorflow_rep
    return cls._onnx_graph_to_tensorflow_rep(model.graph, model.opset_import, strict)
  File "/Users/deyuandeng/code/workspace/src/github.com/onnx/onnx-tensorflow/onnx_tf/backend.py", line 141, in _onnx_graph_to_tensorflow_rep
    onnx_node, tensor_dict, handlers, opset=opset, strict=strict)
  File "/Users/deyuandeng/code/workspace/src/github.com/onnx/onnx-tensorflow/onnx_tf/backend.py", line 236, in _onnx_node_to_tensorflow_op
    return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
  File "/Users/deyuandeng/code/workspace/src/github.com/onnx/onnx-tensorflow/onnx_tf/handlers/handler.py", line 60, in handle
    return ver_handle(node, **kwargs)
  File "/Users/deyuandeng/code/workspace/src/github.com/onnx/onnx-tensorflow/onnx_tf/handlers/backend/add.py", line 23, in version_7
    return [cls.make_tensor_from_onnx_node(node, **kwargs)]
  File "/Users/deyuandeng/code/workspace/src/github.com/onnx/onnx-tensorflow/onnx_tf/handlers/backend_handler.py", line 111, in make_tensor_from_onnx_node
    return cls._run_tf_func(tf_func, inputs, attrs)
  File "/Users/deyuandeng/code/workspace/src/github.com/onnx/onnx-tensorflow/onnx_tf/handlers/backend_handler.py", line 180, in _run_tf_func
    **dict([(p, attrs[p]) for p in params if p in attrs]))
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/ops/gen_math_ops.py", line 297, in add
    "Add", x=x, y=y, name=name)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/ops.py", line 1731, in __init__
    control_input_ops)
  File "/Users/deyuandeng/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/ops.py", line 1579, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 224 and 64 for 'Add' (op: 'Add') with input shapes: [1,64,224,224], [64].

I installed onnx_tf with pip install -e . from the repository root. The master branch on my local machine is at the latest commit:

$ git log
commit 7162525391f0f6618d3e329856f6a16095235d32 (HEAD -> master, origin/master, origin/HEAD)
Author: Tian Jin <tjingrant@gmail.com>
Date:   Sun Sep 9 22:21:06 2018 -0400

    bias_add boradcasting fix (#243)

    * bias_add boradcasting fix

    * remove print

    * remove print

    * resolve comments

    * add underscore

commit 16bc0b2595cb548581c6d020dabff9d2aed89109
Author: Wenhao Hu <fumihwh@gmail.com>
Date:   Mon Sep 3 08:46:37 2018 +0900

    add mvn to backend (#241)

tjingrant commented 6 years ago

@ddysher it's the same cause, slightly different symptoms.

Basically, the model in question is no longer compliant, and the ONNX checker somehow didn't complain. According to https://github.com/onnx/onnx/blob/master/onnx/onnx-ml.proto, opset_import is mandatory (spec doc copied below); but the tutorial model does not specify opset_import, which means we automatically import the latest opset, causing the trouble I explained earlier. I tried forcing opset_import to 1 and the model was imported correctly, as expected.

  // The OperatorSets this model relies on.
  // All ModelProtos MUST have at least one entry that
  // specifies which version of the ONNX OperatorSet is
  // being imported.
  //
  // All nodes in the ModelProto's graph will bind against the operator
  // with the same-domain/same-op_type operator with the HIGHEST version
  // in the referenced operator sets.
  repeated OperatorSetIdProto opset_import = 8;
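
Forcing the opset version on a loaded model looks roughly like the following sketch (it uses the standard onnx Python API; whether pinning to opset 1 is appropriate depends on which ops the model uses):

import onnx
from onnx_tf.backend import prepare

model = onnx.load("super_resolution.onnx")

# The tutorial model ships without opset_import, so the backend falls back
# to the latest opset. Pinning the default domain to version 1 restores the
# old explicit-broadcast semantics (workaround sketch, not an official fix).
del model.opset_import[:]
opset = model.opset_import.add()
opset.domain = ""   # default ONNX operator set
opset.version = 1

tf_rep = prepare(model)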

That being said, the error you see is not a problem with onnx-tf but with the super-resolution model used in the tutorial. We did not author the tutorial and therefore do not own the model in question; let me post an issue on the onnx repository to ask for a model update.

ddysher commented 6 years ago

@tjingrant thanks for the analysis. Almost all getting-started guides about converting ONNX to TF point to this tutorial, so it would be nice to have it fixed :)

tjingrant commented 6 years ago

@ddysher we've notified the tutorial owner. In the meantime, I discussed this with folks from the spec repository, and we decided that a simple patch to the ONNX spec will fix this problem immediately. I've implemented the corresponding behavior change in onnx-tf and verified that the super-resolution net appears to work. We should be able to merge this one quickly.