onnx / tensorflow-onnx

Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX
Apache License 2.0

Moving to opset 11 is causing issues #886

Closed ttdd11 closed 4 years ago

ttdd11 commented 4 years ago

I am trying to build a model for opset 11. With version 1.5.1, I get errors about inferring shapes and dtypes. With version 1.5.6, the optimizer doesn't seem to work; it fails because of deepcopy. Any help would be greatly appreciated.

Trace version 1.5.1:

2020-04-15 05:32:20,772 - WARNING - From C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\verbose_logging.py:71: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2020-04-15 05:32:37,748 - INFO - Using tensorflow=1.14.0, onnx=1.6.0, tf2onnx=1.5.1/0c735a
2020-04-15 05:32:37,756 - INFO - Using opset <onnx, 11>
2020-04-15 05:32:42,706 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize219, type: Resize]
Traceback (most recent call last):
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
    inferred_model = shape_inference.infer_shapes(model_proto)
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
    inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,789 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__242, type: Resize]
Traceback (most recent call last):
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
    inferred_model = shape_inference.infer_shapes(model_proto)
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
    inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,810 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize247, type: Resize]
Traceback (most recent call last):
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
    inferred_model = shape_inference.infer_shapes(model_proto)
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
    inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,867 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize264, type: Resize]
Traceback (most recent call last):
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
    inferred_model = shape_inference.infer_shapes(model_proto)
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
    inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,873 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__269, type: Resize]
Traceback (most recent call last):
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
    inferred_model = shape_inference.infer_shapes(model_proto)
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
    inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,913 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize280, type: Resize]
Traceback (most recent call last):
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
    inferred_model = shape_inference.infer_shapes(model_proto)
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
    inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,924 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__285, type: Resize]
Traceback (most recent call last):
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
    inferred_model = shape_inference.infer_shapes(model_proto)
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
    inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:43,345 - INFO -
2020-04-15 05:32:44,187 - WARNING - Failed to optimize model proto
Traceback (most recent call last):
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\graph.py", line 1167, in optimize_model_proto
    graph = GraphUtil.create_graph_from_onnx_model(onnx_model_proto)
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\graph.py", line 1206, in create_graph_from_onnx_model
    inferred_model = shape_inference.infer_shapes(onnx_model_proto)
  File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
    inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 1 is out of bounds
2020-04-15 05:32:44,218 - INFO -
2020-04-15 05:32:44,218 - INFO - Successfully converted TensorFlow model C:/Users/tmp/net.pb to ONNX
2020-04-15 05:32:45,539 - INFO - ONNX model is saved at C:/Users/tmp/net.onnx

Trace version 1.5.6:

2020-04-15 05:33:53,665 - WARNING - From C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\verbose_logging.py:72: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2020-04-15 05:34:08,464 - INFO - Using tensorflow=1.14.0, onnx=1.6.0, tf2onnx=1.5.6/80edd7 2020-04-15 05:34:08,464 - INFO - Using opset <onnx, 11> 2020-04-15 05:34:11,773 - INFO - Optimizing ONNX model 2020-04-15 05:34:11,873 - WARNING - Failed to apply optimize_transpose Traceback (most recent call last): File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer__init.py", line 50, in optimize_graph current = copy.deepcopy(graph) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File 
"C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy copier = getattr(x, "deepcopy", None) ReferenceError: weakly-referenced object no longer exists 2020-04-15 05:34:12,103 - WARNING - Failed to apply fold_constants Traceback (most recent call last): File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer__init__.py", line 50, in optimize_graph current = copy.deepcopy(graph) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in 
_deepcopy_tuple y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy copier = getattr(x, "deepcopy", None) ReferenceError: weakly-referenced object no longer exists 2020-04-15 05:34:12,215 - WARNING - Failed to apply loop_optimizer Traceback (most recent call last): File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer__init__.py", line 50, in optimize_graph current = copy.deepcopy(graph) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) 
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, *rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy copier = getattr(x, "deepcopy", None) ReferenceError: weakly-referenced object no longer exists 2020-04-15 05:34:12,323 - WARNING - Failed to apply merge_duplication Traceback (most recent call last): File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer__init__.py", line 50, in optimize_graph current = copy.deepcopy(graph) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, 
in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, *rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy copier = getattr(x, "deepcopy", None) ReferenceError: weakly-referenced object no longer exists 2020-04-15 05:34:12,540 - WARNING - Failed to apply remove_identity Traceback (most recent call last): File 
"C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer__init__.py", line 50, in optimize_graph current = copy.deepcopy(graph) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, *rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File 
"C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy copier = getattr(x, "deepcopy", None) ReferenceError: weakly-referenced object no longer exists 2020-04-15 05:34:12,649 - WARNING - Failed to apply remove_back_to_back Traceback (most recent call last): File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer__init__.py", line 50, in optimize_graph current = copy.deepcopy(graph) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in y = [deepcopy(a, memo) for a in x] File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 
240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy y = _reconstruct(x, memo, *rv) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct state = deepcopy(state, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy y = copier(x, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict y[deepcopy(key, memo)] = deepcopy(value, memo) File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy copier = getattr(x, "deepcopy__", None) ReferenceError: weakly-referenced object no longer exists 2020-04-15 05:34:12,689 - INFO - After optimization: no change 2020-04-15 05:34:12,826 - INFO - 2020-04-15 05:34:12,827 - INFO - Successfully converted TensorFlow model C:/Users/tmp/net.pb to ONNX 2020-04-15 05:34:14,159 - INFO - ONNX model is saved at C:/Users/tmp/net.onnx

guschmue commented 4 years ago

tf2onnx-1.5.1 did not have support for opset 11 but we are accepting the --opset 11 so we tag the model with opset 11 (lame excuse - we don't fail because it makes it easier for us when we are in the middle of adding a new opset). We'll discuss changing that to fail when the opset is not fully implemented.

If you upgrade to tf2onnx-1.5.6 (pip install tf2onnx -U) things should work.
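The fail-fast behaviour mentioned above (rejecting an opset instead of only tagging the model with it) could look roughly like the sketch below. This is not tf2onnx code: `check_opset_request`, `parse_version` and `KNOWN_GOOD_FOR_OPSET_11` are hypothetical names, and using 1.5.6 as the minimum version for opset 11 is an assumption based only on the upgrade advice in this comment.

```python
# Hypothetical guard, not part of tf2onnx: refuse an opset that the installed
# converter version is not known to handle, instead of only tagging the model.

# Assumption: 1.5.6 is used as the floor because this thread names it as the
# version to upgrade to; releases between 1.5.1 and 1.5.6 are not checked here.
KNOWN_GOOD_FOR_OPSET_11 = (1, 5, 6)


def parse_version(text):
    """Convert a plain 'X.Y.Z' string to a tuple of ints; suffixes are not handled."""
    return tuple(int(part) for part in text.split("."))


def check_opset_request(converter_version, opset):
    """Return the opset if the converter version is acceptable, else raise."""
    if opset >= 11 and parse_version(converter_version) < KNOWN_GOOD_FOR_OPSET_11:
        floor = ".".join(str(part) for part in KNOWN_GOOD_FOR_OPSET_11)
        raise ValueError(
            "opset %d requested, but converter version %s predates %s"
            % (opset, converter_version, floor)
        )
    return opset
```

With such a check, a request for opset 11 on 1.5.1 would stop with a clear message rather than producing a model whose Resize nodes later fail shape inference.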

ttdd11 commented 4 years ago

@guschmue Did you take a look at the trace for 1.5.6? I can't seem to get the optimizer working for that version either.

guschmue commented 4 years ago

Got it. There is another bug like this, something new that we have not been able to reproduce. What Python version is this? Anaconda or system Python?

ttdd11 commented 4 years ago

Anaconda, I think version 3.6. What do you recommend? I can try some things if that helps.

guschmue commented 4 years ago

I'm looking for some way of reproducing this, but so far anaconda/3.6 is happy on both linux and windows.

ttdd11 commented 4 years ago

Would the model help?

guschmue commented 4 years ago

Sure, since it is failing for you I hope it would fail for me.

ttdd11 commented 4 years ago

Do you have an email I can send this to? I probably shouldn't post the model here.

ttdd11 commented 4 years ago

@guschmue I tried many variants of onnx, tensorflow and tf2onnx, and all versions are unhappy with opset 11.

I also built the master branch and tried that, same issues regardless of the tensorflow version.

guschmue commented 4 years ago

This deepcopy issue seems unrelated to the opset, something wrong in the optimizer. There are 2 other deepcopy issues that came in recently and they all look the same, failing in the back_to_back optimizer making me think that the identity optimizer that runs before back_to_back must have some issue. Going to review that code. If you can share the model you can send a link to guschmue@microsoft.com.

ttdd11 commented 4 years ago

Just sent, thanks for taking a look.

guschmue commented 4 years ago

So I tried your model on tf-1.14, tf-2.2, tf2onnx-1.5.6 and tf2onnx-master with python3.6 and python3.7 on both windows and linux ... all working for me. The only thing in my env that might be different is that I only use anaconda. Let me check with some teammates whether they have ever seen the deepcopy error.

ttdd11 commented 4 years ago

I'm not sure what you mean by "I only use anaconda"? That's also what I am using.

What version of anaconda are you using? I may re-install and send all my instructions. What version of cuda are you using?

guschmue commented 4 years ago

I'm using cuda-10.1 on linux and used a cpu build on windows. Don't think cuda would impact tf2onnx except in a few cases where the graph is a little different if tensorflow finds cuda. Some people use the system python, that is why I mention that I only use anaconda.

ttdd11 commented 4 years ago

I'm going to re-install anaconda and re-build my environments. Can you email me back the .onnx export so I can try it further down? I'm just moving to opset 11 to address some downstream issues.

jignparm commented 4 years ago

@ttdd11, can you run the packages without creating a conda environment to see if that makes any difference?

ttdd11 commented 4 years ago

@jignparm as in run the same without calling activate env?

jignparm commented 4 years ago

Yes -- BTW, this is not likely to be the issue, just trying to isolate the differences.

ttdd11 commented 4 years ago

@jignparm This is a bit tricky, my anaconda environment isn't happy without calling an activate. I'll have to change some path variables to test this out.

ttdd11 commented 4 years ago

@jignparm would it be just as good of a test if I ran this using system python and packages?

jignparm commented 4 years ago

> my anaconda environment isn't happy without calling an activate.

That's odd. You should be able to install Anaconda multiple times in separate folders (i.e. have secondary installations).

> system python and packages?

Not sure what configuration the system Python is. Like I mentioned above, this is not likely to be the root cause (simply a difference), so if it ends up being too difficult to test, feel free to skip it (I assumed it would be a quick test, and hence proposed it).

Other users have seen similar errors, but so far we have not been able to reproduce them, which was the reason for the far-fetched test, to rule out Python environment issues.
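To compare Python environments between a machine that shows the error and one that does not, a small report like the one below may help. It is only a sketch using the standard library; `describe_python_environment` is a name made up for this example, and which fields actually matter for this bug is not known at this point in the thread.

```python
import copy
import platform
import sys


def describe_python_environment():
    """Collect interpreter details that are useful when comparing two setups."""
    return {
        "implementation": platform.python_implementation(),  # e.g. CPython
        "version": platform.python_version(),
        "build": sys.version,          # some distributions add their name here
        "executable": sys.executable,  # which interpreter is actually running
        "copy_module": getattr(copy, "__file__", None),  # the copy.py in use
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print("%s: %s" % (key, value))
```

Running it once inside the activated conda environment and once with the system interpreter gives two outputs that can be diffed directly.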

ttdd11 commented 4 years ago

It's a known issue with numpy and anaconda. I'll see how the afternoon plays out (it's just a path issue that's pretty easily resolved).

Just a thought, are you guys building onnx from source?

jignparm commented 4 years ago

The Onnx package is not built from source. It's the released version from Pypi.

buddhapuneeth commented 4 years ago

@jignparm my issue is also linked to this, so I am commenting here. I was able to narrow it down: for me the issue happens only with CPython 3.6 (internal version), not with Anaconda Python 3.6. I compared the copy.py files in both and there is no difference. I am not sure of the exact point of failure. Are there any alternative mechanisms to deepcopy() you can think of?

jignparm commented 4 years ago

@buddhapuneeth, it looks like Anaconda 3.6 works for you, but CPython 3.6 throws this error.

For @ttdd11 , Anaconda is throwing an error as well.

I installed CPython 3.6 on windows and converted a large ssd_resnet101_v1_fpn_shared_box_predictor_oid_512x512_sync_2019_01_20 model, but still could not reproduce the deepcopy error.

Even if we use an alternative to deepcopy, it will be difficult to verify without being able to reproduce the error.

I'll investigate to see if there are any dangling/bad references after conversion.

Could you disable the optimizers (need to modify code) to isolate it to a particular optimizer? One suspicion is the back_to_back optimizer in optimizer\__init__.py. If you remove it from the dictionary in that file, it'll disable it.

buddhapuneeth commented 4 years ago

I removed BackToBackOptimizer and tried, still the same issue for other optimizers.

jignparm commented 4 years ago

Thanks for the quick check! Any idea if disabling all optimizers still results in this issue?

buddhapuneeth commented 4 years ago

If I disable all optimizers, then there is no issue. One basic doubt here, we are copying the graph here as a fallback mechanism in case optimization fails...right? So I should not assume optimizations will be successful every time?

jignparm commented 4 years ago

@buddhapuneeth, yes that's correct -- if an optimizer fails, then the 'current' graph will not be updated to the 'new' graph (i.e. the one the optimizer modifies until it succeeds or hits an error and exits). So optimizers are not required to succeed at every iteration -- if any of them throws an error, only that optimizer is aborted.
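The fallback behaviour described here can be sketched as follows. This is an illustration, not the tf2onnx implementation: `apply_optimizers`, the dict-based graph and the two toy optimizers are made up for the example. The copy is taken inside the `try`, so a failing `copy.deepcopy` is reported the same way as a failing optimizer, which matches the "Failed to apply ..." warnings in the 1.5.6 trace.

```python
import copy


def apply_optimizers(graph, optimizers):
    """Run each optimizer on a private copy; keep the last graph that succeeded.

    `optimizers` maps a name to a callable that takes a graph and returns the
    optimized graph, or None for "no change". Both are placeholders.
    """
    failed = []
    for name in sorted(optimizers):  # run in alphabetical order of the names
        try:
            current = copy.deepcopy(graph)  # the optimizer may mutate this freely
            result = optimizers[name](current)
        except Exception:
            failed.append(name)  # abort only this optimizer, keep `graph` as is
            continue
        if result is not None:
            graph = result
    return graph, failed


# Toy optimizers for illustration only.
def broken(graph):
    graph["nodes"].clear()  # damages its private copy, then fails
    raise RuntimeError("optimizer failed halfway")


def drop_zeros(graph):
    graph["nodes"] = [node for node in graph["nodes"] if node != 0]
    return graph


optimized, failed = apply_optimizers(
    {"nodes": [1, 0, 2]},
    {"a_broken": broken, "b_drop_zeros": drop_zeros},
)
```

Because `broken` only ever touches its own copy, the damage it does before raising never reaches the graph that `drop_zeros` receives.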

It's interesting that disabling all the optimizers solves the issue. It means most likely one of them is a culprit, and probably not the back_to_back_optimizer.

IdentityOptimizer is another suspicious source of the deepcopy error. If you enable only that one optimizer and still observe the deepcopy error, it should be a good enough hint for us to look for a fix for it.

buddhapuneeth commented 4 years ago

@jignparm I tried all combinations and it fails in every one. It also fails only for one specific model; the other models I tried work fine. I assume some node type causes a problem when deepcopy resolves references, as the final error says: 'ReferenceError: weakly-referenced object no longer exists'
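For reference, this error message can be produced in plain Python when `copy.deepcopy` reaches a `weakref.proxy` whose target has already been garbage collected. The sketch below shows only that general mechanism; `Node` and `deepcopy_with_dead_proxy` are made-up names, and whether the converted graph actually holds such a proxy is an assumption, not something established in this thread.

```python
import copy
import gc
import weakref


class Node:
    """Stand-in for a graph node; not a tf2onnx class."""

    def __init__(self, name):
        self.name = name


def deepcopy_with_dead_proxy():
    """Deep-copy a dict that holds a weak proxy whose target is already gone."""
    target = Node("resize")
    state = {"node": weakref.proxy(target)}
    del target    # drop the only strong reference to the node
    gc.collect()  # make collection explicit on interpreters without refcounting
    try:
        # deepcopy looks up __deepcopy__ on the proxy, which forwards the
        # attribute access to the dead target and raises ReferenceError.
        copy.deepcopy(state)
    except ReferenceError as exc:
        return str(exc)
    return None


message = deepcopy_with_dead_proxy()
```

The lookup that fails is the same `getattr(x, "__deepcopy__", None)` call that appears at the bottom of each traceback in the 1.5.6 trace.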

As mentioned in the first ticket, I am using a temporary workaround:

logger.verbose("Apply %s", name)
try:
    current = copy.deepcopy(graph)
except Exception:
    logger.verbose("Failed to do deepcopy")
    current = graph
opt = factory()
graph = opt.optimize(current) or graph

These are the logs for the transpose optimization. I have roughly 400 nodes of ~30 types.

2020-04-28 16:45:58,627 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
2020-04-28 16:45:58,651 - DEBUG - tf2onnx.optimizer: Failed to do deepcopy
2020-04-28 16:45:58,744 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: Add -3 (..), Const -36 (..), Identity -3 (..), Reshape +1 (..), Transpose -16 (..)
2020-04-28 16:45:58,744 - VERBOSE - tf2onnx.optimizer: Apply fold_constants..........

I am not able to narrow down the culprit node, as there are a lot of them. Any suggestions from your side?

buddhapuneeth commented 4 years ago

@jignparm issue is resolved for me on upgrading to cpython37 from cpython36.

jignparm commented 4 years ago

Thanks @buddhapuneeth for trying out several combinations, and great that cpython37 is working for you without any errors.

It might still be helpful to debug on 3.6 to see if there's a true bug in any of the optimizers.

Each optimizer runs independently from the others, but it's possible that the graph is corrupted by optimizer A and we don't see the error until optimizer B runs.

There are 6 of them activated (see below), and since the dictionary is sorted by key values, they will run in alphabetical order until no more optimizations can be performed.

It should be possible to comment out all 6 of them, which makes the error disappear, and then activate only 1 of them at a time, to see which optimizer is causing the error. There is still likely to be a subtle bug in these, and it's less likely that deep-copy is buggy -- so isolating it down to 1 optimizer would be very good information.

https://github.com/onnx/tensorflow-onnx/blob/master/tf2onnx/optimizer/__init__.py#L22-L29
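The one-at-a-time isolation suggested above can also be scripted instead of editing the dictionary by hand. The sketch below is only an illustration: `isolate_failing_optimizers` and the toy optimizers are hypothetical, and the real entries would be the optimizer classes registered in the linked file.

```python
import copy


def isolate_failing_optimizers(graph, optimizers):
    """Run every optimizer alone, each on a fresh copy of the same start graph,
    and report per optimizer whether it finished or which error it raised."""
    report = {}
    for name in sorted(optimizers):
        try:
            optimizers[name](copy.deepcopy(graph))
            report[name] = "ok"
        except Exception as exc:
            report[name] = "%s: %s" % (type(exc).__name__, exc)
    return report


# Toy optimizers for illustration only.
def fold(graph):
    return graph


def transpose(graph):
    raise ValueError("bad perm")


report = isolate_failing_optimizers({"nodes": [1, 2]}, {"fold": fold, "transpose": transpose})
```

If every entry reports the same copy-related error, the starting graph itself is the more likely suspect than any single optimizer.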

buddhapuneeth commented 4 years ago

Hi @jignparm, as I mentioned in my second-to-last comment, I tried all combinations of optimizers. If I comment out all the optimizers, there is no error, and with any one optimizer enabled the error occurs. I also tried to find the exact node at which deepcopy fails, but with ~400 nodes being copied recursively I was not able to narrow it down. I believe it has something to do with the nodes in the graph and nothing to do with the optimizers.

jignparm commented 4 years ago

Thanks for clarifying -- I didn't realize that <1> with ALL optimizers disabled, there's no error, and <2> with ANY one optimizer enabled, you see this error.

> I believe, it is something to do with nodes in the graph and nothing to do with optimizers.

That sounds reasonable to me as well. If some node is corrupted before starting any of the optimizers -- that would explain <2> above.

In that case, a good way to isolate the exact node (or rewriter) is to put a debug line just below the 'try' at 2 locations:

Something like the snippet below. The print statement just before the failure is probably the operator that is causing the corruption for your model.

try:
    print(func)            # print which op handler or rewriter is being called
    dontuse = deepcopy(g)  # if deepcopy fails, then the previous func() caused the corruption
    ...
except ... :
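A self-contained version of the probe idea in the snippet above might look like this. It is a sketch under stated assumptions: `first_step_breaking_deepcopy`, the `(name, func)` step list and the `Uncopyable` class are invented for the example and stand in for the op handlers and rewriters that mutate the real graph.

```python
import copy


def first_step_breaking_deepcopy(graph, steps):
    """Apply the steps in order and return the name of the first one after
    which copy.deepcopy(graph) raises, or None if every probe succeeds."""
    for name, func in steps:
        func(graph)  # each step mutates the graph in place
        try:
            copy.deepcopy(graph)  # the copy is discarded; only a failure matters
        except Exception:
            return name
    return None


class Uncopyable:
    """Simulates an object that cannot be deep-copied."""

    def __deepcopy__(self, memo):
        raise ReferenceError("simulated dangling reference")


def add_const(graph):
    graph["const"] = 1


def add_bad(graph):
    graph["bad"] = Uncopyable()


def add_more(graph):
    graph["more"] = 2


culprit = first_step_breaking_deepcopy(
    {}, [("add_const", add_const), ("add_bad", add_bad), ("add_more", add_more)]
)
```

Probing after every step is slow on a large graph, so it is meant as a temporary debugging aid rather than something to leave enabled.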

jignparm commented 4 years ago

@ttdd11, @buddhapuneeth PR #972 should resolve this issue. Let me know if you see any errors. It took a while before we got a model that reproduces the error systematically -- hopefully this resolves it finally.

guschmue commented 4 years ago

I assume this is fixed.