onnx / tensorflow-onnx

Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX
Apache License 2.0

Unable to convert to ONNX using Frozen.pb file and also using --checkpoint. Please Help [Urgent] #1981

Open rabaig opened 2 years ago

rabaig commented 2 years ago

Hello,

I am using the latest branch of tensorflow-onnx. I am stuck and need help, as I am working on a critical project.

My deep-learning model generates checkpoint files, and I tried the command below:

```
python -m tf2onnx.convert --checkpoint ./OCR_240000model.ckpt.meta --inputs input_img:0 --outputs DetResults:0 --output model.onnx
```

I got the error: `AssertionError: DetResults is not in graph`

Below is the detailed log:

```
/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/runpy.py:125: RuntimeWarning: 'tf2onnx.convert' found in sys.modules after import of package 'tf2onnx', but prior to execution of 'tf2onnx.convert'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Traceback (most recent call last):
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tf2onnx/convert.py", line 692, in <module>
    main()
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tf2onnx/convert.py", line 232, in main
    graph_def, inputs, outputs = tf_loader.from_checkpoint(args.checkpoint, args.inputs, args.outputs)
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tf2onnx/tf_loader.py", line 383, in from_checkpoint
    frozen_graph = freeze_session(sess, input_names=input_names, output_names=output_names)
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tf2onnx/tf_loader.py", line 305, in freeze_session
    graph_def = convert_variables_to_constants(sess, graph_def, output_node_names)
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/framework/graph_util_impl.py", line 245, in convert_variables_to_constants
    inference_graph = extract_sub_graph(input_graph_def, output_node_names)
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/framework/graph_util_impl.py", line 181, in extract_sub_graph
    _assert_nodes_are_present(name_to_node, dest_nodes)
  File "/home/jungyoonkim/anaconda3/envs/tensor/lib/python3.7/site-packages/tensorflow/python/framework/graph_util_impl.py", line 137, in _assert_nodes_are_present
    assert d in name_to_node, "%s is not in graph" % d
AssertionError: DetResults is not in graph
```

I converted my checkpoint files to a frozen .pb file and then used the command below:

```
python -m tf2onnx.convert --input FPN_Res152D_OCR_20222406_v1_Frozen.pb --inputs input_img:0 --outputs DetResults:0 --opset 11 --output model.onnx
```

I got the below error:

```
2022-06-26 21:46:34,344 - ERROR - Tensorflow op [cond/mul/Switch: Switch] is not supported
2022-06-26 21:46:34,344 - ERROR - Tensorflow op [cond/floordiv/Switch: Switch] is not supported
2022-06-26 21:46:34,344 - ERROR - Tensorflow op [cond/cond/Merge: Merge] is not supported
2022-06-26 21:46:35,788 - ERROR - Tensorflow op [postprocess_fastrcnn/PyFunc_32: PyFunc] is not supported
```

The model does get converted to ONNX despite these errors and internal optimizations, but I need help resolving the issues.
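A quick pre-flight check can count the op types in the frozen graph and flag the ones the converter reports as unsupported. This is a sketch with a made-up op list; with TensorFlow installed, `ops` would come from `[n.op for n in graph_def.node]`.

```python
# Sketch: flag op types that tf2onnx reported as unsupported
# (Switch, Merge, PyFunc per the error log).
from collections import Counter

UNSUPPORTED = {"Switch", "Merge", "PyFunc"}

def flag_unsupported(ops):
    """Return {op_type: count} for op types known to be unsupported."""
    counts = Counter(ops)
    return {op: n for op, n in counts.items() if op in UNSUPPORTED}

# Hypothetical op list; real data would be [n.op for n in graph_def.node].
ops = ["Conv2D", "Switch", "Switch", "Merge", "PyFunc", "Relu"]
print(flag_unsupported(ops))  # {'Switch': 2, 'Merge': 1, 'PyFunc': 1}
```

Running such a scan before conversion shows up front how many problematic nodes the graph contains.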

Awaiting your response.

fatcat-z commented 2 years ago

> ```
> 2022-06-26 21:46:34,344 - ERROR - Tensorflow op [cond/mul/Switch: Switch] is not supported
> 2022-06-26 21:46:34,344 - ERROR - Tensorflow op [cond/floordiv/Switch: Switch] is not supported
> 2022-06-26 21:46:34,344 - ERROR - Tensorflow op [cond/cond/Merge: Merge] is not supported
> 2022-06-26 21:46:35,788 - ERROR - Tensorflow op [postprocess_fastrcnn/PyFunc_32: PyFunc] is not supported
> ```

These ops are not supported by tf2onnx and ONNX Runtime yet, so they cannot be exported to ONNX successfully.

rabaig commented 2 years ago

Hello,

However, the model does get converted to ONNX despite the errors. Do you mean the conversion is incomplete?

My next goal is to convert this ONNX model to .trt (TensorRT).

When I executed the conversion command, I got the following error:

```
[06/27/2022-17:53:48] [E] [TRT] ModelImporter.cpp:775: input: "strided_slice_3__348:0" input: "Less:0" output: "cond/mul/Switch:0" output: "cond/mul/Switch:1" name: "cond/mul/Switch" op_type: "Switch"
[06/27/2022-17:53:48] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[06/27/2022-17:53:48] [E] [TRT] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:4890 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
```
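The model can be checked for such nodes before handing it to TensorRT: tf2onnx leaves unsupported TF ops such as `Switch` in the graph, and TensorRT then tries (and fails) to load them as plugins. A sketch of the check: `KNOWN_ONNX_OPS` below is a tiny illustrative subset, and with the `onnx` package installed `op_types` would come from `[n.op_type for n in onnx.load(path).graph.node]`.

```python
# Sketch: list op types in an ONNX model that are not standard ONNX ops.
# KNOWN_ONNX_OPS is a small illustrative subset, not the full ONNX op set.
KNOWN_ONNX_OPS = {"Conv", "Relu", "Add", "Mul", "Gather", "Slice"}

def foreign_ops(op_types):
    """Return the sorted set of op types outside the known ONNX ops."""
    return sorted(set(op_types) - KNOWN_ONNX_OPS)

# Hypothetical op list; real data: [n.op_type for n in model.graph.node].
print(foreign_ops(["Conv", "Switch", "Merge", "Relu", "PyFunc"]))
# ['Merge', 'PyFunc', 'Switch']
```

Any name this prints is one TensorRT will try to resolve as a plugin, which matches the `importFallbackPluginImporter` assertion above.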

Is this error because of the incomplete ONNX conversion?

Could you suggest any workarounds? Thanks.

fatcat-z commented 2 years ago

When tf2onnx meets an unsupported op, it keeps the op in the ONNX graph and emits an error message, as you've seen. So the conversion itself is not incomplete.

These ops are related to TF control flow, so we don't have a quick workaround at hand. Is it possible for you to remove such ops from the original model code before the conversion?
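As an illustration of that suggestion (a sketch, not the model's actual code): `Switch` and `Merge` nodes typically come from dynamic control flow such as `tf.cond`. If the predicate is actually known when the graph is built, a plain Python `if` builds only one branch, so no control-flow nodes end up in the graph.

```python
# Hypothetical stand-in for model-building code (no real TensorFlow here).
# A Python-level branch bakes exactly one path into the graph, so no
# Switch/Merge nodes are created -- unlike tf.cond(pred, true_fn, false_fn),
# which emits them to select a branch at runtime.
def build_scale(x, training):
    if training:          # decided at graph-construction time
        return x * 2      # only this branch would be added to the graph
    return x // 2

print(build_scale(10, training=True))   # 20
print(build_scale(10, training=False))  # 5
```

`PyFunc` nodes are different: they wrap arbitrary Python callbacks, so the wrapped logic has to be re-expressed in regular ops (or moved into pre/post-processing outside the model) before conversion can succeed.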

rabaig commented 2 years ago

I am trying to remove the unsupported op nodes from my .pb file, and my idea is to convert the resulting file to a _Frozen.pb using the code below:

```python
freeze_graph.freeze_graph(
    input_graph=os.path.join(OUT_DIR, PB_NAME),
    input_saver='',
    input_binary=False,
    input_checkpoint=CKPT_PATH,
    output_node_names="DetResults",
    restore_op_name="save/restore_all",
    filename_tensor_name='save/Const:0',
    output_graph=os.path.join(OUT_DIR, PB_NAME.replace('.pb', '_Frozen.pb')),
    clear_devices=False,
    initializer_nodes='')
```

but I am running into many issues, as the graph nodes are completely dependent on one another.

For example, I removed the node below from the file:

```
node {
  name: "cond/mul/Switch"
  op: "Switch"
  input: "strided_slice_3"
  input: "cond/pred_id"
  attr { key: "T" value { type: DT_INT32 } }
  attr { key: "_class" value { list { s: "loc:@strided_slice_3" } } }
}
```

This node acts as an input to other nodes, forming a chain. Can I provide my model.pb to you so that you can take a look? Or can you suggest some other techniques?
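That chain can be checked mechanically: build a consumer map from the graph and see which nodes still reference a node before deleting it. A sketch, where the `(name, inputs)` pairs are made up but mimic `(node.name, list(node.input))` entries from a GraphDef:

```python
# Sketch: find which nodes consume a given node's output.
# Each entry mimics (node.name, list(node.input)) from a GraphDef.
def consumers_of(nodes, target):
    """Return names of nodes that list `target` among their inputs."""
    return [name for name, inputs in nodes if target in inputs]

nodes = [
    ("strided_slice_3", []),
    ("cond/pred_id", []),
    ("cond/mul/Switch", ["strided_slice_3", "cond/pred_id"]),
    ("cond/mul", ["cond/mul/Switch"]),
]
print(consumers_of(nodes, "cond/mul/Switch"))  # ['cond/mul']
```

Any non-empty result means deleting the node leaves dangling inputs, which is exactly the breakage described above.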

Thanks

fatcat-z commented 2 years ago

> I am trying to remove the unsupported op nodes from my .pb file [...] Can I provide my model.pb to you so that you can have a check please? or can you suggest some other techniques?

Removing unsupported ops from the final .pb file doesn't work, because you would lose some of the model's functionality that way.

A possible solution is to go back to your original model's Python code and find all the unsupported ops. Replace each unsupported op with another op or a combination of ops.

After all unsupported ops have been handled this way, call tf2onnx again to convert the model to ONNX. It might then succeed.