onnx / onnx-tensorflow

Tensorflow Backend for ONNX

Conversion from TensorFlow checkpoint to ONNX fails due to uninitialized accuracy tensor #344

Open JoeyCarson opened 5 years ago

JoeyCarson commented 5 years ago

Describe the bug

Converting a TF checkpoint to ONNX with the command below fails because the variable accuracy/count, created by tf.metrics.accuracy, is not initialized.

onnx-tf convert -t onnx -i /path/to/input.ckpt -o /path/to/output.onnx

tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value accuracy/count
  [[Node: accuracy/count/_306 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_312_accuracy/count", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
  [[Node: dense/kernel/Momentum/_315 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_320_dense/kernel/Momentum", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\jocarson\AppData\Local\Continuum\Anaconda3\Scripts\onnx-tf-script.py", line 9, in <module>
    load_entry_point('onnx-tf', 'console_scripts', 'onnx-tf')()
  File "c:\users\jocarson\desktop\proj\onnx-tensorflow\onnx_tf\cli.py", line 19, in main
    return onnx_tf.converter.main(args[1:])
  File "c:\users\jocarson\desktop\proj\onnx-tensorflow\onnx_tf\converter.py", line 23, in main
    convert(**{k: v for k, v in vars(args).items() if v is not None})
  File "c:\users\jocarson\desktop\proj\onnx-tensorflow\onnx_tf\converter.py", line 216, in convert
    initializer_nodes="")
  File "c:\users\jocarson\appdata\local\continuum\anaconda3\lib\site-packages\tensorflow\python\tools\freeze_graph.py", line 254, in freeze_graph
    checkpoint_version=checkpoint_version)
  File "c:\users\jocarson\appdata\local\continuum\anaconda3\lib\site-packages\tensorflow\python\tools\freeze_graph.py", line 153, in freeze_graph_with_def_protos
    variable_names_blacklist=variable_names_blacklist)
  File "c:\users\jocarson\appdata\local\continuum\anaconda3\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 251, in convert_variables_to_constants
    returned_variables = sess.run(variable_names)
  File "c:\users\jocarson\appdata\local\continuum\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "c:\users\jocarson\appdata\local\continuum\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "c:\users\jocarson\appdata\local\continuum\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
    run_metadata)
  File "c:\users\jocarson\appdata\local\continuum\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value accuracy/count
  [[Node: accuracy/count/_306 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_312_accuracy/count", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
  [[Node: dense/kernel/Momentum/_315 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_320_dense/kernel/Momentum", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
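The first line of the error names the variable that failed (accuracy/count). If you want to pull that name out programmatically, for example to decide which variables to initialize or exclude, here is a minimal sketch. uninitialized_variable is a hypothetical helper, and it only understands the TF 1.x "Attempting to use uninitialized value ..." message format shown above.

```python
import re


def uninitialized_variable(message):
    """Hypothetical helper: return the variable name from a TF 1.x
    "Attempting to use uninitialized value <name>" message, or None
    if the message does not have that format."""
    match = re.search(r"uninitialized value (\S+)", message)
    return match.group(1) if match else None


# First line of the FailedPreconditionError reported above.
error_text = "Attempting to use uninitialized value accuracy/count"
print(uninitialized_variable(error_text))  # accuracy/count
```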

My TensorFlow model is essentially the ResNet model from the tensorflow/models GitHub repo (https://github.com/tensorflow/models/tree/master/official/resnet). It is a tf.estimator.Estimator, so my code never creates a session explicitly; the session is created internally when Estimator.train() is called.

To Reproduce

onnx-tf convert -t onnx -i /path/to/input.ckpt -o /path/to/output.onnx

That's essentially it. An accuracy tensor is logged during training, and the converter seems to fail because that tensor is not initialized.
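One plausible reason the accuracy variables are uninitialized at conversion time is that they never make it into the checkpoint. The sketch below assumes TensorFlow 1.13+ or 2.x (via tf.compat.v1) and assumes the Estimator writes checkpoints with a default tf.train.Saver; the variable w is a stand-in for the model weights. It builds a tiny graph with tf.metrics.accuracy, saves it, and lists what the checkpoint actually contains.

```python
import os
import tempfile

import tensorflow as tf

with tempfile.TemporaryDirectory() as tmpdir:
    graph = tf.Graph()
    with graph.as_default():
        # A global variable standing in for the model weights (illustrative).
        w = tf.compat.v1.get_variable(
            "w", shape=[2], initializer=tf.compat.v1.zeros_initializer())
        labels = tf.compat.v1.placeholder(tf.int64, shape=[None])
        predictions = tf.compat.v1.placeholder(tf.int64, shape=[None])
        # Creates the local (metric) variables accuracy/total and accuracy/count.
        accuracy, update_op = tf.compat.v1.metrics.accuracy(labels, predictions)
        # With no var_list, a Saver only saves global variables.
        saver = tf.compat.v1.train.Saver()
        init = tf.compat.v1.group(
            tf.compat.v1.global_variables_initializer(),
            tf.compat.v1.local_variables_initializer())
        with tf.compat.v1.Session(graph=graph) as sess:
            sess.run(init)
            ckpt_path = saver.save(sess, os.path.join(tmpdir, "model.ckpt"))
    # Names of the variables stored in the checkpoint file.
    saved = [name for name, _ in tf.train.list_variables(ckpt_path)]

print(saved)
print("accuracy/count" in saved)
```

If the checkpoint only holds global variables like this, then when freeze_graph restores it into the full graph, accuracy/count has no saved value and stays uninitialized, which matches the FailedPreconditionError above.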

fumihwh commented 5 years ago

@JoeyCarson Try adding the name of the tf.metrics.accuracy tensor to https://github.com/onnx/onnx-tensorflow/blob/master/onnx_tf/converter.py#L216
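For context, the traceback shows converter.py line 216 passing initializer_nodes="" to freeze_graph. As I understand TF 1.x freeze_graph, that string is split on commas and each name is handed to sess.run as a fetch before freezing. The sketch below approximates that behavior in plain Python; initializer_fetches and missing_fetches are hypothetical helpers, and the op names are an assumed subset of what tf.metrics.accuracy creates, not output from a real graph.

```python
def initializer_fetches(initializer_nodes):
    """Approximate how freeze_graph turns initializer_nodes into sess.run
    fetches (simplified sketch, not the library code)."""
    if not initializer_nodes:
        return []  # "" means nothing is run before freezing
    return initializer_nodes.replace(" ", "").split(",")


def missing_fetches(fetches, op_names):
    """Hypothetical check: return fetches that match no op in the graph.
    A fetch such as "name:0" names an op's output, so compare the op part."""
    return [f for f in fetches if f.split(":")[0] not in op_names]


# Assumed op names for a metric created under the "accuracy" scope.
op_names = {"accuracy/total", "accuracy/count", "accuracy/value", "accuracy/update_op"}

print(initializer_fetches(""))  # []
print(missing_fetches(initializer_fetches("accuracy"), op_names))  # ['accuracy']
```

Under these assumptions, any name added to initializer_nodes must match an existing op or tensor exactly; a bare scope prefix would be rejected by sess.run.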

JoeyCarson commented 5 years ago

Hi, I added the name of the accuracy tensor to the line you mentioned, but got the following error:

ValueError: Fetch argument 'accuracy' cannot be interpreted as a Tensor. ("The name 'accuracy' refers to an Operation not in the graph.")

fumihwh commented 5 years ago

@JoeyCarson Do you mean the name of your tf.metrics.accuracy is accuracy? If I guess right, you passed name="accuracy" to tf.metrics.accuracy, right?

tf.metrics.accuracy(
    labels,
    predictions,
    weights=None,
    metrics_collections=None,
    updates_collections=None,
    name=None
)
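To check which names tf.metrics.accuracy actually creates, here is a minimal sketch assuming TensorFlow 1.13+ or 2.x (via tf.compat.v1). The placeholder names labels and predictions are illustrative. It calls the metric with name=None, as in the signature above, and lists the resulting ops and local variables.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    labels = tf.compat.v1.placeholder(tf.int64, shape=[None], name="labels")
    predictions = tf.compat.v1.placeholder(tf.int64, shape=[None], name="predictions")
    # name=None: the metric's variable scope falls back to "accuracy".
    acc_value, acc_update = tf.compat.v1.metrics.accuracy(labels, predictions)
    op_names = [op.name for op in graph.get_operations()]
    # Metric state lives in local variables under the "accuracy/" scope.
    metric_vars = [v.op.name for v in tf.compat.v1.local_variables()]

print("accuracy" in op_names)  # is there an op named exactly "accuracy"?
print(metric_vars)
print(acc_value.name, acc_update.name)
```

If your graph behaves the same way, accuracy is only a scope prefix rather than an op, which would explain the ValueError above; names such as accuracy/count or accuracy/total refer to the actual variables.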