onnx / onnx-tensorflow

Tensorflow Backend for ONNX

Can not use converted ONNX -> TF graph independently [py_func issue] #167

Open nmakhotkin opened 6 years ago

nmakhotkin commented 6 years ago

I am trying to export an ONNX model to TensorFlow and then use it for inference (possibly in another environment). Here is an example of exporting an MNIST model:

import numpy as np
import onnx
from onnx_tf.backend import prepare
import tensorflow as tf

print('loading onnx model')
onnx_model = onnx.load('train/model.onnx')

print('prepare tf model')
tf_rep = prepare(onnx_model)
print(tf_rep.predict_net)
print('-----')
print(tf_rep.predict_net.tensor_dict)

test = np.random.rand(1, 1, 28, 28)

out = tf_rep.run(test)._0
print(out)

with tf.Session() as persisted_sess:
    print("load graph")
    persisted_sess.graph.as_default()
    tf.import_graph_def(tf_rep.predict_net.graph.as_graph_def(), name='')
    # for op in persisted_sess.graph.get_operations():
    #    print(op)
    inp = persisted_sess.graph.get_tensor_by_name(
        tf_rep.predict_net.tensor_dict[tf_rep.predict_net.external_input[0]].name
    )
    out = persisted_sess.graph.get_tensor_by_name(
        tf_rep.predict_net.tensor_dict[tf_rep.predict_net.external_output[0]].name
    )
    res = persisted_sess.run(out, {inp: test})
    print(res)

tf_rep.export_graph('train/tf.pb')

The script above executes successfully and the prediction also runs successfully (res == out here). Now I import the saved model in TF:

import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile

name = "train/tf.pb"

with tf.Session() as persisted_sess:
    print("load graph")
    with gfile.FastGFile(name, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    persisted_sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')

    test = np.random.rand(1, 1, 28, 28).astype(np.float32)

    inp = persisted_sess.graph.get_tensor_by_name('0:0')
    out = persisted_sess.graph.get_tensor_by_name('LogSoftmax:0')
    feed_dict = {inp: test}

    classification = persisted_sess.run(out, feed_dict)

And now I get an error related to nonexistent PyFuncs:

2018-05-14 15:28:36.780899: W tensorflow/core/framework/op_kernel.cc:1198] tensorflow.python.framework.errors_impl.UnknownError: exceptions.KeyError: 'pyfunc_0'
         [[Node: PyFunc = PyFunc[Tin=[DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](transpose_2, PyFunc/input_1, PyFunc/input_1, PyFunc/input_3, PyFunc/input_4, PyFunc/input_5, PyFunc/input_6)]]

Full log: https://pastebin.com/0bQeMPTG. But at export time the model worked fine (see above). I did some investigation into which exact functions are used in the TF graph:

(Pdb) from tensorflow.python.ops import script_ops
(Pdb) script_ops._py_funcs._funcs
{'pyfunc_0': <function py_pool at 0x7f395db05500>, 'pyfunc_1': <function py_pool at 0x7f395dab99b0>}
(Pdb) funcs = script_ops._py_funcs._funcs.values()
(Pdb) func = funcs[0]
(Pdb) func.func_name
'py_pool'
(Pdb) func.func_code
<code object py_pool at 0x7f395fb9deb0, file "/usr/local/lib/python2.7/dist-packages/onnx_tf/backends/backend_v1.py", line 94>
(Pdb)

So I have a question: is it intended that the TF graph uses an external function from the onnx_tf package? Or is this simply a bug? Is there any way to make this model independent of the onnx and onnx-tf packages?

fumihwh commented 6 years ago

I guess there are some pooling ops in your ONNX pb. You could take a look at them. If they satisfy one of the following conditions, we use py_func to do _compatibility_pool, because in TensorFlow there is no corresponding pool op.

We didn't consider the situation where a user would want to do what you did: onnx -> tensorflow -> tensorflow.
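You can check whether an exported pb depends on py_func by scanning it for PyFunc nodes; a minimal sketch (the path is taken from your example above):

# Any PyFunc node means the pb is not self-contained: it references a
# Python callback that only exists in the process that exported it.
import tensorflow as tf

graph_def = tf.GraphDef()
with open('train/tf.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

print([n.name for n in graph_def.node if n.op == 'PyFunc'])  # [] means serializable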

nmakhotkin commented 6 years ago

Basically, this model is imported from PyTorch; the full net class is below:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

So I do pytorch -> ONNX -> tensorflow and then try to do inference in TensorFlow (the end goal is to run TensorFlow Serving). Btw, the converted ONNX model works fine; moreover, the converted pytorch -> onnx -> caffe2 model works fine. The problem only occurs with TensorFlow.

nmakhotkin commented 6 years ago

So I suppose these F.max_pool2d operations are the ones converted to py_func.

tjingrant commented 6 years ago

@fumihwh we did consider the onnx -> tensorflow -> tf serving path; that is why we have export_graph in our API.

@nmakhotkin unfortunately, as @fumihwh pointed out, max_pool is a very complicated issue and we strive to strike a balance between logical clarity/conciseness, numerical precision, the need to pass all ONNX backend tests, and performance. The fix in your case might be simple since you are not padding your feature maps (thus "VALID" padding in TF terms), but please do allow us some time to come up with a more systematic fix.

@fumihwh this essentially boils down to the issue I raised with you on this PR (https://github.com/onnx/onnx-tensorflow/pull/83). Specifically, and I quote:

And we should avoid using python function as much as possible because that would prevent us from serializing the graph (thus we can't pass the generated graph to tf_serving).

I think we should revert part of that PR to use native max pooling as much as possible. Your solution was easier to reason about and more concise, but my original implementation was there for a very practical reason.

tjingrant commented 6 years ago

@nmakhotkin can you provide me with the onnx model generated by torch?

nmakhotkin commented 6 years ago

@tjingrant yes, here it is (uploaded to GDrive): onnx model (generated by pytorch) - https://drive.google.com/file/d/13yJYYgQiiqxP8Khm-PZ5Q6JwxLi2w_4A/view?usp=sharing

original pytorch model - https://drive.google.com/file/d/11BJOI5ucsSmM-9aZBYVIBcDvf9ILihnU/view?usp=sharing

tjingrant commented 6 years ago

@nmakhotkin would you like to try again with this PR https://github.com/onnx/onnx-tensorflow/pull/171/files ?

You can check out a different branch as well (https://github.com/onnx/onnx-tensorflow/tree/fix-pool).

nmakhotkin commented 6 years ago

@tjingrant thanks! I'll try today (it is morning for me now) and will write the results here.

nmakhotkin commented 6 years ago

@tjingrant The fix works! I just tested onnx-tf on the fix-pool branch: I converted my ONNX model to TensorFlow again and model inference works! Now it is able to successfully recognize some examples from MNIST:

$ python tf_inference.py 
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-05-15 11:49:27.417070: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

Prediction of file 5.png: 6
Prediction of file 2.png: 2
Prediction of file 9.png: 9
Prediction of file 1.png: 1
Prediction of file 4.png: 4
Prediction of file 0.png: 0
Prediction of file 7.png: 7

Now there are no PyFunc ops in the graph. Full set of ops is below:

(Pdb) set([op.type for op in persisted_sess.graph.get_operations()])
set([u'MatMul', u'NoOp', u'LogSoftmax', u'Const', u'Sub', u'ExpandDims', u'Reshape', u'MaxPool', u'Transpose', u'Rank', u'Relu', u'Add', u'Identity', u'Pad', u'Split', u'Range', u'Mul', u'Pack', u'Placeholder', u'Conv2D', u'StridedSlice'])

P.S. Now waiting for the PR to be merged :)

kartk commented 6 years ago

Still getting the PyFunc error even when using the fix-pool branch.

The model I converted from is PyTorch's ResNet.

I've been doing the same thing @nmakhotkin is trying to do: PyTorch -> ONNX -> TensorFlow representation, and then to a pb file for running inference.

I was able to convert the MNIST example code from PyTorch to a pb file, but could not do the same for the ResNet model.

fumihwh commented 6 years ago

@kartk Could you upload your onnx pb?

kartk commented 6 years ago

The model I use is a slightly modified ResNet called Hopenet.

Here is the IR representation: https://drive.google.com/file/d/1VRCHFq7lAIhQFEZYr2o0Ij-1xKjbgIf6/view?usp=sharing

Here is the converted pb: https://drive.google.com/file/d/1PK45MwNDXPg-tTMe-M0errojUnVSMAXb/view?usp=sharing

fumihwh commented 6 years ago

@kartk You should get a warning message that says:

UserWarning: Using the pooling op in compatibility mode.This means your graph cannot be serialized.
Please configure your pooling operation to only use paddings that correspond to Tensorflow SAME or VALID padding.

One layer in your network cannot use a native TensorFlow op, so we have to use the compatibility pool. I checked, and it seems to be the following layer:

input [1, 64, 112, 112]
pads [1, 1, 1, 1]
output [56, 56]
kernel [3, 3]
strides [2, 2]

If you want to use pool with "SAME" in tensorflow, the pads should be [0, 1, 0, 1].
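For reference, this is the arithmetic TF uses to derive SAME padding along one spatial dimension; a small sketch you can use to check a layer yourself:

# TF SAME padding for one spatial dimension: output is ceil(in/stride),
# and the total pad is split with the extra cell going on the right.
def same_padding_1d(in_size, kernel, stride):
    out_size = -(-in_size // stride)  # ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before
    return pad_before, pad_after

# The layer above: 112 -> 56 with kernel 3, stride 2.
print(same_padding_1d(112, 3, 2))  # (0, 1), i.e. pads [0, 1, 0, 1]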

kartk commented 6 years ago

Thanks @fumihwh.

I'm very new to PyTorch and NNs as a whole; where do I need to change the pads so that they'll be compatible with TensorFlow?

nmakhotkin commented 6 years ago

Just tried to convert a pretrained ResNet (resnet101) model to ONNX, then to TensorFlow. As @kartk said, there is still a py_func present in the graph.

Is there a way to get rid of it completely somehow?

tjingrant commented 6 years ago

@nmakhotkin to put it shortly, PyTorch's ResNet implementation is incorrect, or more precisely, not faithful to the original paper. This might be hard to believe, but let me point you to another discussion thread where we discussed this topic extensively (https://github.com/tensorflow/benchmarks/issues/134).

And let me quote the relevant part: for the first max-pooling layer in ResNet, here's what padding is added in various frameworks:

Pytorch: Left 1, right 1. In this case this is equivalent to Left 1, right 0.
Caffe: Left 0, right 1.
TensorFlow SAME: Left 0, right 1.

This stems from the fact that PyTorch only supports symmetric pads. It is not a problem caused by onnx-tensorflow or TensorFlow per se, but rather an unfortunate consequence of a limitation of PyTorch.
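A quick numeric sketch of why the symmetric pad collapses to "left 1, right 0" here (using the layer dimensions quoted earlier in this thread: input 112, kernel 3, stride 2):

# With a symmetric pad of 1 the padded width is 114, but the last
# pooling window stops at index 112 -- the final padded column
# (index 113) is never read, so the right pad is effectively 0.
in_size, kernel, stride, pad = 112, 3, 2, 1
padded = in_size + 2 * pad                       # 114
n_windows = (padded - kernel) // stride + 1      # 56
last_read = (n_windows - 1) * stride + kernel - 1
print(n_windows, last_read, padded - 1)          # 56 112 113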

tjingrant commented 6 years ago

@nmakhotkin as a result, there is no semantics-preserving AND serializable workaround. But we can try to give you an option to slightly alter the semantics of max pool so that you can serialize the incorrect version of ResNet exported from PyTorch; just expect some accuracy degradation of your model as a result.

nmakhotkin commented 6 years ago

Thanks for the answer! Yes, it would be nice to have an additional option flag that controls this behavior (export precisely or not).

inakinavarro commented 6 years ago

@tjingrant are you planning to implement this workaround for the serialization of the PyTorch ResNet? It would be great! Thanks

tjingrant commented 6 years ago

Hi, absolutely, but we might have other priorities in the meantime, like supporting onnx v1.2; sorry for the delay, my estimate is that it'll be there before the end of next Wed.

inakinavarro commented 6 years ago

That sounds great!! Thanks for your effort!!!

tjingrant commented 6 years ago

@inakinavarro @nmakhotkin hi, a tentative PR to address this issue has been created https://github.com/onnx/onnx-tensorflow/pull/212.

@inakinavarro I've modified your original script to use non-strict mode:

tf_backend.prepare(model, strict=False)

It seems to work now. Let me know if anything breaks and I'll follow up.
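For anyone else hitting this, the end-to-end conversion then looks roughly like this (a sketch; paths are placeholders):

import onnx
from onnx_tf import backend as tf_backend

model = onnx.load('model.onnx')                   # placeholder path
tf_rep = tf_backend.prepare(model, strict=False)  # allow approximate pooling
tf_rep.export_graph('model.pb')                   # serializable, no PyFunc nodes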

inakinavarro commented 6 years ago

@tjingrant Great!! Thanks a lot. I will test it ASAP and let you know.

asarah-github commented 6 years ago

After installing onnx 1.2.2 and converting the ResNet-50 model from https://github.com/onnx/models/tree/master/resnet50 to a TF pb file using tf_backend.prepare(model, strict=False), I tried to run the converted model and got a KeyError: 'pyfunc_0' error for the pool1_1 layer.

My understanding was that specifying strict=False may cause the network output to change (since the semantics may change), but that the network could still be run (per PR #212). Has this change not been merged into v1.2.2?

tjingrant commented 6 years ago

@asarah-github the PR has not made its way into any of our existing releases yet. It won't be there if you install a release version of onnx-tensorflow (I'm not sure if you have, or whether you were confusing onnx with onnx-tf). But anyhow, can you do a master build of onnx-tensorflow and try again?
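(A master build is just an install from source; something like the following should work:)

git clone https://github.com/onnx/onnx-tensorflow.git
cd onnx-tensorflow
pip install -e .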

asarah-github commented 6 years ago

@tjingrant Sorry for the confusion on the version. Anyway, I built from master and ran again. Now the conversion fails with the following error.

...
  File "./lib/python3.5/site-packages/onnx_tf/backend.py", line 76, in prepare
    return cls.onnx_model_to_tensorflow_rep(model, strict)
  File "./lib/python3.5/site-packages/onnx_tf/backend.py", line 87, in onnx_model_to_tensorflow_rep
    return cls._onnx_graph_to_tensorflow_rep(model.graph, model.opset_import, strict)
  File "./lib/python3.5/site-packages/onnx_tf/backend.py", line 141, in _onnx_graph_to_tensorflow_rep
    onnx_node, tensor_dict, handlers, opset=opset, strict=strict)
  File "./lib/python3.5/site-packages/onnx_tf/backend.py", line 236, in _onnx_node_to_tensorflow_op
    return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
  File "./lib/python3.5/site-packages/onnx_tf/handlers/handler.py", line 59, in handle
    return ver_handle(node, **kwargs)
  File "./lib/python3.5/site-packages/onnx_tf/handlers/backend/average_pool.py", line 17, in version_1
    kwargs.get("strict", True))
  File "./lib/python3.5/site-packages/onnx_tf/handlers/backend/pool_mixin.py", line 68, in pool
    x = PadMixin.get_padding_as_op(x, pads)
  File "./lib/python3.5/site-packages/onnx_tf/handlers/backend/pad_mixin.py", line 9, in get_padding_as_op
    num_dim = int(len(pads) / 2)
TypeError: object of type 'NoneType' has no len()

Any ideas?

achalshah20 commented 6 years ago

I am also getting a similar error when I go from torch to onnx to tensorflow.

ValueError: callback pyfunc_0 is not found

 [[Node: prefix/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](prefix/Relu, prefix/PyFunc/input_1, prefix/PyFunc/input_2, prefix/PyFunc/input_3, prefix/PyFunc/input_4, prefix/PyFunc/input_2, prefix/PyFunc/input_6, prefix/PyFunc/input_7)]]

Console error:

 UserWarning: Using the pooling op in compatibility mode.This means your graph cannot be serialized.Please configure your pooling operation to only use paddings that correspond to Tensorflow SAME or VALID padding.
  "correspond to Tensorflow SAME or VALID padding.", UserWarning)

PB model: https://drive.google.com/open?id=1gp1VF1lafDpxiqIUgVAgWvOeqVxoTAlh

@tjingrant @fumihwh Any ideas?

fumihwh commented 6 years ago

@asarah-github I tested the master version of resnet50 from https://github.com/onnx/models/tree/master/resnet50 and it works...

fumihwh commented 6 years ago

@achalshah20 As the warning says: Please configure your pooling operation to only use paddings that correspond to Tensorflow SAME or VALID padding. For example, in PyTorch, if you have input [1, 3, 5, 5], kernel [3, 3], and pads [1, 1, 1, 1], it corresponds to "SAME" in TF. But if you set pads [2, 2, 2, 2], it doesn't work with the default TF func; we have to use compatibility mode and calculate the pool result manually. This is exactly what the PyFunc is for. And the PyFunc is irreversible, which means you cannot convert this pb back to ONNX.
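You can convince yourself of the SAME correspondence with a small TF 1.x check: padding explicitly with -inf and then pooling with VALID reproduces SAME exactly (a sketch, assuming stride 1 for the example above):

import numpy as np
import tensorflow as tf

x = tf.constant(np.random.rand(1, 5, 5, 1).astype(np.float32))  # NHWC
same = tf.nn.max_pool(x, ksize=[1, 3, 3, 1], strides=[1, 1, 1, 1], padding='SAME')
# pads [1, 1, 1, 1] applied explicitly, then VALID pooling
padded = tf.pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]], constant_values=float('-inf'))
valid = tf.nn.max_pool(padded, ksize=[1, 3, 3, 1], strides=[1, 1, 1, 1], padding='VALID')

with tf.Session() as sess:
    a, b = sess.run([same, valid])
    print(np.allclose(a, b))  # True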

asarah-github commented 6 years ago

@fumihwh When you tested the master version of ResNet-50 did you do a master build of onnx-tf?

fumihwh commented 6 years ago

@asarah-github Yes.

nestarz commented 6 years ago

It is now working with tf_backend.prepare(model, strict=False), thanks! Any downside to setting this flag to False?

parth1595 commented 6 years ago

@nmakhotkin Is there any generic script to convert any ONNX model to a TensorFlow model (.pb)? I want to convert the SqueezeNet ONNX model to a TensorFlow model. I am new to this, so does anyone have an idea how I can proceed?

parth1595 commented 6 years ago

@achalshah20 @fumihwh @tjingrant @nmakhotkin

I have used the script that @nmakhotkin provided for converting an ONNX model (.onnx) to a TensorFlow model (.pb), but I am getting an error while running it, maybe because of ONNX version issues that I found mentioned in the community. I am still unable to convert to a TensorFlow model. I have attached the ONNX model file here; any help will be appreciated. temp.onnx.zip

nmakhotkin commented 6 years ago

Hi @parth1595

Here is a full script: http://paste.openstack.org/show/730243/. The --input argument is the input ONNX file; the --output argument is the output directory that will contain the TensorFlow model. I just tested it with your ONNX model and everything works fine. The script exports a tensorflow-serving-ready model; I slightly changed it according to the latest changes and the TensorflowRep spec, which I found using the debugger.

parth1595 commented 6 years ago

Hi @nmakhotkin

Thanks for your help. I have tested the script you mentioned and now I am successfully able to generate the saved_model.pb file (TensorFlow model). But I have one question: does this one file contain all the metadata and weights? I am asking because I used this file to directly dump the graph using the import_pb_to_tensorboard.py utility that is available in TensorFlow, but got the error google.protobuf.message.DecodeError: Error parsing message.

So I think I may have to freeze this file, but for that a meta file is required. Can you help me clear up this confusion? I want to dump this model file with the TensorFlow utility and view the graph in TensorBoard.

One more thing I want to add: only one file is generated by the script, and the variables folder generated in the output is empty, so something seems wrong. Also, while running the script we can see the output tensor, its shape, and its activation values printed. So why don't we get the .data and .index files in the output alongside saved_model.pb?

nmakhotkin commented 6 years ago

@parth1595 saved_model.pb contains the graph itself and the serving signatures: inputs and outputs. For exporting only the graph itself (and importing it into TensorBoard) you can try exporting the graph like this:

Export the meta graph:

tf.train.export_meta_graph('tf_graph.pb')

Export the TF summary graph (so that it can be read by TensorBoard):

train_writer = tf.summary.FileWriter('logdir')
train_writer.add_graph(persisted_sess.graph)

Then start tensorboard, point it to logdir and you will see the graph. Hope that helps!

parth1595 commented 6 years ago

@nmakhotkin Yeah, I already tried that earlier. My point was that no .data and .index files are generated in the variables directory. So does that mean my ONNX model has no weights and biases? Because if I want to do inference with this TensorFlow model, I will need the metadata and the values of the weights and biases of each layer. But here I am getting only graph data (saved_model.pb), so inference will not be possible without the values.

Can you please check the model you have converted: do you get the variables as well as this .pb?

Thanks in advance for your help @nmakhotkin!

nmakhotkin commented 6 years ago

@parth1595 No, a frozen graph (.pb) already contains all the weights. A saved_model contains them too (but in a slightly different location, the variables directory). I think in our case all the coeffs/weights are included in the saved_model.pb file. I can check whether that saved_model.pb is suitable for serving and whether serving itself works correctly (i.e. inference). Can you provide some example input and output so I can test it? Otherwise I don't know what to expect :)
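If you want to verify that the weights really are embedded, you can inspect the proto directly; a minimal sketch (the path is an assumption):

# Count the bytes stored in Const nodes inside saved_model.pb; if the
# weights are embedded, this should be roughly the model size.
from tensorflow.core.protobuf import saved_model_pb2

sm = saved_model_pb2.SavedModel()
with open('saved_model.pb', 'rb') as f:  # assumed path
    sm.ParseFromString(f.read())

graph_def = sm.meta_graphs[0].graph_def
const_bytes = sum(n.attr['value'].tensor.ByteSize()
                  for n in graph_def.node if n.op == 'Const')
print('bytes in Const nodes: %d' % const_bytes)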

parth1595 commented 6 years ago

@nmakhotkin You can take any input and output you think is suitable, because I used the ONNX SqueezeNet model here to convert to a TensorFlow model. And while running the script you provided, it prints the input and output tensors; you can take those as well, I think.

Also, if you have some small sample ONNX model, can you give me a reference to it? That would also be fine for me. My goal is to properly convert any ONNX model to a TensorFlow model, with graph and values.

nmakhotkin commented 6 years ago

@parth1595 I managed to run inference (while running serving) only when using strict=False:

tf_rep = prepare(onnx_model, strict=False)

Otherwise I get an error related to py_func ops in the TensorFlow graph that are not defined. So the model works, but I don't know whether it is correct.

nmakhotkin commented 6 years ago

I just used random input of dimensions (1, 3, 224, 224).

parth1595 commented 6 years ago

@nmakhotkin OK, great. So do you have any sample ONNX model? This one is too big, and it is hard to verify whether it is correct or not.

Also, if it is possible for you, can we have a Skype chat? That way we can resolve the problems and confusion faster. Hope you won't mind that.

nmakhotkin commented 6 years ago

You can use the MNIST model I mentioned earlier here: https://github.com/onnx/onnx-tensorflow/issues/167#issuecomment-388946656

And you can use these samples: mnist-images.zip

parth1595 commented 6 years ago

@nmakhotkin Can you provide the script you used to perform inference?

nmakhotkin commented 6 years ago

I have already compiled the tensorflow_model_server binary (see https://www.tensorflow.org/serving/setup#optimized-build).

Then run serving with something like:

tensorflow_model_server --model_base_path=model --model_name=model --port=9000

Note: the directory model contains the dir 1, which in turn contains saved_model.pb and variables.
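That is, the layout looks like:

model/
  1/
    saved_model.pb
    variables/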

Then run the model inference like:

import grpc

import numpy as np

from tensorflow.python.framework import tensor_shape
from tensorflow.python.framework import dtypes
from tensorflow.core.framework import tensor_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc as predict_pb2_grpc
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import model_pb2

port = 9000
host = 'localhost'

if __name__ == '__main__':
    server = '%s:%s' % (host, port)
    channel = grpc.insecure_channel(server)

    stub = predict_pb2_grpc.PredictionServiceStub(channel)

    test = np.random.rand(1, 3, 224, 224).astype(np.float32)
    shape = test.shape
    tensor_proto = tensor_pb2.TensorProto(
        tensor_shape=tensor_shape.as_shape(shape).as_proto(),
        dtype=dtypes.as_dtype(test.dtype).as_datatype_enum,
    )
    tensor_proto.tensor_content = test.tostring()
    inputs = {'data_0': tensor_proto}

    response = stub.Predict(predict_pb2.PredictRequest(
        inputs=inputs,
        model_spec=model_pb2.ModelSpec(name='model')
    ))

    res = np.fromstring(
        response.outputs.get('Reshape_1').tensor_content,
        dtype=dtypes.as_dtype(response.outputs.get('Reshape_1').dtype).as_numpy_dtype
    )
    print('Received from serving:\n%s' % response)

cinastanbean commented 6 years ago

Hey, I found a way to solve this problem. ResNet18 works correctly: https://github.com/cinastanbean/pytorch-onnx-tensorflow-pb

ningningG commented 6 years ago

I am converting a ResNet18 model from MXNet to TensorFlow, but there are some problems with the generated TensorFlow pb file. When I use the .pb file to predict, this error is raised:

Traceback (most recent call last):

  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\ops\script_ops.py", line 195, in __call__
    raise ValueError("callback %s is not found" % token)

ValueError: callback pyfunc_0 is not found

Traceback (most recent call last):
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\client\session.py", line 1278, in _do_call
    return fn(*args)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\client\session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: callback pyfunc_0 is not found
Traceback (most recent call last):

  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\ops\script_ops.py", line 195, in __call__
    raise ValueError("callback %s is not found" % token)

ValueError: callback pyfunc_0 is not found

     [[Node: PyFunc = PyFunc[Tin=[DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu/_3, PyFunc/input_1, PyFunc/input_2, PyFunc/input_3, PyFunc/input_4, PyFunc/input_2, PyFunc/input_6, PyFunc/input_7)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:/mx2tf/onnx2tf.py", line 74, in <module>
    res = persisted_sess.run(out, feed_dict)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\client\session.py", line 877, in run
    run_metadata_ptr)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\client\session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\client\session.py", line 1272, in _do_run
    run_metadata)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\client\session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: callback pyfunc_0 is not found
Traceback (most recent call last):

  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\ops\script_ops.py", line 195, in __call__
    raise ValueError("callback %s is not found" % token)

ValueError: callback pyfunc_0 is not found

     [[Node: PyFunc = PyFunc[Tin=[DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu/_3, PyFunc/input_1, PyFunc/input_2, PyFunc/input_3, PyFunc/input_4, PyFunc/input_2, PyFunc/input_6, PyFunc/input_7)]]

Caused by op 'PyFunc', defined at:
  File "F:/mx2tf/onnx2tf.py", line 59, in <module>
    _ = tf.import_graph_def(graph_def, name='')
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\framework\importer.py", line 442, in import_graph_def
    _ProcessNewOps(graph)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\framework\importer.py", line 234, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\framework\ops.py", line 3289, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\framework\ops.py", line 3289, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\framework\ops.py", line 3180, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): ValueError: callback pyfunc_0 is not found
Traceback (most recent call last):

  File "E:\app_tools\envs\mx\lib\site-packages\tensorflow\python\ops\script_ops.py", line 195, in __call__
    raise ValueError("callback %s is not found" % token)

ValueError: callback pyfunc_0 is not found

     [[Node: PyFunc = PyFunc[Tin=[DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu/_3, PyFunc/input_1, PyFunc/input_2, PyFunc/input_3, PyFunc/input_4, PyFunc/input_2, PyFunc/input_6, PyFunc/input_7)]]

Here is the code (onnx -> tensorflow):

import numpy as np
import onnx
from onnx_tf.backend import prepare
import tensorflow as tf

model = onnx.load('G:/Dataset/Head_MRI/@model/mx2onnx/sag_cor_all_slice_exported.onnx')
tf_rep = prepare(model, strict=False)
tf_rep.export_graph('G:/Dataset/Head_MRI/@model/mx2onnx/sag_cor_all_slice_tf.pb')

with tf.Session() as persisted_sess:
    print("load graph")

    persisted_sess.graph.as_default()
    tf.import_graph_def(tf_rep.predict_net.graph.as_graph_def(), name='')

    print('input_node_name = ', tf_rep.predict_net.tensor_dict[tf_rep.predict_net.external_input[0]].name)
    print('output_node_name = ', tf_rep.predict_net.tensor_dict[tf_rep.predict_net.external_output[0]].name)

    inp = persisted_sess.graph.get_tensor_by_name(
        tf_rep.predict_net.tensor_dict[tf_rep.predict_net.external_input[0]].name
    )
    out = persisted_sess.graph.get_tensor_by_name(
        tf_rep.predict_net.tensor_dict[tf_rep.predict_net.external_output[0]].name
    )

    raw_data = np.fromfile('G:/Dataset/MRI_Head_20180502/bin_split_256/SAG_256/0417_256_07.raw', dtype=np.float32)
    raw_data.shape = 256, 256
    test = np.random.rand(1, 3, 256, 256).astype(np.float32)
    for i in range(3):
        test[0, i, :, :] = raw_data

    feed_dict = {inp: test}
    res = persisted_sess.run(out, feed_dict)
    print('The predict result is  ', res)
    print("The digit is classified as ",np.argmax(res))

The code above will print the following information:

load graph
input_node_name =  data:0
output_node_name =  Softmax:0
The predict result is  [[0.95922846 0.04077156]]
The digit is classified as  0

But when I load the .pb file to predict using the following code, the error is raised:

import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile

graph_path = 'G:/Dataset/Head_MRI/@model/mx2onnx/sag_cor_all_slice_tf.pb'
with tf.Session() as persisted_sess:
        print("load graph")
        with gfile.FastGFile(graph_path, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            persisted_sess.graph.as_default()
            _ = tf.import_graph_def(graph_def, name='')

        persisted_sess.run(tf.global_variables_initializer())

        inp = persisted_sess.graph.get_tensor_by_name('data:0')
        out = persisted_sess.graph.get_tensor_by_name('Softmax:0')

        raw_data = np.fromfile('G:/Dataset/MRI_Head_20180502/bin_split_256/SAG_256/0417_256_05.raw', dtype=np.float32)
        raw_data.shape = 256, 256
        test = np.random.rand(1, 3, 256, 256).astype(np.float32)
        for i in range(3):
            test[0, i, :, :] = raw_data
        feed_dict = {inp: test}

        res = persisted_sess.run(out, feed_dict)
        print(res)
        print("The digit is classified as ", np.argmax(res))

Any ideas?

fumihwh commented 6 years ago

@ningningG Did you try the strict=False flag? You can get more info by reading the comments above.

ehsab commented 6 years ago

(quotes @ningningG's comment above in full)

Did you manage to solve the problem? I have the same issue. When I use strict=False it works, but it doesn't predict correctly: for every single input the output is the same. When I don't use strict=False, I get the same error as you did.