onnx / tensorflow-onnx

Convert TensorFlow, Keras, TensorFlow.js, and TFLite models to ONNX
Apache License 2.0
2.3k stars, 432 forks

[ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:model/multi_category_encoding/AsString : No Op registered for AsString with domain_version of 9 #1645

Closed: hanzigs closed this issue 3 years ago

hanzigs commented 3 years ago

The code below works perfectly when run as a standalone Python script (python==3.9.5, tensorflow==2.5.0, keras2onnx==1.7.0, onnxruntime==1.8.0, keras==2.4.3, tf2onnx==1.9.1):

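```python
import onnxruntime
import tf2onnx
from autokeras import StructuredDataClassifier

# MaxTrials, Epochs, X_train, y_train, X_valid and y_valid are defined earlier in the script
autoKeras_model = StructuredDataClassifier(max_trials=MaxTrials)
autoKeras_model.fit(x=X_train, y=y_train, validation_data=(X_valid, y_valid), epochs=Epochs, verbose=1)
ExportedautoKeras_model = autoKeras_model.export_model()

# Convert to ONNX in memory and load it with onnxruntime
onnx_model, _ = tf2onnx.convert.from_keras(ExportedautoKeras_model)
content = onnx_model.SerializeToString()
sess = onnxruntime.InferenceSession(content)
```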

The same code inside a Flask app throws an error when creating the InferenceSession:

```python
sess = onnxruntime.InferenceSession(content)
```

```
  File "C:\Users\plg\Anaconda3\envs\automl04augpy395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\plg\Anaconda3\envs\automl04augpy395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 312, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:model/multi_category_encoding/AsString : No Op registered for AsString with domain_version of 9
```

What I mainly need from the session is input_name.

If that's a converter bug, how should I find the correct opset? (I have tried opsets 9 through 13, and all of them throw the error.) And why isn't the error raised in the standalone run?

Any help would be appreciated. Thanks!
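One way to answer the opset question systematically is to convert at each candidate opset and try to load the result. This is a sketch: `find_working_opset` is a hypothetical helper, not part of tf2onnx, and the converter and loader are passed in as callables so the loop can be exercised without TensorFlow installed.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_working_opset(
    convert: Callable[[int], bytes],
    load: Callable[[bytes], object],
    opsets: Iterable[int] = range(9, 14),
) -> Tuple[Optional[int], dict]:
    """Return the first opset whose converted model loads, plus per-opset errors.

    `convert(opset)` should return serialized ONNX bytes, and `load(content)`
    should raise if the model is invalid. Both are supplied by the caller.
    """
    errors = {}
    for opset in opsets:
        try:
            content = convert(opset)
            load(content)
        except Exception as exc:  # record the failure and try the next opset
            errors[opset] = f"{type(exc).__name__}: {exc}"
            continue
        return opset, errors
    return None, errors
```

With the real libraries, `convert` could be `lambda o: tf2onnx.convert.from_keras(model, opset=o)[0].SerializeToString()` and `load` could be `onnxruntime.InferenceSession`. In this thread no opset helps, because the failing nodes turn out to be TensorFlow ops the converter left unconverted rather than ONNX ops missing from older opsets.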

guschmue commented 3 years ago

Looks like we don't support the AsString() op. Let me check whether we can handle it in the converter.

hanzigs commented 3 years ago

Is there a workaround, such as a custom op, that I can use until the converter is updated? Thanks.

guschmue commented 3 years ago

I have some code that maps AsString to ONNX Cast, which mostly works but doesn't honor all of the attributes AsString has. But maybe it's good enough for autokeras. If it works for autokeras, I'll send a PR.
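To see what a plain Cast gives up, here is a rough Python emulation of how AsString formats a float. It assumes, from my reading of the TensorFlow op documentation, that the `precision`, `scientific`, `width`, and `fill` attributes are combined into a printf-style format. `as_string_float` is a hypothetical helper, not TensorFlow code, and the `shortest` attribute is left out.

```python
def as_string_float(x: float, precision: int = -1, scientific: bool = False,
                    width: int = -1, fill: str = "") -> str:
    """Rough emulation of TF AsString for one float (assumed printf-style).

    Only fill="" (pad with spaces) and fill="0" are handled in this sketch.
    """
    if fill not in ("", "0"):
        raise ValueError("this sketch only supports fill='' or fill='0'")
    fmt = "%" + fill              # "0" acts as the zero-padding flag
    if width > -1:
        fmt += str(width)         # minimum field width
    if precision > -1:
        fmt += "." + str(precision)
    fmt += "e" if scientific else "f"
    return fmt % x
```

A conversion that ignores these attributes can only hope to match TensorFlow's strings when they are left at their defaults, and even then the exact string form of a float may differ between runtimes.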

guschmue commented 3 years ago

It worked for me on the structured_classifier example, so we merged a PR: https://github.com/onnx/tensorflow-onnx/pull/1648

You can try it with:

```
pip install git+https://github.com/onnx/tensorflow-onnx
```

hanzigs commented 3 years ago

@guschmue Thank you very much for the quick response!

Now I am getting a different error: No Op registered for LookupTableFindV2 with domain_version of 9. I created a fresh env and installed tf2onnx with pip install git+https://github.com/onnx/tensorflow-onnx (python==3.9.5, tensorflow==2.5.0, tf2onnx==1.10.0, onnxruntime==1.8.0).

```python
sess = onnxruntime.InferenceSession(content)
```

```
  File "C:\Users\plg\Anaconda3\envs\automl07augpy395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\plg\Anaconda3\envs\automl07augpy395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 312, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2 : No Op registered for LookupTableFindV2 with domain_version of 9
```

As before, it works normally in a Python script but throws the error in a Flask app. I found something similar in https://github.com/onnx/tensorflow-onnx/pull/1228

TomWildenhain-Microsoft commented 3 years ago

What are the shapes and dtypes of X_train, y_train, X_valid, y_valid? Can you upload a zipped saved model of the keras model? Sorry I've never used autokeras before.

hanzigs commented 3 years ago

X_train, y_train, X_valid, and y_valid have shapes (1056, 16), (1056,), (191, 16), and (191,) respectively, and all are numpy.ndarray (python==3.9.5, tensorflow==2.5.0, tf2onnx==1.10.0, onnxruntime==1.8.0). Creating the model is simple. I can attach pickles of X_train, y_train, X_valid, and y_valid; where should I put them?

```
pip install autokeras==1.0.15
```

```python
import onnxruntime
import tf2onnx
from autokeras import StructuredDataClassifier

akmodel = StructuredDataClassifier(max_trials=10)
akmodel.fit(x=X_train, y=y_train, validation_data=(X_valid, y_valid), epochs=100)
autoKeras_model = akmodel.export_model()

# Convert the exported Keras model and read the input/output names from the session
onnx_model, _ = tf2onnx.convert.from_keras(autoKeras_model)
content = onnx_model.SerializeToString()
sess = onnxruntime.InferenceSession(content)
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
```

TomWildenhain-Microsoft commented 3 years ago

Can you upload the pickles to OneDrive/GoogleDrive/Dropbox and post a link? Are those all np.int32 or np.float32?

hanzigs commented 3 years ago

yes they are

https://drive.google.com/drive/folders/1HfB00dOuk-awSmIrSg92hmJFYzTpQNCr?usp=sharing

I've attached them in Google Drive. You can open them with:

```python
import pickle

# Replace 'filename' with the name of each downloaded pickle file
with open('filename', 'rb') as f:
    arrayname1 = pickle.load(f)
```

TomWildenhain-Microsoft commented 3 years ago

Great, I just requested access to the drive link.

TomWildenhain-Microsoft commented 3 years ago

I just ran the conversion and it works for me. The resulting model runs in ORT and produces results. However, my model does not contain AsString; maybe I'm using a different version of autokeras. My converted ONNX model looks like this:

[screenshot of the converted ONNX graph]

hanzigs commented 3 years ago

Actually, as I said before, the model is created successfully in the Python script, and the InferenceSession is created successfully there too.

The InferenceSession only throws the error in the Flask app.

TomWildenhain-Microsoft commented 3 years ago

Ah, so sorry. Didn't catch that. What version of onnxruntime does the flask application use?

hanzigs commented 3 years ago

Same kind of issue as in #1228

hanzigs commented 3 years ago

All same versions

TomWildenhain-Microsoft commented 3 years ago

> Same kind of issue as in #1228

Can you please elaborate on this? Are you getting a "Default value of table lookup must be const." error? Are you running the conversion code within flask too, or just onnxruntime? You can save both the keras saved model and the onnx model with:

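```python
import tf2onnx

# Save the Keras model as a SavedModel directory
ExportedautoKeras_model.save("autokerasmodel")
# Convert it and also write the ONNX model to disk
onnx_model, _ = tf2onnx.convert.from_keras(ExportedautoKeras_model, output_path="autokeras.onnx")
```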

I find it very surprising that you get different results in flask. Is your flask running from a different virtualenv? Are you sure your autokeras version is the same?
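A quick way to rule out environment drift is to print the interpreter path and package versions from both the script and the Flask app, then compare the two outputs. This is a sketch; `environment_report` and the package list are my own choices, not part of any library mentioned here.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional

PACKAGES = ("autokeras", "tensorflow", "keras", "tf2onnx", "onnx", "onnxruntime")


def environment_report(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if not installed)."""
    report: Dict[str, Optional[str]] = {
        "python": sys.version.split()[0],
        "executable": sys.executable,  # reveals which virtualenv is actually running
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Inside Flask, the same dictionary could be logged at startup or returned from a temporary debug route (hypothetical) and diffed against the script's output.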

hanzigs commented 3 years ago

Yes, I am using the same model for the ONNX conversion.

Inside Flask, I create the model, convert it, and try to get the session results, all in one go.

TomWildenhain-Microsoft commented 3 years ago

I think it is very likely that the keras models you get in flask and the plain python script are different. Can you please add this line: ExportedautoKeras_model.save("autokerasmodel") and zip the results of the python and flask scripts?

hanzigs commented 3 years ago

Inside the Flask app, I have two functions: one creates the model and passes it to an onnxconverter function. I'm not sure whether that is an issue; I'll now try putting both in the same function.

TomWildenhain-Microsoft commented 3 years ago

That should not be an issue. Again to confirm, are you using the same virtualenv for flask as the python script?

hanzigs commented 3 years ago

Yes, the Python script and Flask run in the same env.

TomWildenhain-Microsoft commented 3 years ago

Is the training data you are using (X_train, y_train, X_valid, y_valid) the same values for both?

hanzigs commented 3 years ago

Also, in the plain Python script, the ONNX conversion and InferenceSession work, but when I run a prediction with the ONNX model it throws an error. This is the code:

```python
import onnxruntime

content = ONNXModel.SerializeToString()
sess = onnxruntime.InferenceSession(content)
input_name = sess.get_inputs()[0].name
label_name = sess.get_outputs()[0].name
# The prediction below is where the error is raised
pred_onnx = sess.run([label_name], {input_name: test_record})[0]
```

TomWildenhain-Microsoft commented 3 years ago

Are you able to capture the keras saved model from flask?

hanzigs commented 3 years ago

Can you please confirm the prediction test.

hanzigs commented 3 years ago

Yes, I can.

TomWildenhain-Microsoft commented 3 years ago

> Can you please confirm the prediction test.

I am able to successfully run predictions using the ONNX model I generated. The model is uploaded to the shared drive folder as autokeras_tw.onnx.

> yes i can

Awesome. Please capture the Keras saved models and the converted ONNX models from both the Flask app and the Python script, and upload them to the Google Drive folder as autokeras_flask.zip, autokeras_flask.onnx, autokeras_python.zip, and autokeras_python.onnx. If I have those, I may be able to reproduce the issue. So far, I can't reproduce it at all.
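For the prediction failure reported earlier, whose error message isn't shown, one common cause worth ruling out is feeding an array whose dtype doesn't match the model input. The sketch below compares the type strings onnxruntime reports (for example `sess.get_inputs()[0].type`, such as `"tensor(float)"`) with the dtypes being fed. `check_feed` and the mapping table are hypothetical helpers and cover only a few common types.

```python
from typing import Dict, List, Tuple

# onnxruntime input type strings -> NumPy dtype names (common cases only)
ORT_TO_NUMPY = {
    "tensor(float)": "float32",
    "tensor(double)": "float64",
    "tensor(int32)": "int32",
    "tensor(int64)": "int64",
    "tensor(bool)": "bool",
    "tensor(string)": "object",
}


def expected_numpy_dtype(ort_type: str) -> str:
    """Translate an onnxruntime type string into a NumPy dtype name."""
    try:
        return ORT_TO_NUMPY[ort_type]
    except KeyError:
        raise ValueError(f"no mapping for {ort_type!r} in this sketch") from None


def check_feed(inputs: List[Tuple[str, str]], feed_dtypes: Dict[str, str]) -> List[str]:
    """Return one message per input whose fed dtype differs from what the model expects.

    `inputs` holds (name, ort_type) pairs; `feed_dtypes` maps input name -> dtype name.
    """
    problems = []
    for name, ort_type in inputs:
        want = expected_numpy_dtype(ort_type)
        got = feed_dtypes.get(name)
        if got != want:
            problems.append(f"{name}: model expects {want}, got {got}")
    return problems
```

With a live session this could be called as `check_feed([(i.name, i.type) for i in sess.get_inputs()], {input_name: str(test_record.dtype)})`; a float64 array fed to a float input, for instance, would be reported.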

hanzigs commented 3 years ago

I have uploaded "ONNXmodel.onnx" and "creditloan_prediction_20210806T210907" to the drive. Can you please try to create a session from either of the two? Both were created in Flask.

TomWildenhain-Microsoft commented 3 years ago

Both models give me the error:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\tomwi\AppData\Local\Programs\Python\Python39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\tomwi\AppData\Local\Programs\Python\Python39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 310, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from C:\Users\tomwi\Downloads\ONNXModel.onnx failed:This is an invalid model. Error in Node:model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2 : No Op registered for LookupTableFindV2 with domain_version of 9
```

But I will need a saved model to diagnose the cause of the conversion failure. If you are not able to upload a saved model due to privacy/security concerns, I can try to walk through the debugging on your end, or we can wait for @guschmue who might have better luck reproducing the issue with autokeras.

hanzigs commented 3 years ago

Yes, that's the saved model from Flask, and that's the error. As for the files, I will have to create a separate model, because this one has many links to other files; I'll send it once it's created. And yes, I'm OK with the walkthrough, let me know how.

hanzigs commented 3 years ago

I have printed the ONNX model from the Flask run, saved it to a file "onnxfile.txt", and uploaded it to the drive, in case that helps. I also uploaded the ONNX file for the same run, "flask_onnx_model.onnx", which also throws the error. Thank you very much for the support, much appreciated.

TomWildenhain-Microsoft commented 3 years ago

So the issue is here: [screenshot]

The LookupTableFindV2 op shouldn't be in the final model. It should be removed by: https://github.com/onnx/tensorflow-onnx/blob/04d24880751e4f753623d9097819e793e75962a9/tf2onnx/custom_opsets/onnx_ml.py#L67

Try to determine whether that handler is running and if it is entering that conditional.

TomWildenhain-Microsoft commented 3 years ago

Also, is anything printed to the console during conversion, and are any exceptions raised? Change the log level before conversion with:

```python
import logging
logging.basicConfig(level=logging.INFO)
```
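In a web app, converter output on the console is easy to miss. A sketch, assuming tf2onnx logs under `tf2onnx.*` logger names (such as `tf2onnx.tf_loader` and `tf2onnx.tfonnx`, which appear in the log output later in this thread): attach an in-memory handler so the Flask code can inspect warnings and errors after conversion. `RecordCollector` and `attach_collector` are hypothetical helpers built only on the standard `logging` module.

```python
import logging
from typing import List


class RecordCollector(logging.Handler):
    """Logging handler that keeps every record it receives in memory."""

    def __init__(self, level: int = logging.WARNING) -> None:
        super().__init__(level)
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)

    def messages(self) -> List[str]:
        return [f"{r.levelname}:{r.name}:{r.getMessage()}" for r in self.records]


def attach_collector(logger_name: str = "tf2onnx", level: int = logging.WARNING) -> RecordCollector:
    """Attach a collector to a parent logger; child loggers propagate records to it."""
    collector = RecordCollector(level)
    logger = logging.getLogger(logger_name)
    logger.addHandler(collector)
    if logger.getEffectiveLevel() > level:
        logger.setLevel(level)  # make sure records at `level` are not dropped upstream
    return collector
```

For example, call `attach_collector()` before `tf2onnx.convert.from_keras(...)` and fail the request if any collected message starts with `ERROR`. tf2onnx may adjust its own logger levels, so this has not been verified against the library itself.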

hanzigs commented 3 years ago

Here is the full error output from the Flask app, which appears before the printed model:

```
ERROR:tf2onnx.tf_loader:Could not find table resource to replace placeholder model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2/table_handle
INFO:tf2onnx.tfonnx:Using tensorflow=2.5.0, onnx=1.10.0, tf2onnx=1.10.0/04d248
INFO:tf2onnx.tfonnx:Using opset <onnx, 9>
WARNING:tf2onnx.shape_inference:Cannot infer shape for model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2: model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2:0
WARNING:tf2onnx.shape_inference:Cannot infer shape for model/multi_category_encoding/Cast_1: model/multi_category_encoding/Cast_1:0
INFO:tf2onnx.tf_utils:Computed 0 values for constant folding
WARNING:tf2onnx.onnx_opset.tensor:ONNX does not support precision, scientific and fill attributes for AsString
ERROR:tf2onnx.tfonnx:Failed to convert node 'model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2' (fct=<bound method LookupTableFind.version_8 of <class 'tf2onnx.custom_opsets.onnx_ml.LookupTableFind'>>)
'OP=LookupTableFindV2\nName=model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2\nInputs:\n\tmodel/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2/table_handle:0=Placeholder, [], 7\n\tmodel/multi_category_encoding/AsString:0=Cast, [-1, 1], 8\n\tmodel/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2/default_value:0=Const, [], 7\nOutpus:\n\tmodel/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2:0=None, 7'
Traceback (most recent call last):
  File "C:\Users\pl\Anaconda3\envs\AutoMLIntuitionAug2021py395elk7120\lib\site-packages\tf2onnx\tfonnx.py", line 292, in tensorflow_onnx_mapping
    func(g, node, **kwargs, initialized_tables=initialized_tables, dequantize=dequantize)
  File "C:\Users\pl\Anaconda3\envs\AutoMLIntuitionAug2021py395elk7120\lib\site-packages\tf2onnx\custom_opsets\onnx_ml.py", line 34, in version_8
    utils.make_sure(shared_name is not None, "Could not determine table shared name for node %s", node.name)
  File "C:\Users\pl\Anaconda3\envs\AutoMLIntuitionAug2021py395elk7120\lib\site-packages\tf2onnx\utils.py", line 260, in make_sure
    raise ValueError("make_sure failure: " + error_msg % args)
ValueError: make_sure failure: Could not determine table shared name for node model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2
INFO:tf2onnx.optimizer:Optimizing ONNX model
INFO:tf2onnx.optimizer:After optimization: Const -19 (29->10), Identity -2 (2->0)
```

And this is the error raised when creating the session:

```
ERROR:AutoMLWebApi:500 Internal Server Error: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2 : No Op registered for LookupTableFindV2 with domain_version of 9
Traceback (most recent call last):
  File "C:\pl\AutoML\AutoMLIntuitionJuly2021py395elk7120\AutoMLWebApi.py", line 88, in autoML
    result = AutoMLTrainer.train(tenant_id, data)
  File "C:\pl\AutoML\AutoMLIntuitionJuly2021py395elk7120\AutoMLTrainer.py", line 903, in train
    uploadToElastic(es,
  File "C:\pl\AutoML\AutoMLIntuitionJuly2021py395elk7120\AutoMLUtils.py", line 3927, in uploadToElastic
    sess = onnxruntime.InferenceSession(content)
  File "C:\Users\pl\Anaconda3\envs\AutoMLIntuitionAug2021py395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\pl\Anaconda3\envs\AutoMLIntuitionAug2021py395elk7120\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 312, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2 : No Op registered for LookupTableFindV2 with domain_version of 9

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\pl\Anaconda3\envs\AutoMLIntuitionAug2021py395elk7120\lib\site-packages\flask\app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\pl\Anaconda3\envs\AutoMLIntuitionAug2021py395elk7120\lib\site-packages\flask\app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "C:\pl\AutoML\AutoMLIntuitionJuly2021py395elk7120\AutoMLWebApi.py", line 96, in autoML
    abort(500, err)
  File "C:\Users\pl\Anaconda3\envs\AutoMLIntuitionAug2021py395elk7120\lib\site-packages\werkzeug\exceptions.py", line 940, in abort
    _aborter(status, *args, **kwargs)
  File "C:\Users\pl\Anaconda3\envs\AutoMLIntuitionAug2021py395elk7120\lib\site-packages\werkzeug\exceptions.py", line 923, in __call__
    raise self.mapping[code](*args, **kwargs)
werkzeug.exceptions.InternalServerError: 500 Internal Server Error: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:model/multi_category_encoding/string_lookup_1/None_lookup_table_find/LookupTableFindV2 : No Op registered for LookupTableFindV2 with domain_version of 9
INFO:werkzeug:127.0.0.1 - - [07/Aug/2021 10:58:40] "POST /tenant_id/train HTTP/1.1" 500 -
```

TomWildenhain-Microsoft commented 3 years ago

> Could not determine table shared name for node

Oh haha, you should have led with that. Is that what you meant by "Same kind of issue as in #1228"? #1228 involved a few different error messages.

tf2onnx isn't finding the values for the lookup table.
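Why the table values matter: conceptually, LookupTableFindV2 maps each key to a value from a hash table and falls back to a default for missing keys. My reading of the onnx_ml handler linked above is that it rewrites the node into an ONNX-ML `CategoryMapper`, which stores the keys and values as node attributes, so they must be known at conversion time. The plain-Python sketch below only illustrates those semantics; `lookup_table_find` and `to_category_mapper_attrs` are hypothetical helpers, not tf2onnx code.

```python
from typing import Dict, Iterable, List


def lookup_table_find(table: Dict[str, int], keys: Iterable[str], default: int) -> List[int]:
    """Mimic LookupTableFindV2 on a 1-D batch: the table value per key, else default."""
    return [table.get(key, default) for key in keys]


def to_category_mapper_attrs(table: Dict[str, int], default: int) -> dict:
    """Flatten a string->int table into CategoryMapper-style parallel attribute lists.

    If the table contents cannot be found (the "Could not find table resource"
    case above), there is nothing to put in these lists.
    """
    keys = sorted(table)
    return {
        "cats_strings": keys,
        "cats_int64s": [table[k] for k in keys],
        "default_int64": default,
    }
```

In the failing log, the table handle input is a `Placeholder` rather than the table itself, which is consistent with the converter having no contents to emit.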

Try this: make a saved model, then convert it from the command line with:

```
python -m tf2onnx.convert --saved-model mysavedmodel --output model.onnx
```
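If the command-line route does work, it can still be driven from inside the application without keeping files around: a temporary directory in the container is enough. A sketch, assuming the process may write to the system temp directory; `convert_in_tempdir` is a hypothetical helper that saves the Keras model and shells out to exactly the command above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def tf2onnx_cli_command(saved_model_dir: str, output_path: str) -> List[str]:
    """Build the tf2onnx CLI invocation using the current Python interpreter."""
    return [sys.executable, "-m", "tf2onnx.convert",
            "--saved-model", saved_model_dir, "--output", output_path]


def convert_in_tempdir(keras_model) -> bytes:
    """Save the model, convert it with the CLI, and return the ONNX bytes.

    Everything is written under a TemporaryDirectory that is deleted on exit.
    """
    with tempfile.TemporaryDirectory() as tmp:
        saved_model_dir = str(Path(tmp) / "saved_model")
        onnx_path = str(Path(tmp) / "model.onnx")
        keras_model.save(saved_model_dir)  # a path without .h5 saves in SavedModel format
        subprocess.run(tf2onnx_cli_command(saved_model_dir, onnx_path), check=True)
        return Path(onnx_path).read_bytes()
```

The returned bytes can be passed straight to `onnxruntime.InferenceSession(content)`, as in the original code.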

hanzigs commented 3 years ago

Actually, this will be used for deployment, so this procedure has to be followed; making it work from the command line won't be useful.

hanzigs commented 3 years ago

Even saving the model to disk isn't useful, because this will be deployed in a Docker container.

hanzigs commented 3 years ago

Is there a workaround for that, please?

TomWildenhain-Microsoft commented 3 years ago

Ok, then we need to find the data in the keras model's lookup table without saving it. The challenge here is that the lookup table is held inside the TF runtime and it can be challenging to extract it. Normally a second copy is stored on the keras model itself, and this method searches for it:

https://github.com/onnx/tensorflow-onnx/blob/04d24880751e4f753623d9097819e793e75962a9/tf2onnx/tf_loader.py#L406

You'll need to find where the lookup table info is stored on the keras model.

hanzigs commented 3 years ago

OK, how do I do that?

TomWildenhain-Microsoft commented 3 years ago

Can you attach a debugger?

hanzigs commented 3 years ago

Do you mean running Flask in debug mode?

TomWildenhain-Microsoft commented 3 years ago

Are you using VSCode or another IDE?

hanzigs commented 3 years ago

yeah VSCode

TomWildenhain-Microsoft commented 3 years ago

Great. Launch the app in debug mode: https://code.visualstudio.com/docs/python/tutorial-flask#_run-the-app-in-the-debugger

Drop a breakpoint after model creation. Use the debug console to explore the model's attributes... sorry it's a bit tricky.

hanzigs commented 3 years ago

Yeah, here is the screenshot. What should I check there? [screenshot]

hanzigs commented 3 years ago

Do I need to step into from_keras() and go to the function def _get_hash_table_info_from_trackable(trackable, table_names, key_dtypes, value_dtypes, ...?

TomWildenhain-Microsoft commented 3 years ago

One moment, I'm checking something.

TomWildenhain-Microsoft commented 3 years ago

In the model, you want to find the table's resource handle and dtype. The resource handle is key, since it lets you request the lookup table's contents from tensorflow. For mine, it is stored on the lookup table layer:

[screenshot of the lookup table layer]

Going a few layers deep, I find: [screenshot]

You really shouldn't have to find it manually since we do a pretty comprehensive search, but autokeras must put it somewhere we don't expect.
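Instead of clicking through the debugger tree, the same exploration can be scripted from the debug console. A hypothetical helper (not part of tf2onnx, whose own search is the tf_loader code linked earlier) that walks an object's attributes, lists, tuples, and dicts breadth-first and reports the path to every object satisfying a predicate, such as having a `resource_handle` attribute:

```python
from collections import deque
from typing import Any, Callable, List, Tuple


def find_attribute_paths(root: Any, predicate: Callable[[Any], bool],
                         max_depth: int = 6, root_name: str = "model") -> List[Tuple[str, Any]]:
    """Breadth-first search returning (path, obj) for every match of predicate.

    Tracking id() avoids revisiting shared objects and reference cycles.
    """
    found = []
    seen = {id(root)}
    queue = deque([(root_name, root, 0)])
    while queue:
        path, obj, depth = queue.popleft()
        try:
            matched = predicate(obj)
        except Exception:
            matched = False  # some objects raise from property access; skip them
        if matched:
            found.append((path, obj))
        if depth >= max_depth:
            continue
        if isinstance(obj, dict):
            children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
        elif isinstance(obj, (list, tuple)):
            children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
        else:
            children = [(f"{path}.{k}", v) for k, v in getattr(obj, "__dict__", {}).items()]
        for child_path, child in children:
            if id(child) not in seen:
                seen.add(id(child))
                queue.append((child_path, child, depth + 1))
    return found
```

In the debug console, `find_attribute_paths(ExportedautoKeras_model, lambda o: hasattr(o, "resource_handle"))` would list candidate table objects with their attribute paths. Keras may wrap some lists and dicts in its own tracking containers, so those could need extra cases.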

TomWildenhain-Microsoft commented 3 years ago

If you find it, we can update our search to find it automatically for you based on the attributes it ends up on.