microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : The node is not placed on any Execution Provider. OneHot(11) (node while/cond_5/one_hot). #5825

Open oswen opened 3 years ago

oswen commented 3 years ago

I used tf2onnx to convert my TensorFlow model to an ONNX model, and then ran it with the onnxruntime Python interface, which reports a problem with the OneHot operation. My model is BERT + beam search and uses TensorFlow's while_loop. Environment versions: tensorflow=1.14.0, onnx=1.7.0, tf2onnx=1.7.1/796841, opset <onnx, 12>, onnxruntime=1.5.1.

Here are the error messages:

```
Traceback (most recent call last):
  File "D:/PyPjt/bert-vector/model_to_saved.py", line 811, in <module>
    onnx_runtime_test()
  File "D:/PyPjt/bert-vector/model_to_saved.py", line 767, in onnx_runtime_test
    sess = ort.InferenceSession("model/onnx_model/chatbot11.onnx")
  File "C:\Users\w00444710\AppData\Local\Continuum\anaconda3\lib\site-packages\onnxruntime\capi\session.py", line 195, in __init__
    self._create_inference_session(providers, provider_options)
  File "C:\Users\w00444710\AppData\Local\Continuum\anaconda3\lib\site-packages\onnxruntime\capi\session.py", line 205, in _create_inference_session
    sess.initialize_session(providers or [], provider_options or [])
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : The node is not placed on any Execution Provider. OneHot(11) (node while/cond_5/one_hot).
```

hariharans29 commented 3 years ago

This is most likely due to missing type support in the OneHot kernel. ONNX supports a lot of types - https://github.com/onnx/onnx/blob/master/docs/Operators.md#OneHot and ORT implements the (most common) subset combinations - https://github.com/microsoft/onnxruntime/blob/794e8479eb52139036f3df5dca6c43305c555616/onnxruntime/core/providers/cpu/cpu_execution_provider.cc#L393. Sharing the model might help.
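
A minimal sketch (an editor's addition, not from the thread) of how one could check which type combination the exported OneHot nodes actually use before sharing the model; it recurses into Loop/If subgraphs since the failing node (`while/cond_5/one_hot`) sits inside a `while_loop`, and prints `onnx.TensorProto` element-type enum values (e.g. 1 = FLOAT, 6 = INT32, 7 = INT64):

```python
import onnx

# Path taken from the traceback above; shape inference fills in value_info so
# intermediate tensor types become visible.
model = onnx.shape_inference.infer_shapes(onnx.load("model/onnx_model/chatbot11.onnx"))

def walk(graph):
    """Yield (graph, node) pairs, descending into Loop/If/Scan subgraphs."""
    for node in graph.node:
        yield graph, node
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                yield from walk(attr.g)
            elif attr.type == onnx.AttributeProto.GRAPHS:
                for sub in attr.graphs:
                    yield from walk(sub)

def elem_types(graph):
    """Map tensor name -> TensorProto element type within one graph."""
    types = {init.name: init.data_type for init in graph.initializer}
    for vi in list(graph.input) + list(graph.output) + list(graph.value_info):
        types[vi.name] = vi.type.tensor_type.elem_type
    return types

for graph, node in walk(model.graph):
    if node.op_type == "OneHot":
        types = elem_types(graph)
        print(node.name,
              "inputs:", [types.get(name, "?") for name in node.input],
              "output:", [types.get(name, "?") for name in node.output])
```

The printed combinations can then be compared against the OneHot kernels registered in the cpu_execution_provider.cc lines linked above.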

oswen commented 3 years ago

> This is most likely due to missing type support in the OneHot kernel. ONNX supports a lot of types - https://github.com/onnx/onnx/blob/master/docs/Operators.md#OneHot and ORT implements the (most common) subset combinations - https://github.com/microsoft/onnxruntime/blob/794e8479eb52139036f3df5dca6c43305c555616/onnxruntime/core/providers/cpu/cpu_execution_provider.cc#L393. Sharing the model might help.

Thanks for your reply. I'll check my code.

hariharans29 commented 3 years ago

Can you share the model please? Or at least a single-node model that can repro the issue with OneHot?
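
As an illustration of what such a single-node repro could look like, here is a hedged sketch (an editor's addition) built with `onnx.helper`. The int32 `values`/output type is an assumption that mirrors the `tf.one_hot(..., dtype=tf.int32)` call mentioned in the next comment, and session creation is only expected to raise the "not placed on any Execution Provider" error if that particular indices/depth/values type combination has no CPU kernel registered:

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort

# Single OneHot node at opset 11, matching OneHot(11) from the error message.
node = helper.make_node("OneHot", ["indices", "depth", "values"], ["out"], axis=-1)
graph = helper.make_graph(
    [node],
    "onehot_repro",
    inputs=[
        helper.make_tensor_value_info("indices", TensorProto.INT32, [3]),
        helper.make_tensor_value_info("depth", TensorProto.INT64, []),
        # The values tensor determines the output type; int32 is the assumed
        # problematic case here.
        helper.make_tensor_value_info("values", TensorProto.INT32, [2]),
    ],
    outputs=[helper.make_tensor_value_info("out", TensorProto.INT32, [3, None])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 11)])
onnx.checker.check_model(model)

# If this type combination has no CPU OneHot kernel, the next line should fail
# the same way as the original traceback.
sess = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
print(sess.run(None, {
    "indices": np.array([0, 1, 2], dtype=np.int32),
    "depth": np.array(3, dtype=np.int64),
    "values": np.array([0, 1], dtype=np.int32),
}))
```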

oswen commented 3 years ago

> Can you share the model please? Or at least a single-node model that can repro the issue with OneHot?

I'm sorry, the model is too large to upload. But I can describe my solution. I changed one line of code:

before: valid_ngram_onehot = tf.one_hot(valid_ngram_ids, vocab_size, dtype=tf.int32) * valid_ngram_mask

after: valid_ngram_onehot = tf.one_hot(valid_ngram_ids, vocab_size) * valid_ngram_mask

The dtype parameter sets the data type of the output; the default is tf.float32, and I simply restored dtype to its default value.
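
A sketch of that change with illustrative stand-in tensors (the real `valid_ngram_ids` / `valid_ngram_mask` aren't in the thread), plus an optional cast back to int32 after the OneHot - an assumption, not part of the original fix - for cases where an integer tensor is still needed downstream while keeping the exported OneHot node in float32:

```python
import tensorflow as tf

# Stand-ins for the variables referenced above (assumed shapes/values).
vocab_size = 8
valid_ngram_ids = tf.constant([1, 3, 5])                # int32 indices
valid_ngram_mask = tf.constant([[1.0], [1.0], [0.0]])   # float32 mask, broadcastable

# Before (int32 OneHot output; the exported node may have no matching ORT CPU kernel):
# valid_ngram_onehot = tf.one_hot(valid_ngram_ids, vocab_size, dtype=tf.int32) * valid_ngram_mask

# After (the fix above): default float32 output.
valid_ngram_onehot = tf.one_hot(valid_ngram_ids, vocab_size) * valid_ngram_mask

# Optional: cast after the OneHot if an integer tensor is needed downstream.
valid_ngram_onehot_int = tf.cast(valid_ngram_onehot, tf.int32)
```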

hariharans29 commented 3 years ago

Thanks. The ONNX model was originally valid (before your tweak), but as I said before, ORT implements only a subset of the different types supported by OneHot (https://github.com/onnx/onnx/blob/master/docs/Operators.md#OneHot).