**Describe the bug**

When converting a Bert2Bert model from the TensorFlow Model Garden (`official`), I get the following error at serving time:
```
Traceback (most recent call last):
  File "C:/dev/ml/QueryGenerator/query_generator/models/bert2bert/save_model.py", line 245, in <module>
    output = session.run(output_names=None, input_feed=input_feed)
  File "C:\dev\ml\QueryGenerator\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 188, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Loop node. Name:'bert2_bert/while_loop' Status Message: Non-zero status code returned while running Concat node. Name:'bert2_bert/while/decoder/decoder/layer_0/self_attention/concat' Status Message: concat.cc:159 onnxruntime::ConcatBase::PrepareForCompute Non concat axis dimensions must match: Axis 0 has mismatched dimensions of 10 and 6
```
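For context, the failure is ONNX Runtime's shape check on `Concat`: all inputs must agree on every axis except the concatenation axis. A minimal numpy illustration of the same constraint (the sizes 10 and 6 are taken from the message above; the second axis is made up for the example):

```python
import numpy as np

a = np.zeros((10, 4), dtype=np.float32)  # axis 0 has size 10
b = np.zeros((6, 4), dtype=np.float32)   # axis 0 has size 6

# Concatenating along axis 1 requires axis 0 to match, so this raises a
# ValueError -- the numpy analogue of the ORT Concat error above.
np.concatenate([a, b], axis=1)
```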
It is quite close to the error I had at the end of this issue, but:

- the model is different
- the minimal code is simpler
- the error is different
**System information**

- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10.0.19042
- TensorFlow Version: 2.5.0
- Python version: 3.7.6
**To Reproduce**

It is quite easy to reproduce using the following code:
```python
import numpy as np
import tensorflow as tf
import onnxruntime
import tf2onnx
from official.nlp.nhnet.configs import UNITTEST_CONFIG, BERT2BERTConfig
from official.nlp.nhnet.models import Bert2Bert, get_bert2bert_layers

MAX_SEQ_LENGTH = 10
MAX_OUTPUT_LENGTH = 4

# Create the Bert2Bert model
bert2bert_config_dict = UNITTEST_CONFIG.copy()
bert2bert_config_dict["max_position_embeddings"] = MAX_SEQ_LENGTH
bert2bert_config_dict["len_title"] = MAX_OUTPUT_LENGTH
bert2bert_config = BERT2BERTConfig.from_args(**bert2bert_config_dict)
bert_layer, decoder_layer = get_bert2bert_layers(params=bert2bert_config)
bert2bert = Bert2Bert(bert2bert_config, bert_layer, decoder_layer)

# Define the serving function
@tf.function()
def serve(inputs):
    return bert2bert(inputs=inputs, mode="predict")

# Convert the model to ONNX and save it
model_proto, _ = tf2onnx.convert.from_function(
    function=serve,
    opset=14,
    input_signature=[{
        'input_ids': tf.TensorSpec(shape=(None, MAX_SEQ_LENGTH), dtype=tf.int32, name='input_ids'),
        'input_mask': tf.TensorSpec(shape=(None, MAX_SEQ_LENGTH), dtype=tf.int32, name='input_mask'),
        'segment_ids': tf.TensorSpec(shape=(None, MAX_SEQ_LENGTH), dtype=tf.int32, name='segment_ids')
    }],
    output_path='model.onnx'
)

# Try to serve the model
input_ids = [101, 2023, 2633, 4504, 1999, 6094, 2008, 102, 0, 0]
sess_options = onnxruntime.SessionOptions()
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
session = onnxruntime.InferenceSession('model.onnx',
                                       sess_options,
                                       providers=["CPUExecutionProvider"])
input_feed = {
    # int32 to match the TensorSpecs above; mask is 0 on padding tokens
    "input_ids": np.array([input_ids], dtype=np.int32),
    "input_mask": np.array([[0 if i == 0 else 1 for i in input_ids]], dtype=np.int32),
    "segment_ids": np.array([[0 for _ in input_ids]], dtype=np.int32)
}
output = session.run(output_names=None, input_feed=input_feed)
```
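In case it helps triage, here is a rough sketch of the snippet I use to locate the failing node inside the exported model; it recursively walks subgraph attributes such as the `Loop` body (only the `model.onnx` produced by the script above is assumed):

```python
import onnx

model = onnx.load('model.onnx')

def find_node(graph, name, path='main'):
    """Search a graph and all of its subgraphs (Loop/If bodies) for a node."""
    for node in graph.node:
        if node.name == name:
            print(f"found {node.op_type} {name!r} in {path}")
            print("  inputs:", list(node.input))
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                find_node(attr.g, name, f"{path}/{node.name}")

find_node(model.graph,
          'bert2_bert/while/decoder/decoder/layer_0/self_attention/concat')
```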