onnx / tensorflow-onnx

Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX
Apache License 2.0

Bert2Bert concat axis dimension error at serving time #1757

Closed. LoicDagnas closed this issue 2 years ago.

LoicDagnas commented 3 years ago

Describe the bug

Converting a Bert2Bert model from the TensorFlow Models official package (official.nlp.nhnet), I get the following error at serving time:

Traceback (most recent call last):
  File "C:/dev/ml/QueryGenerator/query_generator/models/bert2bert/save_model.py", line 245, in <module>
    output = session.run(output_names=None, input_feed=input_feed)
  File "C:\dev\ml\QueryGenerator\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 188, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Loop node. Name:'bert2_bert/while_loop' Status Message: Non-zero status code returned while running Concat node. Name:'bert2_bert/while/decoder/decoder/layer_0/self_attention/concat' Status Message: concat.cc:159 onnxruntime::ConcatBase::PrepareForCompute Non concat axis dimensions must match: Axis 0 has mismatched dimensions of 10 and 6
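
Not part of the original report, but a useful debugging aid here: a minimal sketch (assuming only the onnx Python package) that recursively walks the exported graph, including Loop/If/Scan body subgraphs, and prints every Concat node, so the node named in the error above can be inspected:

 import onnx

 # Sketch: walk the main graph and any control-flow body subgraphs,
 # yielding every node, so the failing Concat
 # ('bert2_bert/while/decoder/decoder/layer_0/self_attention/concat')
 # can be located and its axis/inputs examined.
 def iter_nodes(graph):
     for node in graph.node:
         yield node
         for attr in node.attribute:
             if attr.type == onnx.AttributeProto.GRAPH:
                 yield from iter_nodes(attr.g)
             elif attr.type == onnx.AttributeProto.GRAPHS:
                 for g in attr.graphs:
                     yield from iter_nodes(g)

 model = onnx.load('model.onnx')
 for node in iter_nodes(model.graph):
     if node.op_type == 'Concat':
         axis = next((a.i for a in node.attribute if a.name == 'axis'), None)
         print(node.name, 'axis:', axis, 'inputs:', list(node.input))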

It is quite close to the error I had at the end of this issue, but:

System information

To Reproduce

It is quite easy to reproduce using the following code:

 import tensorflow as tf
 import onnxruntime
 import tf2onnx
 from official.nlp.nhnet.configs import UNITTEST_CONFIG, BERT2BERTConfig
 from official.nlp.nhnet.models import Bert2Bert, get_bert2bert_layers

 MAX_SEQ_LENGTH = 10
 MAX_OUTPUT_LENGTH = 4

 # Create the Bert2Bert model
 bert2bert_config_dict = UNITTEST_CONFIG.copy()
 bert2bert_config_dict["max_position_embeddings"] = MAX_SEQ_LENGTH
 bert2bert_config_dict["len_title"] = MAX_OUTPUT_LENGTH
 bert2bert_config = BERT2BERTConfig.from_args(**bert2bert_config_dict)
 bert_layer, decoder_layer = get_bert2bert_layers(params=bert2bert_config)

 bert2bert = Bert2Bert(bert2bert_config, bert_layer, decoder_layer)

 # Define the serving function
 @tf.function()
 def serve(inputs):
     return bert2bert(inputs=inputs, mode="predict")

 # Convert the model to ONNX and save it
 model_proto, _ = tf2onnx.convert.from_function(
     function=serve,
     opset=14,
     input_signature=[{
         'input_ids': tf.TensorSpec(shape=(None, MAX_SEQ_LENGTH,), dtype=tf.int32, name='input_ids'),
         'input_mask': tf.TensorSpec(shape=(None, MAX_SEQ_LENGTH,), dtype=tf.int32, name='input_mask'),
         'segment_ids': tf.TensorSpec(shape=(None, MAX_SEQ_LENGTH,), dtype=tf.int32, name='segment_ids')
     }],
     output_path='model.onnx'
 )

 # Try to serve the model
 input_ids = [101, 2023, 2633, 4504, 1999, 6094, 2008, 102, 0, 0]
 sess_options = onnxruntime.SessionOptions()
 sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
 session = onnxruntime.InferenceSession('model.onnx',
                                        sess_options,
                                        providers=["CPUExecutionProvider"])

 input_feed = {
     "input_ids": [input_ids],
     "input_mask": [[0 if i == 0 else 1 for i in input_ids]],
     "segment_ids": [[0 for _ in input_ids]]
 }

 output = session.run(output_names=None, input_feed=input_feed)
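
As a sanity check (not in the original report; it assumes the variables defined above are still in scope), the same inputs can be fed back to the eager tf.function to see whether the shape mismatch exists before conversion:

 # Sanity-check sketch: run the original tf.function on the same inputs;
 # if this succeeds while the ONNX session fails, the mismatch is
 # introduced during conversion rather than by the model itself.
 tf_feed = {k: tf.constant(v, dtype=tf.int32) for k, v in input_feed.items()}
 tf_output = serve(tf_feed)
 print(tf.nest.map_structure(lambda t: t.shape, tf_output))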
LoicDagnas commented 2 years ago