Closed: LoicDagnas closed this issue 2 years ago.
Hi @LoicDagnas, it looks like there's a freezing issue in TF when setting lower_control_flow to false:
convert_variables_to_constants_v2(func, lower_control_flow=False)
TF fails to replace the ResourceGather op with a normal Gather. We set lower_control_flow to false so we can maintain the subgraph structure.
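Conceptually, freezing replaces each variable with a constant holding its current value and rewrites ops that read through the variable handle (such as ResourceGather) into ordinary ops on that constant. The sketch below illustrates only that idea on a made-up dict-based graph format; the `freeze` helper and the node layout are hypothetical and are not how `convert_variables_to_constants_v2` is actually implemented.

```python
from typing import Dict, List

Node = Dict[str, object]  # hypothetical node format: {"name", "op", "inputs", ...}


def freeze(nodes: List[Node], variable_values: Dict[str, list]) -> List[Node]:
    """Toy freezing pass: swap resource (variable) ops for constant-based ops."""
    frozen: List[Node] = []
    for node in nodes:
        op = node["op"]
        if op == "VarHandleOp":
            # The variable handle becomes a Const holding the variable's current value.
            frozen.append({"name": node["name"], "op": "Const", "inputs": [],
                           "value": variable_values[node["name"]]})
        elif op == "ResourceGather":
            # Gathering through a resource handle becomes a plain gather on the constant.
            frozen.append({"name": node["name"], "op": "GatherV2",
                           "inputs": list(node["inputs"]), "axis": 0})
        else:
            frozen.append(dict(node))
    return frozen


graph = [
    {"name": "embeddings", "op": "VarHandleOp", "inputs": []},
    {"name": "ids", "op": "Placeholder", "inputs": []},
    {"name": "lookup", "op": "ResourceGather", "inputs": ["embeddings", "ids"]},
]
frozen = freeze(graph, {"embeddings": [[0.1, 0.2], [0.3, 0.4]]})
print([n["op"] for n in frozen])  # ['Const', 'Placeholder', 'GatherV2']
```

In this thread, the reported failure is that this kind of rewrite does not happen for ResourceGather when the op sits inside a function body that is kept as a subgraph.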
I have a workaround for TF but I'm still getting an issue with your model when I try to run it. Were you able to run the model successfully? If so, what input data did you use?
Hi @TomWildenhain-Microsoft, nope, I hadn't tried to load and serve the model yet. I just did, and it fails for me too:
import onnxruntime
sess_options = onnxruntime.SessionOptions()
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
session = onnxruntime.InferenceSession("path/to/model.onnx", sess_options, providers=["CPUExecutionProvider"])
gives the following error:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from C:\dev\tmp\model.onnx failed:Invalid tensor data type
Interesting, the model I got loads but fails at inference time, not session creation. Can you make sure your onnx package is up to date and convert again?
pip uninstall onnx
pip install onnx
Here's the model I got: https://microsoft-my.sharepoint-df.com/:u:/p/tomwi/EW9L7b0INGtIruE89k7akaoBV8Jqb_1RFqmpYvIgjlrgsw?e=PZ6v4p If you still get a session creation error, please upload your .onnx file. It is complaining about an "Invalid tensor data type" but I'm not sure where in the model that is.
In any case, you should be able to create a session. Do you have example data to run the model on?
After upgrading onnx from 1.9.0 to 1.10.1, I am able to load both my ONNX model and the one you attached in your last message.
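Since the load error disappeared after the upgrade, a quick guard can catch an outdated onnx install before converting. This is a minimal sketch: the 1.10.1 threshold is simply the version that worked in this thread, not a documented minimum, and `parse_version` / `onnx_is_at_least` are hypothetical helpers. Versions are compared numerically because plain string comparison gets them wrong ("1.9.0" sorts after "1.10.1").

```python
from importlib import metadata
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.10.1' into (1, 10, 1), stopping at the first non-numeric part (e.g. 'rc1')."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. '0rc1': keep the 0, ignore the pre-release suffix
    return tuple(parts)


def onnx_is_at_least(minimum: str = "1.10.1") -> bool:
    """Return True if the installed onnx package is at least `minimum`."""
    try:
        installed = metadata.version("onnx")
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)
```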
Now, feeding the model with some dummy data:
import onnxruntime
sess_options = onnxruntime.SessionOptions()
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
session = onnxruntime.InferenceSession("C:\\dev\\tmp\\bert2bert.onnx", sess_options,
providers=["CPUExecutionProvider"])
input_ids = [101, 2023, 2633, 4504, 1999, 6094, 2008, 6461, 2000, 5787, 2925, 7521, 1997, 2822, 3667, 2000, 1996,
2142, 2163, 1010, 1998, 5561, 2000, 14768, 8041, 4262, 2090, 1996, 2142, 2163, 1998, 2859, 1012, 102,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
]
input_feed = {
"inputs": [input_ids],
"inputs_1": [[0 if i == 0 else 1 for i in input_ids]],
"inputs_2": [[0 for _ in input_ids]],
"inputs_3": [[0 for _ in range(32)]]
}
output = session.run(output_names=None, input_feed=input_feed)
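The padded dummy feed above can also be generated instead of typed out. A minimal sketch, assuming the same four input names as this converted model (`inputs`, `inputs_1`, `inputs_2`, `inputs_3`), a padding id of 0, a sequence length of 200 and a target length of 32; `build_feed` is a hypothetical helper. If the model expects int32 tensors, each value can additionally be wrapped in `np.array(..., np.int32)`, as done later in this thread.

```python
from typing import Dict, List


def build_feed(token_ids: List[int], max_len: int = 200, title_len: int = 32,
               pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Pad `token_ids` to `max_len` and derive the other three inputs for a batch of one."""
    if len(token_ids) > max_len:
        raise ValueError(f"{len(token_ids)} tokens exceed max_len={max_len}")
    input_ids = list(token_ids) + [pad_id] * (max_len - len(token_ids))
    return {
        "inputs": [input_ids],
        # attention mask: 1 for real tokens, 0 for padding (same rule as the feed above)
        "inputs_1": [[0 if t == pad_id else 1 for t in input_ids]],
        # segment ids: a single segment, so every position is 0
        "inputs_2": [[0] * max_len],
        # placeholder target ids of length len_title
        "inputs_3": [[0] * title_len],
    }


feed = build_feed([101, 2023, 2633, 4504, 102])
```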
In both cases (my model and yours), I get the same error:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Expand node. Name:'StatefulPartitionedCall/bert2_bert/bert_encoder/position_embedding/BroadcastTo' Status Message: invalid expand shape
@LoicDagnas well at least we both get the same error now.
The most recent issue is present in the TF model as well. When I load and run the saved model in TensorFlow, I get:
ValueError: Dimensions must be equal, but are 128 and 200 for '{{node position_embedding/BroadcastTo}} = BroadcastTo[T=DT_FLOAT, Tidx=DT_INT32](position_embedding/strided_slice_1, position_embedding/Shape)' with input shapes: [128,16], [3] and with input tensors computed as partial shapes: input[1] = [1,200,16]
The problem is caused by changing the shapes. Unfortunately, you can't change the max sequence length from 128 to 200 just by changing the signature. There are already ops in the graph with that shape hard-coded, and the model weights may only work with that size.
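The error message itself shows the mismatch: a tensor of shape [128, 16] is being broadcast to [1, 200, 16]. BroadcastTo follows NumPy-style broadcasting, where trailing dimensions are compared pairwise and each pair must be equal or contain a 1. The minimal check below (a hypothetical helper, not a TF API) reproduces why 128 vs. 200 fails.

```python
from typing import Sequence


def can_broadcast_to(shape: Sequence[int], target: Sequence[int]) -> bool:
    """NumPy-style check: can an array of `shape` be broadcast to `target`?

    Dimensions are compared right to left; each must be equal or 1, and
    `shape` may not have more dimensions than `target`.
    """
    if len(shape) > len(target):
        return False
    for dim, want in zip(reversed(shape), reversed(target)):
        if dim != want and dim != 1:
            return False
    return True


# The [128, 16] tensor from the error vs. the requested [1, 200, 16] shape:
print(can_broadcast_to([128, 16], [1, 200, 16]))  # False
# If the first dimension matched the 200-token input, it would broadcast fine:
print(can_broadcast_to([200, 16], [1, 200, 16]))  # True
```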
Mea culpa, I built my minimal example incorrectly.
The serving error simply comes from the default value of max_position_embeddings in the BERT2BERTConfig. Setting it to a value greater than or equal to 200 solves the problem for both the pb serving and the ONNX serving. To set this parameter, you can do:
from official.nlp.nhnet.models import Bert2Bert, get_bert2bert_layers
from official.nlp.nhnet.configs import BERT2BERTConfig, UNITTEST_CONFIG
bert2bert_config_dict = UNITTEST_CONFIG.copy()
bert2bert_config_dict["max_position_embeddings"] = 200
bert2bert_config_dict["len_title"] = 32
bert2bert_config = BERT2BERTConfig.from_args(**bert2bert_config_dict)
bert_layer, decoder_layer = get_bert2bert_layers(params=bert2bert_config)
bert2bert = Bert2Bert(bert2bert_config, bert_layer, decoder_layer)
Here is the newly generated model in ONNX format. You can run inference on it with the following snippet:
import onnxruntime
sess_options = onnxruntime.SessionOptions()
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
session = onnxruntime.InferenceSession("C:\\dev\\tmp\\model.onnx", sess_options, providers=["CPUExecutionProvider"])
input_ids = [101, 2023, 2633, 4504, 1999, 6094, 2008, 6461, 2000, 5787, 2925, 7521, 1997, 2822, 3667, 2000, 1996,
2142, 2163, 1010, 1998, 5561, 2000, 14768, 8041, 4262, 2090, 1996, 2142, 2163, 1998, 2859, 1012, 102,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
input_feed = {
"input_ids:0": [input_ids],
"input_mask:0": [[0 if i == 0 else 1 for i in input_ids]],
"segment_ids:0": [[0 for _ in input_ids]],
"target_ids:0": [[0 for _ in range(32)]]
}
output = session.run(output_names=None, input_feed=input_feed)
So, finally, the only remaining problem is the very first one: converting the model to ONNX format without using my custom graph-freezing method. You mentioned a workaround for that; is it already usable in the library?
No problem, glad to hear you got the model working! This code is correctly converting and running the model for me, with no changes to tf2onnx:
import tensorflow as tf
import os
import onnxruntime as ort
import numpy as np
from official.nlp.nhnet.models import Bert2Bert, get_bert2bert_layers
from official.nlp.nhnet.configs import BERT2BERTConfig, UNITTEST_CONFIG
bert2bert_config_dict = UNITTEST_CONFIG.copy()
bert2bert_config_dict["max_position_embeddings"] = 200
bert2bert_config_dict["len_title"] = 32
bert2bert_config = BERT2BERTConfig.from_args(**bert2bert_config_dict)
bert_layer, decoder_layer = get_bert2bert_layers(params=bert2bert_config)
bert2bert = Bert2Bert(bert2bert_config, bert_layer, decoder_layer)
input_ids = [101, 2023, 2633, 4504, 1999, 6094, 2008, 6461, 2000, 5787, 2925, 7521, 1997, 2822, 3667, 2000, 1996,
2142, 2163, 1010, 1998, 5561, 2000, 14768, 8041, 4262, 2090, 1996, 2142, 2163, 1998, 2859, 1012, 102,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
input_feed = {
"input_ids": np.array([input_ids], np.int32),
"input_mask": np.array([[0 if i == 0 else 1 for i in input_ids]], np.int32),
"segment_ids": np.array([[0 for _ in input_ids]], np.int32),
"target_ids": np.array([[0 for _ in range(32)]], np.int32)
}
print(bert2bert(input_feed))
bert2bert.save("bert2bert")
os.system("python -m tf2onnx.convert --saved-model bert2bert --output bert2bert.onnx --opset 14")
sess = ort.InferenceSession("bert2bert.onnx")
print(sess.run(None, input_feed))
I'm noticing that the above code (as well as the working model you just uploaded) makes a model that does not contain a loop, making it much easier to convert (and probably faster to run). It also avoids the freezing error of the ResourceGather op, since there are no subgraphs. I do not know why some of these strategies produce models with loops, while others do not. Maybe the model contains multiple signatures and we are sometimes getting a different one?
Hmm, you're right: the code above does successfully convert the Bert2Bert model. I hadn't noticed.
Unfortunately, I have a custom model inheriting from the Bert2Bert class, with a custom signature as in my first example:
import tensorflow as tf
import tf2onnx
from official.nlp.nhnet.models import Bert2Bert, get_bert2bert_layers
from official.nlp.nhnet.configs import UNITTEST_CONFIG, BERT2BERTConfig
bert2bert_config = BERT2BERTConfig.from_args(**UNITTEST_CONFIG, len_title=32)
bert_layer, decoder_layer = get_bert2bert_layers(params=bert2bert_config)
bert2bert = Bert2Bert(bert2bert_config, bert_layer, decoder_layer)
@tf.function()
def serve(inputs):
    return bert2bert(inputs=inputs, mode="predict")
model_proto, _ = tf2onnx.convert.from_function(
function=serve,
opset=14,
input_signature=[{
"input_ids": tf.TensorSpec(shape=(None, 200,), dtype=tf.int32),
"input_mask": tf.TensorSpec(shape=(None, 200,), dtype=tf.int32),
"segment_ids": tf.TensorSpec(shape=(None, 200,), dtype=tf.int32)
}],
)
but it shouldn't really change the graph's internal structure, since it is basically just a call to the model's call method.
Yeah, you would think it would be the same. The way TF models work, though, is that they can have multiple function definitions, chosen based on the signature; it is rare for those to be very different (normally one just has an additional optional argument or something). We'll need to investigate more, but I think what you are doing with:
@tf.function()
def serve(inputs):
    return bert2bert(inputs=inputs, mode="predict")
versus bert2bert.save("bert2bert") are, for some reason, choosing very different implementations, one of which contains a loop. We should theoretically be able to convert it, but loops can get dicey and something is going wrong. In any case, the version without the loop is probably better for you anyway, so we need to figure out why it picks one and not the other.
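As a rough mental model of "multiple function definitions chosen based on the signature", the toy dispatcher below keeps one implementation per distinct input signature (input names and shapes) and reuses it on later calls. It is a simplified illustration, not TF's actual tracing logic (which also takes Python arguments and other factors into account); `SignatureDispatcher` and its `build` callback are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Shape = Tuple[Optional[int], ...]
Signature = Tuple[Tuple[str, Shape], ...]


class SignatureDispatcher:
    """Toy model of a function that keeps one implementation per input signature."""

    def __init__(self, build: Callable[[Signature], str]):
        self._build = build  # stands in for tracing a new graph
        self._cache: Dict[Signature, str] = {}

    def __call__(self, inputs: Dict[str, Shape]) -> str:
        # Key on input names and shapes; sorting makes the key independent of dict order.
        signature = tuple(sorted((name, tuple(shape)) for name, shape in inputs.items()))
        if signature not in self._cache:
            self._cache[signature] = self._build(signature)
        return self._cache[signature]

    @property
    def num_implementations(self) -> int:
        return len(self._cache)


dispatch = SignatureDispatcher(lambda sig: "impl(" + ", ".join(name for name, _ in sig) + ")")
three_inputs = {"input_ids": (None, 200), "input_mask": (None, 200), "segment_ids": (None, 200)}
print(dispatch(three_inputs))                                # impl(input_ids, input_mask, segment_ids)
print(dispatch({**three_inputs, "target_ids": (None, 32)}))  # impl(input_ids, input_mask, segment_ids, target_ids)
print(dispatch.num_implementations)                          # 2
```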
I'm noticing that the target_ids input is included in only one of the two approaches. Does adding target_ids to your attempt above fix it?
I had the same intuition, but no, adding target_ids to the custom signature above doesn't solve the problem 😢
By the way, in the example with target_ids, I don't understand why they have to be in the serving signature, since these ids should only be used at training time. Anyway, that is an issue I should report on the TensorFlow official models repo.
Yeah it's strange that they are so different. In any case, you should have a working model now. I'm reluctant to add the hack to fix ResourceGather op freezing until I can confirm it works on a model, and we can't get the loop version of this model to work yet. It might have other weirdness that is actually causing the failure. If you figure out what is causing the different models to be selected and/or discover that you actually need the loop version of the model to convert, let me know. Otherwise, is it ok if we close this issue?
Unfortunately, I still have an issue with the model I actually use: it is not simply the Bert2Bert model but a model inheriting from it.
I am able to convert it to ONNX format using the snippet of code given here https://github.com/onnx/tensorflow-onnx/issues/1634#issue-958156867 but I cannot load it.
Here is the onnx model: https://drive.google.com/drive/folders/1cLfJ2Eyh3KCkvD0mzngfOKOhM69ieyE1?usp=sharing
Loading the ONNX model gives the very concise error failed:Invalid tensor data type.
To be more precise, here is my implementation, which subclasses the Bert2Bert model:
from typing import Dict

import tensorflow as tf
from official.nlp.nhnet.models import Bert2Bert
# SamplingModule and get_attention_bias also come from the TF Model Garden;
# their imports were not included in the original snippet.


class TextGenerator(Bert2Bert):
    def __init__(self, params, bert_layer, decoder_layer):
        super().__init__(params, bert_layer, decoder_layer)

    def call(self, inputs, mode="train"):
        bert2bert_inputs = inputs.copy()
        # Derive the mask and segment ids from input_ids instead of taking them as inputs
        bert2bert_inputs['input_mask'] = tf.where(inputs['input_ids'] == 0, 0, 1)
        bert2bert_inputs['segment_ids'] = tf.zeros_like(inputs['input_ids'])
        if mode != 'predict_with_sampling':
            return super(TextGenerator, self).call(bert2bert_inputs, mode)
        else:
            return self.serve_with_sampling(bert2bert_inputs)

    @tf.function()
    def serve_with_sampling(self, serve_inputs: Dict[str, tf.Tensor]):
        input_ids = serve_inputs['input_ids']
        input_mask = serve_inputs['input_mask']
        segment_ids = serve_inputs['segment_ids']
        queries_count = serve_inputs['queries_count']
        all_encoder_outputs, _ = self.bert_layer([input_ids, input_mask, segment_ids])
        encoder_decoder_attention_bias = get_attention_bias(
            input_ids,
            bias_type="single_cross",
            padding_value=self.params.pad_token_id)
        batch_size = tf.shape(input_ids)[0]
        start_token_ids = tf.ones([batch_size],
                                  tf.int32) * self.params.start_token_id
        if self.params.use_cache:
            cache = self._init_cache(batch_size)
        else:
            cache = {}
        cache["all_encoder_outputs"] = all_encoder_outputs
        cache["attention_bias"] = encoder_decoder_attention_bias
        generator = SamplingModule(
            symbols_to_logits_fn=self._get_symbols_to_logits_fn(self.params.len_title),
            length_normalization_fn=None,
            vocab_size=self.params.vocab_size,
            max_decode_length=self.params.len_title,
            eos_id=self.params.end_token_id,
            padded_decode=False,
            enable_greedy=False,
            top_k=10,
            top_p=1.0,
            sample_temperature=1.0
        )

        @tf.function()
        def generator_call():
            result, _ = generator.generate(start_token_ids, cache)
            return result

        # Sample queries_count sequences, then move the query axis after the batch axis
        return tf.transpose(
            a=tf.map_fn(fn=lambda t: generator_call(),
                        elems=tf.range(tf.squeeze(queries_count))),
            perm=[1, 0, 2])
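The final statement of `serve_with_sampling` stacks one `generator_call()` result per query along a new leading axis and then swaps the first two axes. Assuming each call returns a `[batch, length]` tensor of token ids, the stacked tensor is `[queries_count, batch, length]` and `perm=[1, 0, 2]` turns it into `[batch, queries_count, length]`. The pure-Python sketch below (hypothetical helper, made-up ids) shows the same index shuffle on nested lists.

```python
from typing import List, TypeVar

T = TypeVar("T")


def transpose_first_two(x: List[List[List[T]]]) -> List[List[List[T]]]:
    """Pure-Python analogue of tf.transpose(x, perm=[1, 0, 2]) for a rank-3 nested list."""
    # x is indexed [query][batch][position]; the result is indexed [batch][query][position]
    return [[x[q][b] for q in range(len(x))] for b in range(len(x[0]))]


# 3 queries x batch of 2 x 4 positions, filled with made-up ids (10 * query + batch)
stacked = [[[10 * q + b] * 4 for b in range(2)] for q in range(3)]
per_example = transpose_first_two(stacked)
print(len(per_example), len(per_example[0]), len(per_example[0][0]))  # 2 3 4
```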
And here is how I save the model:
@tf.function()
def serve(input_ids: tf.Tensor, queries_count: tf.Tensor):
    return text_generator(
        inputs={'input_ids': input_ids, 'queries_count': queries_count},
        mode="predict_with_sampling")

text_generator.save(saved_model_dir, signatures={
    'serve': serve.get_concrete_function(
        input_ids=tf.TensorSpec(shape=(None, max_input_length,), dtype=tf.int32),
        queries_count=tf.TensorSpec(shape=(), dtype=tf.int32)
    )
})
Yeah this model is pretty different from the others. It has some nested loops and stuff. Looking at the onnx model I can clearly see a problem: The TensorListReserve op is not supported. It should have given you an error during conversion with a list of unsupported ops. We should be able to convert that though. The pattern containing it isn't matching for some reason. I'll need a saved model to figure out the conversion.
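To check a converted model for ops that were left unconverted, one has to look inside Loop bodies too, since ops such as TensorListReserve can sit in nested subgraphs. A minimal sketch: `count_op_types` and `find_ops` are hypothetical helpers that duck-type against the onnx protobuf fields `graph.node`, `node.op_type`, `node.attribute`, `attr.g` and `attr.graphs`, so a real model could be passed as `onnx.load("model.onnx").graph`. The toy graph at the end uses `SimpleNamespace` stand-ins so the sketch runs without onnx installed.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def count_op_types(graph) -> Counter:
    """Count op types in a graph, recursing into subgraph attributes (Loop/If/Scan bodies)."""
    counts: Counter = Counter()
    for node in graph.node:
        counts[node.op_type] += 1
        for attr in node.attribute:
            subgraphs = list(attr.graphs)
            if attr.g is not None:
                subgraphs.append(attr.g)
            for sub in subgraphs:
                counts.update(count_op_types(sub))
    return counts


def find_ops(graph, op_types: Iterable[str]) -> Counter:
    """Return counts only for the given op types, e.g. ops suspected to be unconverted."""
    wanted = set(op_types)
    return Counter({op: n for op, n in count_op_types(graph).items() if op in wanted})


# Toy stand-ins for onnx protobufs: an outer Loop whose body holds a TensorListReserve node.
def _node(op_type, attribute=()):
    return SimpleNamespace(op_type=op_type, attribute=list(attribute))


loop_body = SimpleNamespace(node=[_node("TensorListReserve"), _node("Gather")])
toy_graph = SimpleNamespace(node=[
    _node("Loop", [SimpleNamespace(g=loop_body, graphs=[])]),
    _node("Gather"),
])
print(find_ops(toy_graph, {"TensorListReserve"}))  # Counter({'TensorListReserve': 1})
```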
I think you forgot to include the line text_generator = TextGenerator(...). I need to know what params you are using to make the saved model / frozen graph (or you can upload one).
I uploaded it here:
To create the model, I run the following script:
def create_model(bert_config: str,
                 max_input_length: int,
                 max_output_length: int,
                 queries_count: int = 20,
                 batch_size: int = None) -> tf.keras.Model:
    """
    Create a model wrapping around the Google BERT2BERT implementations.
    :param bert_config: the configuration of the underlying BERT model
    :param max_input_length: maximum length of a single passage
    :param max_output_length: maximum length of the generated texts
    :param queries_count: number of queries
    :param batch_size: size of the batch
    :return: the TextGenerator model, built by calling it once on Keras inputs
    """
    # Build the base BERT configuration
    bert_config = BertConfig.from_json_file(bert_config).to_dict()
    # Drop the parameters which don't exist in the BERT2BERT configuration
    bert_config.pop('embedding_size', None)
    bert_config.pop('backward_compatible', None)
    # Finally build the BERT2BERT configuration, setting the exact same parameters for decoders
    bert2bert_config = BERT2BERTConfig(
        num_decoder_attn_heads=bert_config['num_attention_heads'],
        num_decoder_layers=bert_config['num_hidden_layers'],
        decoder_intermediate_size=bert_config['intermediate_size'],
        beam_size=queries_count,
        **bert_config)
    bert2bert_config.override(
        {
            "len_title": max_output_length,
        },
        is_strict=False
    )
    # Build the encoder decoder model
    bert_layer, decoder_layer = get_bert2bert_layers(params=bert2bert_config)
    # Create the main model instance
    text_generator = TextGenerator(
        params=bert2bert_config,
        bert_layer=bert_layer,
        decoder_layer=decoder_layer)
    # Call the model once on symbolic Keras inputs so that it gets built
    input_ids = tf.keras.layers.Input((max_input_length,), dtype=tf.int32, name="input_ids", batch_size=batch_size)
    queries_count = tf.keras.layers.Input((), dtype=tf.int32, name="queries_count")
    target_ids = tf.keras.layers.Input((max_output_length,), dtype=tf.int32, name="target_ids")
    inputs = {
        'input_ids': input_ids,
        'queries_count': queries_count,
        'target_ids': target_ids
    }
    text_generator(inputs, mode='predict_with_sampling')
    return text_generator
where the bert_config argument is a path to a JSON file containing a BERT configuration such as:
{
"hidden_size": 256,
"hidden_act": "gelu",
"initializer_range": 0.02,
"vocab_size": 30522,
"hidden_dropout_prob": 0.1,
"num_attention_heads": 4,
"type_vocab_size": 2,
"max_position_embeddings": 512,
"num_hidden_layers": 4,
"intermediate_size": 1024,
"attention_probs_dropout_prob": 0.1
}
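The dict handling in `create_model` can be checked without TensorFlow. The sketch below mirrors it (drop the two keys the BERT2BERT configuration does not accept, copy the encoder sizes to the decoder) and adds a hypothetical `check_sequence_length` guard, motivated by the earlier max_position_embeddings issue in this thread. The helper names are illustrative, and the resulting dict would still need to be passed to `BERT2BERTConfig(**...)`.

```python
import json
from typing import Any, Dict

BERT_JSON = """{
  "hidden_size": 256, "hidden_act": "gelu", "initializer_range": 0.02,
  "vocab_size": 30522, "hidden_dropout_prob": 0.1, "num_attention_heads": 4,
  "type_vocab_size": 2, "max_position_embeddings": 512, "num_hidden_layers": 4,
  "intermediate_size": 1024, "attention_probs_dropout_prob": 0.1
}"""


def bert2bert_kwargs(bert_config: Dict[str, Any], queries_count: int = 20) -> Dict[str, Any]:
    """Mirror create_model: drop unsupported keys, reuse encoder sizes for the decoder."""
    config = dict(bert_config)  # copy so the caller's dict is left untouched
    config.pop("embedding_size", None)
    config.pop("backward_compatible", None)
    return {
        "num_decoder_attn_heads": config["num_attention_heads"],
        "num_decoder_layers": config["num_hidden_layers"],
        "decoder_intermediate_size": config["intermediate_size"],
        "beam_size": queries_count,
        **config,
    }


def check_sequence_length(kwargs: Dict[str, Any], max_input_length: int) -> None:
    """Hypothetical guard: positions beyond max_position_embeddings have no embedding row."""
    limit = kwargs["max_position_embeddings"]
    if max_input_length > limit:
        raise ValueError(
            f"max_input_length={max_input_length} exceeds max_position_embeddings={limit}")


kwargs = bert2bert_kwargs(json.loads(BERT_JSON))
check_sequence_length(kwargs, max_input_length=200)  # passes: 200 <= 512
```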
The freezing issue for the saved model will be fixed here: #1672
It turns out TensorFlow doesn't implement freezing for that op: https://github.com/tensorflow/tensorflow/issues/51488
There are still issues to resolve though for this model. What are you using it for btw? I notice that you are converting a lot of different models to ONNX.
Hello @TomWildenhain-Microsoft,
Sorry for answering so late; I was on holiday.
First of all, thanks: the conversion seems to work now, which solves the first part of my issue.
Here is the serving error I get now:
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Loop node. Name:'StatefulPartitionedCall/query_generator/StatefulPartitionedCall/map/while_loop' Status Message: Non-zero status code returned while running Loop node. Name:'map/while/StatefulPartitionedCall/while_loop' Status Message: Non-zero status code returned while running Gather node. Name:'while/decoder/word_embeddings/Gather' Status Message: indices element out of data bounds, idx=30522 must be within the inclusive range [-30522,30521]
running:
import onnxruntime
sess_options = onnxruntime.SessionOptions()
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
session = onnxruntime.InferenceSession('path/to/model.onnx', sess_options, providers=["CPUExecutionProvider"])
input_ids = [101, 2023, 2633, 4504, 1999, 6094, 2008, 6461, 2000, 5787, 2925, 7521, 1997, 2822, 3667, 2000, 1996,
2142, 2163, 1010, 1998, 5561, 2000, 14768, 8041, 4262, 2090, 1996, 2142, 2163, 1998, 2859, 1012, 102,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
input_feed = {
"input_ids": [input_ids],
"queries_count": [5]
}
session.run(output_names=None, input_feed=input_feed)
To answer your last question: we train TensorFlow models at my company, and we are currently evaluating the feasibility of choosing ORT as our inference runtime.
@LoicDagnas
I'm able to reproduce the same error on my side. This looks like it is going to be pretty hard to track down and I'm a bit busy this week, but I'll try when I get a chance. The error is definitely a bad calculation (likely an incorrect node wiring during conversion) that is evident in the first iteration of the loop. To debug, I might have to capture all the intermediate outputs from the first iteration of the loop in TF and ONNX and compare them.
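The error message spells out the bound: ONNX Gather accepts indices in [-n, n-1] for an axis of size n, where negative indices count from the end. With a word-embedding table of vocab_size = 30522 rows (per the config above), a token id of exactly 30522 is one past the last row, so whatever produced that id upstream in the decoding loop is the likely culprit. A minimal sketch of the check, with hypothetical helper names:

```python
from typing import Optional, Sequence


def gather_index_ok(index: int, axis_size: int) -> bool:
    """Mirror ONNX Gather's bound: index must lie in [-axis_size, axis_size - 1]."""
    return -axis_size <= index <= axis_size - 1


def first_out_of_range(token_ids: Sequence[int], vocab_size: int) -> Optional[int]:
    """Return the position of the first token id Gather would reject, or None."""
    for position, token in enumerate(token_ids):
        if not gather_index_ok(token, vocab_size):
            return position
    return None


VOCAB_SIZE = 30522
print(gather_index_ok(30521, VOCAB_SIZE))  # True: last row of the embedding table
print(gather_index_ok(30522, VOCAB_SIZE))  # False: the idx reported in the error
```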
Also let me know if you want to talk to one of our PMs. They like to hear from ORT users.
Cool, thank you! Let me know if you need anything else to help your investigation.
We'd be glad to talk with one of your PMs; they can reach me at dagnas@sinequa.com.
Hello @TomWildenhain-Microsoft, by any chance have you had a bit of time to look at the issue?
Thanks.
Hey @LoicDagnas sorry for the delay here. I haven't had a chance to look at this and my responsibilities have shifted away from tf2onnx so I'm not sure I'll be able to. If you have time and are willing to help determine the issue in tf2onnx then fixing it will be a lot easier.
Here's the approach I'd take: using --output_frozen_graph during conversion, we can get a TF graph and an ONNX graph to compare. Open them in Netron. From some debugging I did before, I think the issue is in the innermost graph (inside 2 nested loops): something is going wrong there, and one of the graph outputs doesn't match TF. If you can pinpoint what that is through examination, then maybe you can find the bug. Keep in mind that the ONNX Loop semantics differ from TF's (https://github.com/onnx/onnx/blob/master/docs/Operators.md#Loop).
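To make the semantic difference concrete: in a TF while loop, a separate cond function is evaluated on the loop variables before each body call, whereas an ONNX Loop body receives the iteration number and the incoming condition, and itself returns the condition for the next iteration along with the updated loop-carried values and optional per-iteration scan outputs. The sketch below emulates those control-flow rules in plain Python, following the pseudocode in the linked operator spec; `onnx_loop` is a hypothetical helper and ignores tensor shapes and types.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# body(iteration_num, cond_in, carried) -> (cond_out, new_carried, scan_values)
Body = Callable[[int, bool, Tuple[Any, ...]], Tuple[bool, Tuple[Any, ...], Tuple[Any, ...]]]


def onnx_loop(body: Body, max_trip_count: Optional[int], cond: Optional[bool],
              initial: Sequence[Any]) -> Tuple[Tuple[Any, ...], List[List[Any]]]:
    """Emulate ONNX Loop control flow on plain Python values.

    Returns (final loop-carried values, scan outputs), where each scan output is
    the list of values that output produced, one entry per iteration.
    """
    if max_trip_count is None and cond is None:
        raise ValueError("with neither M nor cond the loop would never terminate")
    use_cond = cond is not None
    keep_going = cond if use_cond else True
    carried = tuple(initial)
    per_iteration: List[Tuple[Any, ...]] = []
    i = 0
    while keep_going and (max_trip_count is None or i < max_trip_count):
        cond_out, carried, scan_values = body(i, keep_going, carried)
        per_iteration.append(tuple(scan_values))
        if use_cond:
            keep_going = cond_out  # only honored when a cond input was supplied
        i += 1
    scans = [list(column) for column in zip(*per_iteration)]
    return carried, scans


# TF-style `while x < 10: x *= 2`, expressed as an ONNX Loop with only a cond input:
final, scans = onnx_loop(lambda i, c, v: (v[0] * 2 < 10, (v[0] * 2,), (v[0] * 2,)),
                         max_trip_count=None, cond=1 < 10, initial=(1,))
print(final, scans)  # (16,) [[2, 4, 8, 16]]
```

Note how the initial condition is computed outside the loop, while every later one comes out of the body; in TF's While op, by contrast, the cond graph is separate and there are no scan outputs (accumulation goes through TensorList ops instead).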
If you can't find the issue through manual inspection, then you will need to get all the intermediate values out of the graph in both ONNX and TF and find the first discrepancy. I'd bet it isn't actually a miscalculation (since the ops look the same to me) but a mis-wiring where something goes to the wrong place. To get intermediate values without rebuilding ORT, you can spy on values by inserting a custom Python op (https://github.com/microsoft/onnxruntime-extensions/blob/main/tutorials/tf2onnx_custom_ops_tutorial.ipynb). I think TF has something similar.
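Once intermediate values have been captured from both runtimes, finding the first discrepancy is mechanical. A minimal sketch, assuming the values were already flattened to lists of floats, keyed by names that have been matched between the TF and ONNX graphs, and listed in execution order; `first_discrepancy` is a hypothetical helper, and its tolerance rule is similar to `numpy.isclose` with the TF value as the reference.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def first_discrepancy(reference: Dict[str, Sequence[float]],
                      candidate: Dict[str, Sequence[float]],
                      order: List[str], atol: float = 1e-4,
                      rtol: float = 1e-3) -> Optional[Tuple[str, float]]:
    """Walk tensors in execution order; return (name, max_abs_diff) of the first mismatch."""
    for name in order:
        ref, got = reference[name], candidate[name]
        if len(ref) != len(got):
            return name, math.inf  # a shape/size mismatch counts as a discrepancy
        worst = 0.0
        mismatch = False
        for r, g in zip(ref, got):
            diff = abs(r - g)
            worst = max(worst, diff)
            if diff > atol + rtol * abs(r):
                mismatch = True
        if mismatch:
            return name, worst
    return None


tf_values = {"embed": [0.1, 0.2], "attn": [1.0, 2.0], "logits": [3.0, 4.0]}
ort_values = {"embed": [0.1, 0.2], "attn": [1.0, 2.5], "logits": [9.0, 4.0]}
print(first_discrepancy(tf_values, ort_values, ["embed", "attn", "logits"]))  # ('attn', 0.5)
```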
This type of bug where tf2onnx generates a valid model that produces invalid results is rare and is probably one of the toughest to debug (especially in a subgraph where you can't just add all the values as graph outputs to read them).
It's been over 2 months, so closing this. Feel free to open a new one if the issue still exists.
Describe the bug
Converting a Bert2Bert model from the TensorFlow official models, I get exactly the same error whether I use the conversion from a pb or from a function:
System information
To Reproduce
Here is the minimal code to reproduce my issue; it uses the Bert2Bert model from the TensorFlow official models.
If it is simpler, I have also attached a pb of the Bert2Bert model (saved_model.zip). To reproduce the exact same bug, simply run:
Additional context
However, when I use my custom graph-freezing method and then run the tf2onnx conversion on the resulting frozen graph, it works perfectly fine. Here is how I froze my graph: