microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[ONNXRuntimeError] Non-zero status code returned while running SkipLayerNormalization node. #4779

Open wppply opened 4 years ago

wppply commented 4 years ago

Describe the bug I am trying to follow this tutorial to convert my 2-layer BERT model to ONNX and optimize it with onnxruntime_tools. The conversion of my TF model from .pb to .onnx works smoothly.


To Reproduce I followed this tutorial: https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/Tensorflow_Keras_Bert-Squad_OnnxRuntime_CPU.ipynb. It works well for the pipeline TF exported model --> export ONNX model --> inference --> export optimized ONNX model.

! python -m tf2onnx.convert --saved-model /Users/mye29/Downloads/tmp_tiny_bert/export/1597187163/ --opset=10 --output=model.onnx
import time

import numpy as np
import onnxruntime

# Build a fixed-length dummy feed for the exported model.
length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)
label_id = [0]

inputs_onnx = {"input_ids_1:0": input_ids,
               "input_mask_1:0": input_mask,
               "segment_ids_1:0": segment_ids,
               "label_ids_1:0": label_id}

sess_options = onnxruntime.SessionOptions()
session = onnxruntime.InferenceSession("model.onnx", sess_options, providers=['CPUExecutionProvider'])

# Measure average latency over repeated runs.
total_runs = 1000
start = time.time()
for _ in range(total_runs):
    results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time for sequence length {} (model not optimized): {} ms".format(
    length, format((end - start) * 1000 / total_runs, '.2f')))

However, it does not work after I run optimize_model:

from onnxruntime_tools import optimizer

optimized_model_path = 'tf_{}_opt_cpu.onnx'.format("model")

# Fuse BERT subgraphs (2 attention heads, hidden size 128) at optimization level 1.
optimized_model = optimizer.optimize_model("model.onnx",
                                           model_type='bert_tf',
                                           opt_level=1,
                                           num_heads=2, hidden_size=128)
optimized_model.use_dynamic_axes()
optimized_model.save_model_to_file(optimized_model_path)

The optimization removes one redundant input, "label_ids_1:0".
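A quick way to confirm which inputs survived optimization is to list them from a session (a minimal sketch; the expected names in the comment are assumptions based on the feed above):

# Sketch: confirm that "label_ids_1:0" was dropped by the optimizer.
check_session = onnxruntime.InferenceSession(optimized_model_path,
                                             providers=['CPUExecutionProvider'])
print([i.name for i in check_session.get_inputs()])
# Expected: ['input_ids_1:0', 'input_mask_1:0', 'segment_ids_1:0']
del check_session

The feed therefore drops the label input: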

length = 32
input_ids = np.array([[128] * length], dtype=np.int32)
input_mask = np.array([[1] * length], dtype=np.int32)
segment_ids = np.array([[1] * length], dtype=np.int32)

inputs_onnx = {"input_ids_1:0": input_ids, 
               "input_mask_1:0": input_mask, 
               "segment_ids_1:0": segment_ids}

The following step gives me an error on CPU:

sess_options = onnxruntime.SessionOptions()
# sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL

session = onnxruntime.InferenceSession(optimized_model_path, sess_options)
# use one run to warm up a session
session.run(None, inputs_onnx)

# measure the latency.
start = time.time()
for _ in range(total_runs):
    opt_results = session.run(None, inputs_onnx)
end = time.time()
print("ONNX Runtime cpu inference time on optimized model: {} ms".format(format((end - start) * 1000 / total_runs, '.2f')))
del session

The warm-up run fails with the following traceback:
      4 session = onnxruntime.InferenceSession(optimized_model_path, sess_options)
      5 # use one run to warm up a session
----> 6 session.run(None, inputs_onnx)
      7 
      8 # measure the latency.

/anaconda3/envs/tf115/lib/python3.7/site-packages/onnxruntime/capi/session.py in run(self, output_names, input_feed, run_options)
    108             output_names = [output.name for output in self._outputs_meta]
    109         try:
--> 110             return self._sess.run(output_names, input_feed, run_options)
    111         except C.EPFail as err:
    112             if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running SkipLayerNormalization node. Name:'SkipLayerNorm_AddBias_6' Status Message: input is expected to have 3 dimensions, got 2

I uploaded my model here https://drive.google.com/drive/folders/1S7ekooSbXAu6UuyynW5RyGmL1FKtoYqh?usp=sharing

Expected behavior Expect the optimized model to produce the same loss as the non-optimized one, and to run much faster 👍


hariharans29 commented 4 years ago

Looks like there is some bug in the optimizer script. @tianleiwu

colourful-tree commented 4 years ago

@wppply @hariharans29 https://github.com/microsoft/onnxruntime/blob/fff0b41fcb7cb85321f65d68c1dedaaf7032fcb0/onnxruntime/python/tools/transformers/onnx_model_bert.py#L91

Removing lines 91-92 worked for me.

And I also changed https://github.com/microsoft/onnxruntime/blob/fff0b41fcb7cb85321f65d68c1dedaaf7032fcb0/onnxruntime/python/tools/transformers/onnx_model_bert.py#L159 to: if input.name in ["segment_ids:0", "input_mask:0", "input_ids:0"]:
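Spelled out (the surrounding loop is an assumption, since the exact source at that line varies by version; only the condition above is quoted):

# Illustration only: the loop context is assumed, not copied from
# onnx_model_bert.py; only the condition comes from the comment above.
for input in self.model.graph.input:
    if input.name in ["segment_ids:0", "input_mask:0", "input_ids:0"]:
        ...  # original body unchanged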

But I find that the optimized model (model.optimizer.onnx) is not faster than model.onnx.

My TinyBERT model has 2 transformer layers with 12 heads and a hidden size of 120.

tianleiwu commented 4 years ago

@wppply,

Thanks for reporting the issue.

The cause of the error is a path in the ONNX graph like the following:

SkipLayerNormalization (SkipLayerNorm1) --> Reshape (bert/encoder/Reshape_1) --> SkipLayerNormalization (SkipLayerNorm_AddBias_6)

The correct one:

SkipLayerNormalization (SkipLayerNorm1) --> SkipLayerNormalization (SkipLayerNorm_AddBias_6)

For a normal BERT graph, the Reshape is removed in postprocessing. However, for this model the optimizer failed to fuse Attention and EmbedLayerNormalization (because the subgraph pattern is different), so the Reshape node was not removed.
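
For anyone who wants to check their own optimized graph for this pattern, here is a minimal inspection sketch using the onnx package (the node names come from the error above; the traversal itself is mine, not part of the optimizer):

# Sketch: find Reshape nodes that feed a SkipLayerNormalization node,
# i.e. the leftover pattern described above.
import onnx

model = onnx.load("tf_model_opt_cpu.onnx")
consumers = {}  # tensor name -> nodes that consume that tensor
for node in model.graph.node:
    for tensor in node.input:
        consumers.setdefault(tensor, []).append(node)

for node in model.graph.node:
    if node.op_type != "Reshape":
        continue
    downstream = [n for out in node.output for n in consumers.get(out, [])]
    if any(n.op_type == "SkipLayerNormalization" for n in downstream):
        print("Leftover Reshape feeding SkipLayerNormalization:", node.name)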

wppply commented 4 years ago

@tianleiwu Thanks for the reply. Will this issue be fixed in the next release?

GumpCode commented 4 years ago

I met the same error when I used BERT-base.

stevewyl commented 3 years ago

After changing the code @colourful-tree referred to, I still got the same error. Package versions: tensorflow=1.12.0, onnx=1.8.0, tf2onnx=1.7.2/995bd6, onnxruntime-noopenmp=1.6.0.

process_embedding: Create Embedding node
prune_graph: Graph pruned: 0 inputs, 0 outputs and 34 nodes are removed
fuse_mask_2: Failed to fuse mask
apply: Fused SkipLayerNormalization count: 24
prune_graph: Graph pruned: 0 inputs, 0 outputs and 0 nodes are removed
apply: Fused FastGelu(add bias) count: 12
apply: Fused SkipLayerNormalization(add bias) count: 24
optimize: opset verion: 11

wppply commented 3 years ago

Adding this option to optimizer.optimize_model helps solve the issue: optimization_options=BertOptimizationOptions("gpt2"). Alternatively, manually change the option enable_skip_layer_norm = False.
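
Spelled out, the workaround looks roughly like this (a sketch; the import path for BertOptimizationOptions varied across onnxruntime_tools releases, and the class was later renamed FusionOptions in onnxruntime.transformers, so treat it as an assumption):

# Sketch: disable SkipLayerNormalization fusion so the broken
# Reshape -> SkipLayerNorm path is never created.
from onnxruntime_tools import optimizer
from onnxruntime_tools.transformers.onnx_model_bert import BertOptimizationOptions  # assumed path

opt_options = BertOptimizationOptions('bert_tf')
opt_options.enable_skip_layer_norm = False  # the manual switch mentioned above

optimized_model = optimizer.optimize_model("model.onnx",
                                           model_type='bert_tf',
                                           opt_level=1,
                                           num_heads=2, hidden_size=128,
                                           optimization_options=opt_options)
optimized_model.save_model_to_file('tf_model_opt_cpu.onnx')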

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

KokinSok commented 8 months ago

Same error here. ONNX is becoming stale!