microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Transformers benchmark.py failed with facebook/bart-base #13397

Open skyline75489 opened 1 year ago

skyline75489 commented 1 year ago

Describe the issue

 python benchmark.py -m facebook/bart-base -g -p fp16
Arguments: Namespace(models=['facebook/bart-base'], model_source='pt', model_class=None, engines=['onnxruntime'], cache_dir='./cache_models', onnx_dir='./onnx_models', use_gpu=True, provider=None, precision=<Precision.FLOAT16: 'fp16'>, verbose=False, overwrite=False, optimizer_info=<OptimizerInfo.BYSCRIPT: 'by_script'>, validate_onnx=False, fusion_csv=None, detail_csv=None, result_csv=None, input_counts=[1], test_times=100, batch_sizes=[1], sequence_lengths=[4, 8, 16, 32, 64, 128, 256], disable_ort_io_binding=False, num_threads=[6], force_num_layers=None, disable_attention=False, disable_skip_layer_norm=False, disable_embed_layer_norm=False, disable_bias_skip_layer_norm=False, disable_bias_gelu=False, disable_layer_norm=False, disable_gelu=False, enable_gelu_approximation=False, disable_shape_inference=False, use_mask_index=False, no_attention_mask=False)
Model class name: AutoModel
Skip export since model existed: ./onnx_models/facebook_bart_base_1.onnx
Skip optimization since model existed: ./onnx_models/facebook_bart_base_1_fp16_gpu.onnx
Run onnxruntime on facebook/bart-base with input shape [1, 4]
Exception
Traceback (most recent call last):
  File "/datadrive/azureuser/onnxruntime/onnxruntime/python/tools/transformers/benchmark.py", line 841, in main
    results += run_onnxruntime(
  File "/datadrive/azureuser/onnxruntime/onnxruntime/python/tools/transformers/benchmark.py", line 273, in run_onnxruntime
    ort_outputs = ort_session.run(ort_output_names, ort_inputs)
  File "/datadrive/azureuser/miniconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid Feed Input Name:decoder_input_ids
No any result avaiable.
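
The failure means the benchmark feeds an input named decoder_input_ids that the exported graph does not declare. A quick way to confirm which input names the exported model actually expects is a minimal sketch like the one below; the model path is taken from the log above, and the execution-provider choice is an assumption:

    import onnxruntime

    # Path copied from the "Skip optimization" line in the log above.
    model_path = "./onnx_models/facebook_bart_base_1_fp16_gpu.onnx"

    session = onnxruntime.InferenceSession(
        model_path, providers=["CUDAExecutionProvider"]
    )

    # Print the input names the graph actually declares; the benchmark
    # fails because "decoder_input_ids" is not among them for this export.
    for inp in session.get_inputs():
        print(inp.name, inp.shape, inp.type)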

To reproduce

 python benchmark.py -m facebook/bart-base -g -p fp16

Urgency

No response

Platform

Linux

OS Version

Ubuntu 20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.12.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.6

tianleiwu commented 1 year ago

@skyline75489, thanks for the feedback. I can reproduce the issue. We'll update the benchmark to measure only the encoder of BART.

For text generation, it is better to use an end-to-end performance test. Decoding latency depends on multiple factors: batch size, context sequence length, number of generated tokens, beam search vs. greedy search vs. beam sampling, early stopping, etc. If you need to account for all of these factors, I suggest modifying the following script: https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/bart The script exports the model to ONNX and runs some example text generation; only a small change is needed to measure latency.
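
As a rough illustration of what "end-to-end" means here (this is not the bart script's actual measurement code; the model name, generation parameters, and iteration count are all placeholders), a timing loop for the PyTorch baseline could look like this, with the ONNX path wrapped the same way:

    import time
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base").eval().cuda()
    tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
    inputs = tokenizer("example context text", return_tensors="pt").to("cuda")

    # Warm up once, then time whole generations so batch size, context
    # length, generated length, and search strategy are all reflected.
    model.generate(**inputs, max_length=32, num_beams=4)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(100):
        model.generate(**inputs, max_length=32, num_beams=4)
    torch.cuda.synchronize()
    print((time.perf_counter() - start) / 100, "seconds per generation")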