ryangawei opened this issue 2 years ago
I can reproduce this in the latest transformers with the latest onnxruntime.
FYI, this error seems to be linked to the dimensions of the input; if you use a batch size of 2, it should work.
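A minimal sketch of that workaround, assuming the exported graph only accepts a batch dimension of 2: tile a single tokenized example along the batch axis before calling the session, then keep only the first row of the output. The `to_batch_of_2` helper and the dummy token ids are mine, not from the thread.

```python
import numpy as np

def to_batch_of_2(feed_dict):
    """Tile every (1, seq_len) array in the feed to a (2, seq_len) batch.

    Hypothetical workaround for an ONNX export that expects batch size 2:
    duplicate the single example, run the session as usual, then keep only
    the first row of each output.
    """
    return {k: np.tile(v, (2,) + (1,) * (v.ndim - 1)) for k, v in feed_dict.items()}

# Dummy stand-in for real tokenizer output:
feed = {
    "input_ids": np.array([[0, 713, 16, 2]], dtype=np.int64),
    "attention_mask": np.ones((1, 4), dtype=np.int64),
}
batched = to_batch_of_2(feed)
print(batched["input_ids"].shape)  # (2, 4)
# outputs = session.run(None, batched)  # then keep outputs[0][0] for the original example
```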
As seen with @mfuntowicz offline, we'll be working on a fix in the coming weeks cc @michaelbenayoun
@LysandreJik Thank you for the follow-up. I'll pay attention to any updates.
Can reproduce with valhalla/distilbart-mnli-12-1 in 4.10.0. @LysandreJik
The export is essentially dependent on the number of hypotheses it was exported with, as far as I can tell.
Any update on this? Can reproduce the same for facebook/bart-large-mnli. Works only with a batch size of 2 during inference. @LysandreJik @mfuntowicz
transformers.__version__ == 4.20.0.dev0
onnxruntime.__version__ == 1.11.1
Exported facebook/bart-base successfully, following the instructions at https://github.com/huggingface/transformers/tree/main/examples/research_projects/onnx/summarization
Script output:

```
2022-05-16 16:06:57 | INFO | __main__ | [run_onnx_exporter.py:163] Model outputs from torch and ONNX Runtime are similar.
2022-05-16 16:06:57 | INFO | __main__ | [run_onnx_exporter.py:164] Success.
```
However, loading the exported model hangs forever (eventually timing out) when using this script:
```python
import torch
from onnxruntime import InferenceSession, SessionOptions, GraphOptimizationLevel

# initialize session options
options = SessionOptions()
options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

session = InferenceSession(
    'optimized_BART.onnx',
    sess_options=options,
    providers=["CPUExecutionProvider"],
)
session.disable_fallback()
```
```
(py39) user@Avis-MacBook-Pro-2 summarization % ls -lht
-rw-r--r--  1 user  staff   680M May 16 16:06 optimized_BART.onnx
```

The exported model size is about 680MB.
Any advice on this?
transformers.__version__ == 4.20.0.dev0
onnxruntime.__version__ == 1.11.1
ONNX BART fails to load (hangs forever) when passing options to InferenceSession(). Avoid this setting:

```python
options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL
```

Otherwise loading the model hangs forever. Upon keyboard interrupt, I get tons of these warnings:
```
2022-05-16 15:57:35.009102 [W:onnxruntime:, graph.cc:3559 CleanUnusedInitializersAndNodeArgs] Removing initializer '1772'. It is not used by any node and should be removed from the model.
2022-05-16 15:57:36.410981 [W:onnxruntime:, constant_folding.cc:202 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_5330'
2022-05-16 15:57:36.416645 [W:onnxruntime:, constant_folding.cc:202 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_808'
2022-05-16 15:57:36.416741 [W:onnxruntime:, constant_folding.cc:202 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_1'
2022-05-16 15:57:36.446512 [W:onnxruntime:, constant_folding.cc:202 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_5128'
2022-05-16 15:57:37.813252 [W:onnxruntime:, graph.cc:3559 CleanUnusedInitializersAndNodeArgs] Removing initializer '3149'. It is not used by any node and should be removed from the model.
2022-05-16 15:57:37.813269 [W:onnxruntime:, graph.cc:3559 CleanUnusedInitializersAndNodeArgs] Removing initializer '2153'. It is not used by any node and should be removed from the model.
....
```
Loaded the ONNX model successfully without setting options.graph_optimization_level, but it fails to get a prediction :(
```python
import onnxruntime as ort
import numpy as np

ort_session = ort.InferenceSession('optimized_BART.onnx')
print(f'inputs: {[i.name for i in ort_session.get_inputs()]}')

feed_dict = summarizer.tokenizer(text)
feed_dict['num_beams'] = 4
feed_dict['max_length'] = 120
feed_dict['decoder_start_token_id'] = 2
feed_dict = {k: np.int64([v]) for k, v in feed_dict.items()}

for key in feed_dict:
    print(f'feed_dict key: {key}, shape: {feed_dict[key].shape}')

pred = ort_session.run(None, feed_dict)
```
```
inputs: ['input_ids', 'attention_mask', 'num_beams', 'max_length', 'decoder_start_token_id']
feed_dict key: input_ids, shape: (1, 228)
feed_dict key: attention_mask, shape: (1, 228)
feed_dict key: num_beams, shape: (1,)
feed_dict key: max_length, shape: (1,)
feed_dict key: decoder_start_token_id, shape: (1,)
```
```
InvalidArgument                           Traceback (most recent call last)
Input In [39], in <cell line: 11>()
      8 for key in feed_dict:
      9     print(f'feed_dict key: {key}, shape: {feed_dict[key].shape}')
---> 11 pred = session.run(['output_ids'], feed_dict)

File ~/envs/py39/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:192, in Session.run(self, output_names, input_feed, run_options)
    190     output_names = [output.name for output in self._outputs_meta]
    191 try:
--> 192     return self._sess.run(output_names, input_feed, run_options)
    193 except C.EPFail as err:
    194     if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: attention_mask for the following indices
 index: 1 Got: 228 Expected: 13
 Please fix either the inputs or the model.
```
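The error suggests the export baked a static sequence length (13) into the graph. A hedged numpy sketch of one way to cope until the export is fixed: pad or truncate the feeds to that static length. The `fit_to_static_len` name and pad value are my own; real code should read the expected length from `session.get_inputs()` rather than hard-coding it.

```python
import numpy as np

def fit_to_static_len(arr, static_len, pad_value=0):
    """Pad (with pad_value) or truncate a (1, seq_len) array so its second
    dimension matches the static length baked into the ONNX export.
    A workaround sketch, not a fix for the export itself."""
    seq_len = arr.shape[1]
    if seq_len >= static_len:
        return arr[:, :static_len]
    pad = np.full((arr.shape[0], static_len - seq_len), pad_value, dtype=arr.dtype)
    return np.concatenate([arr, pad], axis=1)

# The traceback above reports Expected: 13 for attention_mask:
ids = np.ones((1, 228), dtype=np.int64)
print(fit_to_static_len(ids, 13).shape)   # (1, 13) -- truncated
mask = np.ones((1, 5), dtype=np.int64)
print(fit_to_static_len(mask, 13).shape)  # (1, 13) -- zero-padded
```

Truncating to 13 tokens obviously destroys most of the input, so this mainly confirms the diagnosis; the real fix is exporting with dynamic axes.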
Fails to export facebook/bart-large-cnn, following the instructions at https://github.com/huggingface/transformers/tree/main/examples/research_projects/onnx/summarization
```
(py39) user@Avis-MacBook-Pro-2 summarization % python run_onnx_exporter.py --model_name_or_path facebook/bart-large-cnn
Traceback (most recent call last):
  File "~/src/transformers/examples/research_projects/onnx/summarization/run_onnx_exporter.py", line 207, in
```
Same error when trying to export the model lidiya/bart-base-samsum.
Any advice would be greatly appreciated. Thanks.
Environment info

transformers version: 4.9.0

Who can help

@mfuntowicz

To reproduce

I was using Google Colab and trying to export the model facebook/bart-large-cnn to the ONNX format. I ran the command `python -m transformers.onnx -m=facebook/bart-large-cnn onnx/bart-large-cnn`, and the outputs seemed okay. Then I tried to execute the model in onnxruntime, and I got the error:

I see that BART was recently supported for ONNX in the latest release, but there isn't any code that explains exactly how to run inference in onnxruntime. Maybe I'm doing something wrong here, so any help will be appreciated!
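Since the thread never shows a complete inference call against the `transformers.onnx` export, here is a minimal sketch of the feed-building step, which is where most of the reports above went wrong: onnxruntime wants int64 numpy arrays with an explicit batch dimension, not the plain Python lists a tokenizer returns. The `build_ort_feed` helper and the dummy `encoded` dict are my own illustration; note that the default export does not bundle generation, so `session.run` returns raw model outputs, not summaries.

```python
import numpy as np

def build_ort_feed(encoded):
    """Convert a tokenizer-style output dict (lists of ints) into the
    batch-of-1 int64 numpy arrays that onnxruntime's session.run expects.
    `encoded` stands in for e.g. tokenizer(text)."""
    return {k: np.array([v], dtype=np.int64) for k, v in encoded.items()}

# Dummy stand-in for tokenizer(text) output:
encoded = {
    "input_ids": [0, 713, 16, 10, 1296, 2],
    "attention_mask": [1, 1, 1, 1, 1, 1],
}
feed = build_ort_feed(encoded)
print({k: v.shape for k, v in feed.items()})  # {'input_ids': (1, 6), 'attention_mask': (1, 6)}

# session = onnxruntime.InferenceSession("onnx/bart-large-cnn/model.onnx")
# outputs = session.run(None, feed)  # raw model outputs; generation must be driven separately
```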