huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Got `ONNXRuntimeError` when trying to run BART in ONNX format #12851

Open ryangawei opened 2 years ago

ryangawei commented 2 years ago

Environment info

Who can help

@mfuntowicz

To reproduce

I was using Google Colab and trying to export the model facebook/bart-large-cnn to the ONNX format. I ran the command `python -m transformers.onnx -m=facebook/bart-large-cnn onnx/bart-large-cnn`, and the output seemed okay.

2021-07-22 23:14:33.821472: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Using framework PyTorch: 1.9.0+cu102
Overriding 1 configuration item(s)
    - use_cache -> False
/usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_bart.py:212: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_bart.py:218: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_bart.py:249: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/usr/local/lib/python3.7/dist-packages/transformers/models/bart/modeling_bart.py:863: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
tcmalloc: large alloc 1625399296 bytes == 0x5595ce83a000 @  0x7f1780d9f887 0x7f177f695c29 0x7f177f696afb 0x7f177f696bb4 0x7f177f696f9c 0x7f17670dcbb7 0x7f17670dd064 0x7f175b75ba1c 0x7f176bf8eaff 0x7f176b949b88 0x55949fda8bf8 0x55949fe1c6f2 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fe16c35 0x55949fda973a 0x55949fe1bf40 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fda965a 0x55949fe17b0e 0x55949fda965a 0x55949fe17b0e 0x55949fe16c35 0x55949fe16933 0x55949fe14da0 0x55949fda7ea9 0x55949fda7da0 0x55949fe1bbb3
tcmalloc: large alloc 1625399296 bytes == 0x55962f654000 @  0x7f1780d9f887 0x7f177f695c29 0x7f177f696afb 0x7f177f696bb4 0x7f177f696f9c 0x7f17670dcbb7 0x7f17670dd064 0x7f175b75ba1c 0x7f176bf8ecab 0x7f176b949b88 0x55949fda8bf8 0x55949fe1c6f2 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fe16c35 0x55949fda973a 0x55949fe1bf40 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fda965a 0x55949fe17b0e 0x55949fda965a 0x55949fe17b0e 0x55949fe16c35 0x55949fe16933 0x55949fe14da0 0x55949fda7ea9 0x55949fda7da0 0x55949fe1bbb3
tcmalloc: large alloc 1625399296 bytes == 0x5595ce83a000 @  0x7f1780d9d1e7 0x55949fdd9a18 0x55949fda4987 0x7f176bf8ece2 0x7f176b949b88 0x55949fda8bf8 0x55949fe1c6f2 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fe16c35 0x55949fda973a 0x55949fe1bf40 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fda965a 0x55949fe17b0e 0x55949fda965a 0x55949fe17b0e 0x55949fe16c35 0x55949fe16933 0x55949fe14da0 0x55949fda7ea9 0x55949fda7da0 0x55949fe1bbb3 0x55949fe16c35 0x55949fda973a 0x55949fe17b0e 0x55949fe16c35 0x55949fce8eb1
tcmalloc: large alloc 1625399296 bytes == 0x55962f654000 @  0x7f1780d9f887 0x7f177f695c29 0x7f177f695d47 0x7f177f6977a5 0x7f176bd60368 0x7f176bfbc844 0x7f176b949b88 0x55949fda8010 0x55949fda7da0 0x55949fe1bbb3 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fe16c35 0x55949fda973a 0x55949fe1bf40 0x55949fe16c35 0x55949fda973a 0x55949fe1893b 0x55949fda965a 0x55949fe17b0e 0x55949fda965a 0x55949fe17b0e 0x55949fe16c35 0x55949fe16933 0x55949fe14da0 0x55949fda7ea9 0x55949fda7da0 0x55949fe1bbb3 0x55949fe16c35 0x55949fda973a
Validating ONNX model...
    -[✓] ONNX model outputs' name match reference model ({'last_hidden_state', 'encoder_last_hidden_state'})
    - Validating ONNX Model output "last_hidden_state":
        -[✓] (2, 8, 1024) matches (2, 8, 1024)
        -[✓] all values close (atol: 0.0001)
    - Validating ONNX Model output "encoder_last_hidden_state":
        -[✓] (2, 8, 1024) matches (2, 8, 1024)
        -[✓] all values close (atol: 0.0001)
All good, model saved at: onnx/bart-large-cnn/model.onnx

Then I tried to execute the model in onnxruntime,

import onnxruntime as ort

ort_session = ort.InferenceSession('onnx/bart-large-cnn/model.onnx')

# Got input_ids and attention_mask using tokenizer

outputs = ort_session.run(None, {'input_ids': input_ids.detach().cpu().numpy(), 'attention_mask': attention_mask.detach().cpu().numpy()})

And I got the error,

---------------------------------------------------------------------------
RuntimeException                          Traceback (most recent call last)
<ipython-input-30-380e6a0e1085> in <module>()
----> 1 outputs = ort_session.run(None, {'input_ids': input_ids.detach().cpu().numpy(), 'attention_mask': attention_mask.detach().cpu().numpy()})

/usr/local/lib/python3.7/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in run(self, output_names, input_feed, run_options)
    186             output_names = [output.name for output in self._outputs_meta]
    187         try:
--> 188             return self._sess.run(output_names, input_feed, run_options)
    189         except C.EPFail as err:
    190             if self._enable_fallback:

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_109' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:42 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, std::vector<long int>&, bool) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{2}, requested shape:{1,1}

I see that BART was recently added to the ONNX export support in the latest release, but there isn't any code explaining exactly how to run inference in onnxruntime. Maybe I'm doing something wrong here, so any help would be appreciated!
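The reshape failure above can be illustrated in isolation (a NumPy-only sketch, not the actual graph; the shapes mirror the error message's `Input shape:{2}` vs `requested shape:{1,1}`):

```python
import numpy as np

# Minimal illustration: the exported model bakes a reshape target derived
# from the export-time dummy inputs, so a runtime tensor whose element count
# differs can no longer be reshaped to the recorded shape.
x = np.zeros(2)          # runtime tensor with 2 elements, like Input shape:{2}
try:
    x.reshape(1, 1)      # the graph requests shape {1,1}: room for 1 element
except ValueError as err:
    print(type(err).__name__)  # ValueError
```

The same size mismatch is what ONNX Runtime reports as a RUNTIME_EXCEPTION from its ReshapeHelper.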

LysandreJik commented 2 years ago

I can reproduce this with the latest transformers and the latest ONNX Runtime.

LysandreJik commented 2 years ago

FYI, this error seems to be linked to the dimensions of the input; if you use a batch size of 2, it should work.

As discussed with @mfuntowicz offline, we'll be working on a fix in the coming weeks. cc @michaelbenayoun
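Until the fix lands, the batch-size-2 workaround can be sketched as follows (a hedged illustration; `to_batch_of_two` is a hypothetical helper, and the session call is left commented out because it needs the exported model file):

```python
import numpy as np

def to_batch_of_two(input_ids: np.ndarray, attention_mask: np.ndarray):
    """Tile batch-1 arrays of shape (1, seq_len) into shape (2, seq_len)."""
    return np.tile(input_ids, (2, 1)), np.tile(attention_mask, (2, 1))

# Made-up token ids standing in for a real tokenizer's output.
input_ids = np.array([[0, 713, 16, 2]], dtype=np.int64)   # shape (1, 4)
attention_mask = np.ones_like(input_ids)                  # shape (1, 4)

ids2, mask2 = to_batch_of_two(input_ids, attention_mask)
print(ids2.shape, mask2.shape)  # (2, 4) (2, 4)

# With a session whose graph expects batch size 2:
# outputs = ort_session.run(None, {'input_ids': ids2, 'attention_mask': mask2})
# last_hidden_state = outputs[0][:1]  # keep only the first (real) row
```

Duplicating the row wastes one slot of compute but sidesteps the baked-in batch dimension until the export itself is fixed.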

ryangawei commented 2 years ago

@LysandreJik Thank you for the follow-up. I'll pay attention to any updates.

oborchers commented 2 years ago

I can reproduce this with valhalla/distilbart-mnli-12-1 in 4.10.0. @LysandreJik As far as I can tell, the export essentially depends on the number of hypotheses it was exported with.

talent404 commented 2 years ago

Any update on this? I can reproduce the same for facebook/bart-large-mnli. It works only with a batch size of 2 during inference. @LysandreJik @mfuntowicz

Avi-avidan commented 2 years ago

transformers.__version__ == 4.20.0.dev0
onnxruntime.__version__ == 1.11.1

Exported facebook/bart-base successfully, following the instructions at https://github.com/huggingface/transformers/tree/main/examples/research_projects/onnx/summarization

Script output:

2022-05-16 16:06:57 | INFO | __main__ | [run_onnx_exporter.py:163] Model outputs from torch and ONNX Runtime are similar.
2022-05-16 16:06:57 | INFO | __main__ | [run_onnx_exporter.py:164] Success.

However, loading the exported model hangs forever (eventually timing out) with this script:

import torch
from onnxruntime import InferenceSession, SessionOptions, GraphOptimizationLevel

options = SessionOptions() # initialize session options
options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

session = InferenceSession(
    'optimized_BART.onnx',
    sess_options=options,
    providers=["CPUExecutionProvider"]
)

session.disable_fallback()

(py39) user@Avis-MacBook-Pro-2 summarization % ls -lht
-rw-r--r--  1 user  staff   680M May 16 16:06 optimized_BART.onnx

The exported model is about 680 MB.

Any advice on this?

Avi-avidan commented 2 years ago

transformers.__version__ == 4.20.0.dev0
onnxruntime.__version__ == 1.11.1

ONNX BART fails to load (hangs forever) when passing options to InferenceSession().

Avoid this: options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

Otherwise loading the model hangs forever. Upon keyboard interrupt, I get tons of these warnings:

2022-05-16 15:57:35.009102 [W:onnxruntime:, graph.cc:3559 CleanUnusedInitializersAndNodeArgs] Removing initializer '1772'. It is not used by any node and should be removed from the model.
2022-05-16 15:57:36.410981 [W:onnxruntime:, constant_folding.cc:202 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_5330'
2022-05-16 15:57:36.416645 [W:onnxruntime:, constant_folding.cc:202 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_808'
2022-05-16 15:57:36.416741 [W:onnxruntime:, constant_folding.cc:202 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_1'
2022-05-16 15:57:36.446512 [W:onnxruntime:, constant_folding.cc:202 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_5128'
2022-05-16 15:57:37.813252 [W:onnxruntime:, graph.cc:3559 CleanUnusedInitializersAndNodeArgs] Removing initializer '3149'. It is not used by any node and should be removed from the model.
2022-05-16 15:57:37.813269 [W:onnxruntime:, graph.cc:3559 CleanUnusedInitializersAndNodeArgs] Removing initializer '2153'. It is not used by any node and should be removed from the model.
....

Avi-avidan commented 2 years ago

Loaded the ONNX model successfully without options.graph_optimization_level, but it fails to produce a prediction :(

import onnxruntime as ort
import numpy as np

ort_session = ort.InferenceSession(
    'optimized_BART.onnx')

print(f'inputs: {[i.name for i in ort_session.get_inputs()]}')

feed_dict = summarizer.tokenizer(text)
feed_dict['num_beams'] = 4
feed_dict['max_length'] = 120
feed_dict['decoder_start_token_id'] = 2
feed_dict = {k: np.int64([v]) for k, v in feed_dict.items()}

for key in feed_dict:
    print(f'feed_dict key: {key}, shape: {feed_dict[key].shape}')

pred = ort_session.run(None, feed_dict)

printout -

inputs: ['input_ids', 'attention_mask', 'num_beams', 'max_length', 'decoder_start_token_id']
feed_dict key: input_ids, shape: (1, 228)
feed_dict key: attention_mask, shape: (1, 228)
feed_dict key: num_beams, shape: (1,)
feed_dict key: max_length, shape: (1,)
feed_dict key: decoder_start_token_id, shape: (1,)

InvalidArgument                           Traceback (most recent call last)
Input In [39], in <cell line: 11>()
      8 for key in feed_dict:
      9     print(f'feed_dict key: {key}, shape: {feed_dict[key].shape}')
---> 11 pred = session.run(['output_ids'], feed_dict)

File ~/envs/py39/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:192, in Session.run(self, output_names, input_feed, run_options)
    190     output_names = [output.name for output in self._outputs_meta]
    191 try:
--> 192     return self._sess.run(output_names, input_feed, run_options)
    193 except C.EPFail as err:
    194     if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: attention_mask for the following indices index: 1 Got: 228 Expected: 13 Please fix either the inputs or the model.
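A quick way to spot which axes were frozen at export time is to compare the feed against the shapes the session declares via get_inputs() (a NumPy-only sketch; `check_feed` and the declared shapes below are hypothetical stand-ins for what the real session reports):

```python
import numpy as np

def check_feed(declared_shapes, feed):
    """Report axes where a concrete declared dim disagrees with the fed array.

    declared_shapes maps input name -> shape list as ONNX reports it, where a
    dynamic axis appears as a string (e.g. 'batch') or None, and a frozen axis
    appears as a concrete int.
    """
    problems = []
    for name, declared in declared_shapes.items():
        actual = feed[name].shape
        for axis, (want, got) in enumerate(zip(declared, actual)):
            if isinstance(want, int) and want != got:
                problems.append(f"{name}: axis {axis} expected {want}, got {got}")
    return problems

# Shapes as the failing export above seems to declare them: axis 1 frozen at 13.
declared = {"attention_mask": [1, 13]}
feed = {"attention_mask": np.ones((1, 228), dtype=np.int64)}
print(check_feed(declared, feed))  # ['attention_mask: axis 1 expected 13, got 228']
```

If an axis shows up as a concrete int like 13 here, the export traced it as fixed; the model would need to be re-exported with that axis marked dynamic rather than fed different lengths.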

Avi-avidan commented 2 years ago

Fails to export facebook/bart-large-cnn, following the instructions at https://github.com/huggingface/transformers/tree/main/examples/research_projects/onnx/summarization

(py39) user@Avis-MacBook-Pro-2 summarization % python run_onnx_exporter.py --model_name_or_path facebook/bart-large-cnn
Traceback (most recent call last):
  File "~/src/transformers/examples/research_projects/onnx/summarization/run_onnx_exporter.py", line 207, in <module>
    main()
  File "~/src/transformers/examples/research_projects/onnx/summarization/run_onnx_exporter.py", line 184, in main
    model, tokenizer = load_model_tokenizer(args.model_name_or_path, device)
  File "~/src/transformers/examples/research_projects/onnx/summarization/run_onnx_exporter.py", line 93, in load_model_tokenizer
    huggingface_model = model_dict[model_name].from_pretrained(model_name).to(device)
KeyError: 'facebook/bart-large-cnn'

The same error occurs when trying to export lidiya/bart-base-samsum.

Any advice would be greatly appreciated. Thanks.