huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Pegasus export error #665

Open bhavnicksm opened 1 year ago

bhavnicksm commented 1 year ago

System Info

Run on a standard Google Colab notebook with the following installations:

!pip install --quiet transformers
!python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime-gpu]

Relevant PyPI package versions:

onnx             1.12.0
onnxruntime-gpu  1.13.1
optimum          1.6.2.dev0
transformers     4.25.1
torch            1.13.0+cu116
torchaudio       0.13.0+cu116
torchsummary     1.5.1
torchtext        0.14.0
torchvision      0.14.0+cu116

Who can help?

@lewtun @michaelbenayoun

Reproduction

You can reproduce the issue by running the following code in a Colab notebook after installing the dependencies:


from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tuner007/pegasus_paraphrase")
model = AutoModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase")

ort_model = ORTModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase", from_transformers=True)

Expected behavior

Ideally, the ORTModel should be loaded once the code has run.

Instead, it raises the following error:

/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:234: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:241: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:273: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:876: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
/usr/local/lib/python3.8/dist-packages/transformers/models/pegasus/modeling_pegasus.py:83: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-4-2e0907dfd025>](https://localhost:8080/#) in <module>
----> 1 ort_model = ORTModelForSeq2SeqLM.from_pretrained("tuner007/pegasus_paraphrase", from_transformers=True)

9 frames
[/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_ort.py](https://localhost:8080/#) in from_pretrained(cls, model_id, from_transformers, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, provider, session_options, provider_options, **kwargs)
    555             `ORTModel`: The loaded ORTModel model.
    556         """
--> 557         return super().from_pretrained(
    558             model_id,
    559             from_transformers=from_transformers,

[/usr/local/lib/python3.8/dist-packages/optimum/modeling_base.py](https://localhost:8080/#) in from_pretrained(cls, model_id, from_transformers, force_download, use_auth_token, cache_dir, subfolder, config, local_files_only, **kwargs)
    323 
    324         from_pretrained_method = cls._from_transformers if from_transformers else cls._from_pretrained
--> 325         return from_pretrained_method(
    326             model_id=model_id,
    327             config=config,

[/usr/local/lib/python3.8/dist-packages/optimum/onnxruntime/modeling_seq2seq.py](https://localhost:8080/#) in _from_transformers(cls, model_id, config, use_auth_token, revision, force_download, cache_dir, subfolder, local_files_only, use_cache, provider, session_options, provider_options, use_io_binding, task)
   1144             output_names.append(ONNX_DECODER_WITH_PAST_NAME)
   1145         models_and_onnx_configs = get_encoder_decoder_models_for_export(model, onnx_config)
-> 1146         export_models(
   1147             models_and_onnx_configs=models_and_onnx_configs,
   1148             opset=onnx_config.DEFAULT_ONNX_OPSET,

[/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py](https://localhost:8080/#) in export_models(models_and_onnx_configs, output_dir, opset, output_names, device, input_shapes)
    534 
    535         outputs.append(
--> 536             export(
    537                 model=submodel,
    538                 config=sub_onnx_config,

[/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py](https://localhost:8080/#) in export(model, config, output, opset, device, input_shapes)
    605                 f" got: {torch.__version__}"
    606             )
--> 607         return export_pytorch(model, config, opset, output, device=device, input_shapes=input_shapes)
    608 
    609     elif is_tf_available() and issubclass(type(model), TFPreTrainedModel):

[/usr/local/lib/python3.8/dist-packages/optimum/exporters/onnx/convert.py](https://localhost:8080/#) in export_pytorch(model, config, opset, output, device, input_shapes)
    368             # Export can work with named args but the dict containing named args has to be the last element of the args
    369             # tuple.
--> 370             onnx_export(
    371                 model,
    372                 (dummy_inputs,),

[/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py](https://localhost:8080/#) in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)
    502     """
    503 
--> 504     _export(
    505         model,
    506         args,

[/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py](https://localhost:8080/#) in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions)
   1527             _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
   1528 
-> 1529             graph, params_dict, torch_out = _model_to_graph(
   1530                 model,
   1531                 args,

[/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py](https://localhost:8080/#) in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, dynamic_axes)
   1113 
   1114     try:
-> 1115         graph = _optimize_graph(
   1116             graph,
   1117             operator_export_type,

[/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py](https://localhost:8080/#) in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict, dynamic_axes, input_names, module)
    662 
    663     graph = _C._jit_pass_onnx(graph, operator_export_type)
--> 664     _C._jit_pass_onnx_lint(graph)
    665     _C._jit_pass_lint(graph)
    666 

RuntimeError: Unable to cast from non-held to held instance (T& to Holder<T>) (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for type information)

EDIT: My Colab notebook reproducing the error is available here: Google Colab (view only)

fxmarty commented 1 year ago

Thanks for the report @bhavnicksm!

Weirdly, I can't reproduce it: https://colab.research.google.com/drive/1loUO95dJ88KBxAGs6ZcrLF9uqSgiGN5t?usp=sharing

Could you try the cell below (i.e., optimum-cli export onnx --model tuner007/pegasus_paraphrase --for-ort --task seq2seq-lm-with-past pegasus_onnx) to see how it goes?

bhavnicksm commented 1 year ago

Hey @fxmarty, yes, I just tried it a few times and, oddly enough, hit the same error only 2 out of 5 times. I am trying to find a more reliable way to reproduce it.

bhavnicksm commented 1 year ago

@fxmarty, I reproduced the same error on your notebook after running the same code a few times and restarting the notebook.


The bug is quite stochastic.
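
Because the failure is intermittent, one way to quantify it is to rerun the export several times, each in a fresh process, and count how many runs fail. The following is a minimal sketch, not a definitive reproduction script: the count_failures helper and the EXPORT_CMD name are hypothetical, it assumes optimum-cli is available on the PATH, and the command line is the one suggested earlier in this thread.

```python
import subprocess
import sys


def count_failures(cmd, runs=5):
    """Run `cmd` in a fresh process `runs` times and return how many runs exited non-zero."""
    failures = 0
    for i in range(runs):
        # A new process per run avoids state carried over from a previous attempt.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
            stderr_lines = result.stderr.strip().splitlines()
            last_line = stderr_lines[-1] if stderr_lines else "<no stderr output>"
            print(f"run {i + 1}/{runs}: FAILED: {last_line}", file=sys.stderr)
        else:
            print(f"run {i + 1}/{runs}: ok")
    return failures


# Hypothetical name; the arguments are copied from the command suggested in this thread.
EXPORT_CMD = [
    "optimum-cli", "export", "onnx",
    "--model", "tuner007/pegasus_paraphrase",
    "--for-ort",
    "--task", "seq2seq-lm-with-past",
    "pegasus_onnx",
]
```

Calling count_failures(EXPORT_CMD, runs=5) would then report how many of the five exports failed, which gives a rough failure rate to compare across environments or package versions.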