huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Error while converting bart-large-mnli to ONNX format #1485

Closed · remusao closed this issue 8 months ago

remusao commented 12 months ago

System Info

Python 3.10.12
optimum[exporters]==1.13.2
Linux + NVIDIA GPU with driver 525.85.12 (although the issue seems independent of the GPU)

Who can help?

@michaelbenayoun (Tagging you since the issue template suggests it for exports to ONNX)

Reproduction (minimal, reproducible, runnable)

Install optimum:

$ pip install optimum[exporters]==1.13.2

Then try to convert facebook/bart-large-mnli to ONNX format:

$ optimum-cli export onnx --model facebook/bart-large-mnli onnx_output

I also tried the following commands which all fail in a similar way:

$ python3 -m transformers.onnx --model=facebook/bart-large-mnli --feature=sequence-classification ./onnx_output

Or,

$ optimum-cli export onnx \
    --model=facebook/bart-large-mnli \
    --task=text-classification \
    --device cuda \
    --opset 17 \
    --framework pt \
    ./onnx_output

I also tried opset 12, and removing the safetensors model to force conversion from the PyTorch weights (not sure whether that should make any difference).

The output is then:

Framework not specified. Using pt to export to ONNX.
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████████████████| 1.15k/1.15k [00:00<00:00, 10.3MB/s]
Downloading model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████| 1.63G/1.63G [00:06<00:00, 248MB/s]
Automatic task detection to text-classification (possible synonyms are: sequence-classification, zero-shot-classification).
Downloading (…)okenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████| 26.0/26.0 [00:00<00:00, 234kB/s]
Downloading (…)olve/main/vocab.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 899k/899k [00:00<00:00, 30.6MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 33.9MB/s]
Downloading (…)/main/tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 112MB/s]
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Using framework PyTorch: 2.1.0+cu121
Overriding 1 configuration item(s)
    - use_cache -> False
/usr/local/lib/python3.10/dist-packages/transformers/models/bart/modeling_bart.py:239: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/bart/modeling_bart.py:246: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/bart/modeling_bart.py:278: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/usr/local/lib/python3.10/dist-packages/transformers/models/bart/modeling_bart.py:936: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
/usr/local/lib/python3.10/dist-packages/transformers/models/bart/modeling_bart.py:1559: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  if len(torch.unique_consecutive(eos_mask.sum(1))) > 1:
/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1686: UserWarning: The exported ONNX model failed ONNX shape inference. The model will not be executable by the ONNX Runtime. If this is unintended and you believe there is a bug, please report an issue at https://github.com/pytorch/pytorch/issues. Error reported by strict ONNX shape inference: [ShapeInferenceError] (op_type:Gather, node name: /model/decoder/embed_tokens/Gather): indices typestr: Tind, has unsupported type: tensor(float) (Triggered internally at ../torch/csrc/jit/serialization/export.cpp:1415.)
  _C._check_onnx_proto(proto)
Traceback (most recent call last):
  File "/usr/local/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/export/onnx.py", line 232, in run
    main_export(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/__main__.py", line 486, in main_export
    _, onnx_outputs = export_models(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 752, in export_models
    export(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 883, in export
    config.fix_dynamic_axes(output, device=device, input_shapes=input_shapes, dtype=dtype)
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/base.py", line 302, in fix_dynamic_axes
    session = InferenceSession(model_path.as_posix(), providers=providers, sess_options=session_options)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from onnx_output2/model.onnx failed:This is an invalid model. Type Error: Type 'tensor(float)' of input parameter (/model/Where_4_output_0) of operator (Gather) in node (/model/decoder/embed_tokens/Gather) is invalid.

Expected behavior

The conversion should succeed and produce a valid ONNX model stored in the onnx_output folder. This used to work in older versions, and as far as I can see the model hosted on the Hub has not been updated for a long time, apart from the addition of safetensors (I have not tracked down which version of torch/transformers/onnxruntime broke the conversion).
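
For reference, "valid" here just means that onnxruntime can load the exported graph; a minimal check along those lines (a sketch, assuming the export lands in ./onnx_output):

import onnxruntime

# This mirrors the step that fails in the traceback above:
# fix_dynamic_axes opens an InferenceSession on the freshly exported model.
session = onnxruntime.InferenceSession(
    "onnx_output/model.onnx", providers=["CPUExecutionProvider"]
)
print([inp.name for inp in session.get_inputs()])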

fxmarty commented 12 months ago

Thank you @remusao. I believe this is a PyTorch bug introduced in PyTorch 2.1; for reference: https://github.com/pytorch/pytorch/issues/110597 and https://github.com/pytorch/pytorch/pull/111694

We temporarily disabled this export due to the issue: https://github.com/huggingface/optimum/pull/1457

Could you try again with PyTorch 2.0.1?
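
For example:

$ pip install "torch==2.0.1"

(The exact wheel may vary depending on your CUDA setup.)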

remusao commented 12 months ago

Thanks for the quick response @fxmarty; pinning PyTorch to 2.0.1 fixes the conversion.

remusao commented 12 months ago

@fxmarty Should I close this issue or would you like to keep it open while the regression in PyTorch is addressed?

fxmarty commented 12 months ago

I think it's better to leave it open. Instead of disabling the export like I did, we should add a requirement that excludes the torch 2.1 series (i.e. torch<2.1 or torch>=2.2).
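
As a sketch, PEP 440 can express that exclusion in a single specifier (a hypothetical requirement line, not necessarily what will be merged):

torch != 2.1.*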

stekiri commented 8 months ago

Just stumbled upon this 😢 @fxmarty, are there any plans for a patch release that would include the merged fix?

fxmarty commented 8 months ago

Thank you @stekiri, we will do a release this week that includes https://github.com/huggingface/optimum/pull/1666, which fixes this issue for torch>=2.1.2. In the meantime you can use the dev version of optimum: pip uninstall optimum && pip install git+https://github.com/huggingface/optimum.git.

optimum-cli export onnx --model facebook/bart-large-mnli bart_onnx --task sequence-classification works again.

akashAD98 commented 7 months ago

@remusao Can you share an inference script for the converted model?

fxmarty commented 7 months ago

@akashAD98 You should be able to use: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModelForSequenceClassification

https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models is a good reference.
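
For example, a minimal sketch (assuming the model was exported to ./onnx_output with the sequence-classification task, as in the commands above):

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("onnx_output")
model = ORTModelForSequenceClassification.from_pretrained("onnx_output")

# bart-large-mnli is an NLI checkpoint, so the zero-shot-classification
# pipeline is the natural way to run it.
onnx_z0 = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)
pred = onnx_z0(
    "Who are you voting for in 2020?",
    candidate_labels=["Europe", "public health", "politics", "elections"],
)
print(pred)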

akashAD98 commented 7 months ago

@fxmarty I'm getting an error with the code below:

!pip install torch==2.1.2

!pip install --upgrade-strategy eager optimum[onnxruntime]

!optimum-cli export onnx  --task zero-shot-classification --model facebook/bart-large-mnli bart-large-mnli_onnx_zs_model

Log:

2024-03-01 07:33:03.428142: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-01 07:33:03.428217: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-01 07:33:03.434604: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-01 07:33:05.711962: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bart-large-mnli_onnx_zs_model")
model = ORTModelForQuestionAnswering.from_pretrained("bart-large-mnli_onnx_zs_model")
inputs = tokenizer("What am I using?", "Using DistilBERT with ONNX Runtime!", return_tensors="pt")
outputs = model(**inputs)

Error:

KeyError                                  Traceback (most recent call last)
in <cell line: 18>()
     16 sequence_to_classify = "Who are you voting for in 2020?"
     17 candidate_labels = ["Europe", "public health", "politics", "elections"]
---> 18 pred = onnx_z0(sequence_to_classify, candidate_labels)
     19 pred

7 frames
/usr/local/lib/python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py in forward(self, input_ids, attention_mask, token_type_ids, **kwargs)
   1258     outputs = self.model.run(None, onnx_inputs)
   1259
-> 1260     start_logits = outputs[self.output_names["start_logits"]]
   1261     end_logits = outputs[self.output_names["end_logits"]]
   1262     if use_torch:

KeyError: 'start_logits'

fxmarty commented 7 months ago

@akashAD98 Answered on https://github.com/huggingface/optimum/issues/1739