huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Pix2struct to ONNX execution error #1113

Closed arvisioncode closed 1 year ago

arvisioncode commented 1 year ago

System Info

Working on Google Colab, installing optimum with:
!python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime]

Who can help?

Hi @fxmarty ! First of all, thank you for your work on implementing the pix2struct conversion here. I'm trying to run the conversion following the commands of the main page, but I am having some problems...

I installed optimum from the repository as shown above, and to run the conversion I tried the following commands:

!optimum-cli export onnx -m fxmarty/pix2struct-tiny-random --optimize O2 fxmarty/pix2struct-tiny-random_onnx
!optimum-cli export onnx -m google/pix2struct-docvqa-base --optimize O2 pix2struct-docvqa-base_onnx
!optimum-cli export onnx -m google/pix2struct-base --optimize O2 pix2struct-base_onnx

...

These are the execution logs:

2023-06-15 08:14:28.116454: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Framework not specified. Using pt to export to ONNX.
Automatic task detection to image-to-text-with-past.
Downloading (…)okenizer_config.json: 100% 2.45k/2.45k [00:00<00:00, 9.20MB/s]
Downloading spiece.model: 100% 851k/851k [00:00<00:00, 1.87MB/s]
Downloading (…)/main/tokenizer.json: 100% 3.27M/3.27M [00:00<00:00, 6.58MB/s]
Downloading (…)cial_tokens_map.json: 100% 2.20k/2.20k [00:00<00:00, 8.83MB/s]
Downloading (…)rocessor_config.json: 100% 250/250 [00:00<00:00, 1.05MB/s]
Using framework PyTorch: 2.0.1+cu118
/usr/local/lib/python3.10/dist-packages/transformers/models/pix2struct/modeling_pix2struct.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scores = torch.max(scores, torch.tensor(torch.finfo(scores.dtype).min))
============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Using framework PyTorch: 2.0.1+cu118
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:832: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_mask.shape[1] < attention_mask.shape[1]:
============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Traceback (most recent call last):
  File "/usr/local/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/export/onnx.py", line 219, in run
    main_export(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/__main__.py", line 309, in main_export
    _, onnx_outputs = export_models(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 613, in export_models
    export(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 709, in export
    export_output = export_pytorch(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/convert.py", line 442, in export_pytorch
    onnx_export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 506, in export
    _export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1548, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 989, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 893, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py", line 1268, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/onnx/model_patcher.py", line 129, in patched_forward
    outputs = self.orig_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/pix2struct/modeling_pix2struct.py", line 1761, in forward
    encoder_last_hidden_state=encoder_outputs.last_hidden_state,
AttributeError: 'tuple' object has no attribute 'last_hidden_state'

Is this the correct way to make the conversion to ONNX? Can you help me with this problem?

Thank you so much in advance! :)

Reproduction

!python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime]
!optimum-cli export onnx -m fxmarty/pix2struct-tiny-random --optimize O2 fxmarty/pix2struct-tiny-random_onnx

Expected behavior

Model in ONNX

fxmarty commented 1 year ago

Can you run pip uninstall transformers && pip install transformers? The fix in https://github.com/huggingface/transformers/pull/23932 is required for the pix2struct export, so you need transformers>=4.30.
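
To confirm that the installed transformers meets the 4.30 minimum before re-running the export, something like the following could be used. This is a minimal sketch: `numeric_prefix`, `meets_minimum` and `check_transformers` are hypothetical helper names, and the comparison only looks at the leading numeric part of each version component, so it is a simplification of real version ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def numeric_prefix(ver):
    """Leading numeric components of a version string, e.g. '4.30.0.dev0' -> (4, 30, 0)."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = numeric_prefix(installed), numeric_prefix(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_transformers(required="4.30"):
    """Report whether the installed transformers satisfies the minimum version."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    status = "OK" if meets_minimum(installed, required) else "too old"
    return f"transformers {installed}: {status} (need >= {required})"


print(check_transformers())
```

In Colab, the runtime may need a restart after reinstalling for the new version to be picked up.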

arvisioncode commented 1 year ago

I did it and this is the output now:

NotImplementedError: ONNX Runtime doesn't support the graph optimization of pix2struct yet. Only ['albert', 'bart', 'bert', 'big_bird', 'blenderbot', 'bloom', 'camembert', 'codegen', 'deberta', 'deberta-v2', 'distilbert', 'electra', 'gpt2', 'gpt_neo', 'gpt_neox', 'gptj', 'longt5', 'llama', 'marian', 'mbart', 'mt5', 'm2m_100', 'nystromformer', 'pegasus', 'roberta', 't5', 'whisper', 'xlm-roberta'] are supported. If you want to support pix2struct please propose a PR or open up an issue in ONNX Runtime:https://github.com/microsoft/onnxruntime.

Maybe my optimum installation is not picking up the latest changes? But I think these commands install from the master branch:

!python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime]
or
!python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[exporters,onnxruntime]

fxmarty commented 1 year ago

If you remove --optimize O2 it should work!

arvisioncode commented 1 year ago

thank you!! it is working now yes!! :)

just another question... do you have an example for the inference of the model?

I see the code of the example:

from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer, pipeline

model_id = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForQuestionAnswering.from_pretrained(model_id)
model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
question = "What's Optimum?"
context = "Optimum is an awesome library everyone should use!"
results = qa_pipe(question=question, context=context)

But since the pix2struct tasks are ['image-to-text', 'image-to-text-with-past', 'visual-question-answering', 'visual-question-answering-with-past'], I think the model should be loaded with something like from optimum.onnxruntime import ORTModelForVisualQuestionAnswering, but I haven't found the correct configuration yet.

And if I try to load it with model = ORTModelForQuestionAnswering.from_pretrained("pix2struct-docvqa-base_onnx"), I get the following output: RuntimeError: Too many ONNX model files were found in pix2struct-docvqa-base_onnx, specify which one to load by using the file_name argument.
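
The "Too many ONNX model files" message indicates that the export directory holds several ONNX graphs rather than one. A minimal sketch for listing them is below; `list_onnx_files` is a hypothetical helper, and the directory name is the one from the export command above.

```python
from pathlib import Path


def list_onnx_files(export_dir):
    """Return the names of all .onnx files directly inside export_dir, sorted."""
    return sorted(p.name for p in Path(export_dir).glob("*.onnx"))


if __name__ == "__main__":
    for name in list_onnx_files("pix2struct-docvqa-base_onnx"):
        print(name)
```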

Could you give me an example to run the pix2struct onnx model for a test image?

fxmarty commented 1 year ago

Unfortunately there is no ORTModel for pix2struct; we should probably have an ORTModelForVisualQuestionAnswering. My PR only implemented the ONNX export.

fxmarty commented 1 year ago

Related: https://github.com/huggingface/optimum/issues/1113

arvisioncode commented 1 year ago

Okay, so is there no other way right now to run inference with the exported models, to check their performance?

fxmarty commented 1 year ago

You could analyze what the graph looks like with netron, and run the models in PyTorch/ORT with dummy inputs just to see the latency/throughput you get.

You will have three models to run: the encoder, the decoder without past, and the decoder with past key values.
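
A rough sketch of the dummy-input approach with ONNX Runtime is below. It is an illustration under assumptions, not a tested recipe for pix2struct: `resolve_shape` and `benchmark` are hypothetical helpers, the type table covers only a few common tensor types, and every dynamic dimension defaults to the same small size, which may not be valid for the decoder with past (pass per-name sizes through `overrides` in that case).

```python
import time

# Assumed mapping from ONNX Runtime type strings to NumPy dtype names.
ORT_TO_NUMPY = {
    "tensor(float)": "float32",
    "tensor(float16)": "float16",
    "tensor(int64)": "int64",
    "tensor(int32)": "int32",
    "tensor(bool)": "bool",
}


def resolve_shape(shape, overrides=None, default=2):
    """Replace symbolic or unknown dimensions with concrete sizes."""
    overrides = overrides or {}
    return [d if isinstance(d, int) else overrides.get(d, default) for d in shape]


def benchmark(onnx_path, overrides=None, runs=10):
    """Return the mean latency in seconds of one ONNX graph on dummy inputs."""
    # Imported lazily so the shape helper stays usable without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    feeds = {
        arg.name: np.ones(resolve_shape(arg.shape, overrides), dtype=ORT_TO_NUMPY[arg.type])
        for arg in session.get_inputs()
    }
    session.run(None, feeds)  # warm-up run, not timed
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, feeds)
    return (time.perf_counter() - start) / runs
```

Calling `benchmark` once per exported graph (encoder, decoder, decoder with past) gives a per-graph latency; the exact file names depend on the export, so check the output directory first.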

arvisioncode commented 1 year ago

I've been trying but I don't really know how to do it, could you make an inference as an example?

fxmarty commented 1 year ago

I will improve the documentation about the usage of merged decoders, yes!

shephinphilip commented 1 year ago

@arvisioncode and @fxmarty, I am currently having some trouble converting the Pix2Struct-docvqa-base model to ONNX. Could you please help me with this?

Code:

import torch
from transformers.onnx import FeaturesManager
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor
import numpy as np
from torch.cuda.amp import autocast

# Load the Hugging Face model and tokenizer for Pix2Struct
model_id = "google/pix2struct-docvqa-base"
feature = "sequence-classification"
model = Pix2StructForConditionalGeneration.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load config
model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
onnx_config = model_onnx_config(model.config)

# Define example question and document text
question = "What is the main topic of the document?"
document = "This is an example document about Pix2Struct."

# Modify the batch size
batch_size = 4  # Change this to your desired batch size
inputs = tokenizer(question, document, return_tensors="pt", padding="max_length", truncation=True, max_length=128)
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

# Adjust batch size dynamically
input_ids = input_ids[:batch_size]
attention_mask = attention_mask[:batch_size]

with torch.no_grad(), autocast():
    start_logits, end_logits = model(input_ids, attention_mask=attention_mask)

# Convert to ONNX
onnx_path = "model/pix2struct-docvqa-base.onnx"
torch.onnx.export(
    model,
    (input_ids, attention_mask),
    onnx_path,
    input_names=["input_ids", "attention_mask"],
    output_names=["start_logits", "end_logits"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "seq_length"},
        "attention_mask": {0: "batch_size", 1: "seq_length"},
        "start_logits": {0: "batch_size", 1: "seq_length"},
        "end_logits": {0: "batch_size", 1: "seq_length"},
    },
    opset_version=13
)

print("Model converted to ONNX successfully!")

Error:

KeyError: "pix2struct is not supported yet. Only ['albert', 'bart', 'beit', 'bert', 'big-bird', 'bigbird-pegasus', 'blenderbot', 'blenderbot-small', 'bloom', 'camembert', 'clip', 'codegen', 'convbert', 'convnext', 'data2vec-text', 'data2vec-vision', 'deberta', 'deberta-v2', 'deit', 'detr', 'distilbert', 'electra', 'flaubert', 'gpt2', 'gptj', 'gpt-neo', 'groupvit', 'ibert', 'imagegpt', 'layoutlm', 'layoutlmv3', 'levit', 'longt5', 'longformer', 'marian', 'mbart', 'mobilebert', 'mobilenet-v1', 'mobilenet-v2', 'mobilevit', 'mt5', 'm2m-100', 'owlvit', 'perceiver', 'poolformer', 'rembert', 'resnet', 'roberta', 'roformer', 'segformer', 'squeezebert', 'swin', 't5', 'vision-encoder-decoder', 'vit', 'whisper', 'xlm', 'xlm-roberta', 'yolos'] are supported. If you want to support pix2struct please propose a PR or open up an issue."

DracoCoder commented 1 year ago

I've been trying but I don't really know how to do it, could you make an inference as an example?

Hey, did you figure it out?

DracoCoder commented 1 year ago

I will improve the documentation about the usage of merged decoders, yes!

Is there any progress on this?

rish-hyun commented 1 year ago

from pathlib import Path
from optimum.exporters import TasksManager
from optimum.exporters.onnx import export
from transformers import Pix2StructForConditionalGeneration

base_model = Pix2StructForConditionalGeneration.from_pretrained(HF_MODEL_NAME, cache_dir=MODELS_DIR).to(DEVICE)

onnx_path = Path("model.onnx")
onnx_config_constructor = TasksManager.get_exporter_config_constructor("onnx", base_model, task='visual-question-answering')
onnx_config = onnx_config_constructor(base_model.config)

onnx_inputs, onnx_outputs = export(base_model, onnx_config, onnx_path, onnx_config.DEFAULT_ONNX_OPSET)

Using framework PyTorch: 2.0.1+cu118
/usr/local/lib/python3.10/dist-packages/transformers/models/pix2struct/modeling_pix2struct.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scores = torch.max(scores, torch.tensor(torch.finfo(scores.dtype).min))
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:847: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_mask.shape[1] < attention_mask.shape[1]:
============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

This code created a single model - model.onnx. Now, how can I run inference?

Whereas this command does several things and creates multiple files in the model directory:

!optimum-cli export onnx --model google/pix2struct-docvqa-base models/pix2struct-docvqa-base_onnx/

DracoCoder commented 1 year ago

This code created a single model - model.onnx. Now, how can I run inference?

https://www.kaggle.com/gauravcodes/pix2struct-transformers-vs-onnx-comparison I think I figured it out, check this out.

rish-hyun commented 1 year ago

This code created a single model - model.onnx. Now, how can I run inference?

https://www.kaggle.com/gauravcodes/pix2struct-transformers-vs-onnx-comparison I think I figured it out, check this out.

Thanks for this! I tried the same, but it didn't work. It seems to predict tokens from the question (the given text), not from the image.

I guess this is because the input image should already include the question embedded as header text.

I tried asking a question whose text does not contain the answer, but the output still used tokens from the question.

(I might be wrong, please correct me if so. This is what I have understood so far)

askgh69 commented 1 year ago

Yea I agree.

rish-hyun commented 1 year ago

You could analyze what the graph looks like with netron, and run the models in PyTorch/ORT with dummy inputs just to see the latency/throughput you get.

You will have three models to run: the encoder, the decoder without past, and the decoder with past key values.

Thanks for this! Using the encoder, the decoder, and the decoder with past yields exact results. I've quantized the models as well, and here are the results -> results.md

fxmarty commented 1 year ago

That's a very interesting benchmark @rish-hyun, thank you for doing it! The merged model would be useful only to decrease memory usage, not runtime. Currently though it is not straightforward to quantize a merged model (as it has subgraphs), so if you use quantization it is tricky.

Pix2struct should now be usable natively with ONNX Runtime in Optimum with the class ORTModelForPix2Struct: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModelForPix2Struct

Fixed by https://github.com/huggingface/optimum/pull/1296
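
For readers landing here with the original question, a minimal inference sketch with ORTModelForPix2Struct might look like the following. It is a sketch under assumptions rather than the documented recipe: `answer_question` and its parameters are hypothetical, the processor is loaded by Hub id separately from the exported directory, and `max_new_tokens=32` is an arbitrary choice. See the linked documentation for the authoritative example.

```python
def answer_question(model_dir, processor_id, image_path, question, max_new_tokens=32):
    """Run visual question answering with an exported Pix2Struct ONNX model."""
    # Imported lazily so the function can be defined without the heavy dependencies.
    from PIL import Image
    from optimum.onnxruntime import ORTModelForPix2Struct
    from transformers import AutoProcessor

    processor = AutoProcessor.from_pretrained(processor_id)
    model = ORTModelForPix2Struct.from_pretrained(model_dir)

    image = Image.open(image_path).convert("RGB")
    # The question is passed to the processor together with the image; as noted
    # earlier in the thread, VQA checkpoints expect it embedded as header text.
    inputs = processor(images=image, text=question, return_tensors="pt")

    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```

A call could then pass the directory written by the export command earlier in the thread (for example pix2struct-docvqa-base_onnx), the Hub id google/pix2struct-docvqa-base for the processor, a local image path, and the question string.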