Closed arvisioncode closed 1 year ago
Can you run pip uninstall transformers && pip install transformers? This fix (https://github.com/huggingface/transformers/pull/23932), and thus transformers>=4.30, is required for pix2struct export.
I did it and this is the output now:
NotImplementedError: ONNX Runtime doesn't support the graph optimization of pix2struct yet. Only ['albert', 'bart', 'bert', 'big_bird', 'blenderbot', 'bloom', 'camembert', 'codegen', 'deberta', 'deberta-v2', 'distilbert', 'electra', 'gpt2', 'gpt_neo', 'gpt_neox', 'gptj', 'longt5', 'llama', 'marian', 'mbart', 'mt5', 'm2m_100', 'nystromformer', 'pegasus', 'roberta', 't5', 'whisper', 'xlm-roberta'] are supported. If you want to support pix2struct please propose a PR or open up an issue in ONNX Runtime:https://github.com/microsoft/onnxruntime.
Maybe my optimum installation is not picking up the latest changes? But I think these commands install from the master branch:
!python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime]
or
!python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[exporters,onnxruntime]
If you remove --optimize O2, it should work!
thank you!! it is working now yes!! :)
just another question... do you have an example for the inference of the model?
I see the code of the example:
from optimum.onnxruntime import ORTModelForQuestionAnswering
from transformers import AutoTokenizer, pipeline
model_id = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForQuestionAnswering.from_pretrained(model_id)
model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
question = "What's Optimum?"
context = "Optimum is an awesome library everyone should use!"
results = qa_pipe(question=question, context=context)
But since the pix2struct tasks are ['image-to-text', 'image-to-text-with-past', 'visual-question-answering', 'visual-question-answering-with-past'], I think the model should be loaded with something like from optimum.onnxruntime import ORTModelForVisualQuestionAnswering, but I haven't found the correct configuration yet.
And if I try to load it with model = ORTModelForQuestionAnswering.from_pretrained("pix2struct-docvqa-base_onnx"), I get the following output: RuntimeError: Too many ONNX model files were found in pix2struct-docvqa-base_onnx, specify which one to load by using the file_name argument.
Could you give me an example to run the pix2struct onnx model for a test image?
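For readers hitting the same "Too many ONNX model files" error: a quick way to see what the export actually wrote is to list the .onnx files in the output directory. The sketch below uses only the standard library; the helper name list_onnx_files is made up for illustration, and the directory name in the usage comment is the one from the message above.

```python
from pathlib import Path


def list_onnx_files(export_dir):
    """Return the sorted names of all .onnx files directly inside export_dir."""
    return sorted(p.name for p in Path(export_dir).glob("*.onnx"))


# Usage (directory name taken from the message above):
#   print(list_onnx_files("pix2struct-docvqa-base_onnx"))
```

Seeing several files here is expected for an encoder-decoder export, which is why a single-file loader asks for a file_name.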
Unfortunately there is no ORTModel for pix2struct; we should probably have an ORTModelForVisualQuestionAnswering. My PR only implemented the ONNX export.
Okay, so is there currently no other way to run inference with the exported models, to check their performance?
You could analyze what the graph looks like with netron, and run the models in PyTorch/ORT with dummy inputs just to see the latency/throughput you get.
You will have three models to run: the encoder, the decoder without past, and the decoder with past key values.
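A minimal sketch of the dummy-input approach suggested above, assuming onnxruntime and numpy are installed. The helper names (resolve_shape, benchmark) and the default dimension are illustrative, not part of Optimum. Every symbolic dimension is set to the same small value, which may not satisfy the shape constraints of every model (the decoder with past in particular may need consistent sequence lengths), so treat this as a starting point only.

```python
import time

# Map ONNX Runtime type strings to NumPy dtype names (common cases only).
ORT_TO_NUMPY = {
    "tensor(float)": "float32",
    "tensor(float16)": "float16",
    "tensor(int64)": "int64",
    "tensor(int32)": "int32",
    "tensor(bool)": "bool",
}


def resolve_shape(shape, default_dim=2):
    """Replace symbolic or unknown dimensions (str/None) with default_dim."""
    return [d if isinstance(d, int) and d > 0 else default_dim for d in shape]


def benchmark(onnx_path, n_runs=10, default_dim=2):
    """Run one ONNX file on dummy inputs; return mean latency in seconds."""
    # Imported lazily so resolve_shape can be used without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    feeds = {}
    for inp in session.get_inputs():
        # Assumption: all-ones tensors are acceptable dummy values.
        shape = resolve_shape(inp.shape, default_dim)
        feeds[inp.name] = np.ones(shape, dtype=ORT_TO_NUMPY[inp.type])

    session.run(None, feeds)  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(n_runs):
        session.run(None, feeds)
    return (time.perf_counter() - start) / n_runs
```

Calling benchmark once per exported file (for example encoder_model.onnx, decoder_model.onnx and decoder_with_past_model.onnx, assuming those are the names the export produced) gives a rough per-model latency.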
I've been trying but I don't really know how to do it, could you make an inference as an example?
I will improve the documentation about the usage of merged decoders, yes!
@arvisioncode and @fxmarty, I am currently having some trouble converting the Pix2Struct-docvqa-base model to ONNX. Could you please help me with this?
Code:
import torch
from transformers.onnx import FeaturesManager
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor
import numpy as np
from torch.cuda.amp import autocast

model_id = "google/pix2struct-docvqa-base"
feature = "sequence-classification"
model = Pix2StructForConditionalGeneration.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
onnx_config = model_onnx_config(model.config)

question = "What is the main topic of the document?"
document = "This is an example document about Pix2Struct."

batch_size = 4  # Change this to your desired batch size
inputs = tokenizer(question, document, return_tensors="pt", padding="max_length", truncation=True, max_length=128)
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

input_ids = input_ids[:batch_size]
attention_mask = attention_mask[:batch_size]

with torch.no_grad(), autocast():
    start_logits, end_logits = model(input_ids, attention_mask=attention_mask)

onnx_path = "model/pix2struct-docvqa-base.onnx"
torch.onnx.export(
    model,
    (input_ids, attention_mask),
    onnx_path,
    input_names=["input_ids", "attention_mask"],
    output_names=["start_logits", "end_logits"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "seq_length"},
        "attention_mask": {0: "batch_size", 1: "seq_length"},
        "start_logits": {0: "batch_size", 1: "seq_length"},
        "end_logits": {0: "batch_size", 1: "seq_length"},
    },
    opset_version=13
)

print("Model converted to ONNX successfully!")
KeyError: "pix2struct is not supported yet. Only ['albert', 'bart', 'beit', 'bert', 'big-bird', 'bigbird-pegasus', 'blenderbot', 'blenderbot-small', 'bloom', 'camembert', 'clip', 'codegen', 'convbert', 'convnext', 'data2vec-text', 'data2vec-vision', 'deberta', 'deberta-v2', 'deit', 'detr', 'distilbert', 'electra', 'flaubert', 'gpt2', 'gptj', 'gpt-neo', 'groupvit', 'ibert', 'imagegpt', 'layoutlm', 'layoutlmv3', 'levit', 'longt5', 'longformer', 'marian', 'mbart', 'mobilebert', 'mobilenet-v1', 'mobilenet-v2', 'mobilevit', 'mt5', 'm2m-100', 'owlvit', 'perceiver', 'poolformer', 'rembert', 'resnet', 'roberta', 'roformer', 'segformer', 'squeezebert', 'swin', 't5', 'vision-encoder-decoder', 'vit', 'whisper', 'xlm', 'xlm-roberta', 'yolos'] are supported. If you want to support pix2struct please propose a PR or open up an issue."
Hey Did you figure it out?
Is there any progress on this?
from pathlib import Path
from optimum.exporters import TasksManager
from optimum.exporters.onnx import export
from transformers import Pix2StructForConditionalGeneration
base_model = Pix2StructForConditionalGeneration.from_pretrained(HF_MODEL_NAME, cache_dir=MODELS_DIR).to(DEVICE)
onnx_path = Path("model.onnx")
onnx_config_constructor = TasksManager.get_exporter_config_constructor("onnx", base_model, task='visual-question-answering')
onnx_config = onnx_config_constructor(base_model.config)
onnx_inputs, onnx_outputs = export(base_model, onnx_config, onnx_path, onnx_config.DEFAULT_ONNX_OPSET)
Using framework PyTorch: 2.0.1+cu118
/usr/local/lib/python3.10/dist-packages/transformers/models/pix2struct/modeling_pix2struct.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
scores = torch.max(scores, torch.tensor(torch.finfo(scores.dtype).min))
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:847: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_mask.shape[1] < attention_mask.shape[1]:
============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
This code created a single model, model.onnx. Now, how can I run inference?
Whereas this command line does a bunch of things and creates multiple files in the model directory:
!optimum-cli export onnx --model google/pix2struct-docvqa-base models/pix2struct-docvqa-base_onnx/
I think I figured it out. Check this out: https://www.kaggle.com/gauravcodes/pix2struct-transformers-vs-onnx-comparison
Thanks for this! I tried the same, but it didn't work. It seems to predict tokens from the question (the given text), not from the image.
I guess this is because the input image should already include the question embedded as header-text.
I tried giving a question that does not contain the answer, but the output still used tokens from the question.
(I might be wrong, please correct me if so. This is what I have understood so far)
Yea I agree.
Thanks for this! Using the encoder, decoder and decoder with past yields exact results. I've quantized as well, and here are the results -> results.md
That's a very interesting benchmark @rish-hyun, thank you for doing it! The merged model would be useful only to decrease memory usage, not runtime. Currently though it is not straightforward to quantize a merged model (as it has subgraphs), so if you use quantization it is tricky.
Pix2struct should now be usable natively with ONNX Runtime in Optimum with the class ORTModelForPix2Struct: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModelForPix2Struct
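As a rough illustration of the class mentioned above, the sketch below runs document VQA through ORTModelForPix2Struct. It assumes recent versions of optimum[onnxruntime], transformers and Pillow. The function name answer_question and its parameters are made up for this example, and the exact arguments should be checked against the linked documentation.

```python
def answer_question(image_path, question, model_id="google/pix2struct-docvqa-base"):
    """Return the generated answer for a question about one document image."""
    # Imported lazily so the sketch can be read without the heavy dependencies.
    from PIL import Image
    from transformers import AutoProcessor
    from optimum.onnxruntime import ORTModelForPix2Struct

    processor = AutoProcessor.from_pretrained(model_id)
    # export=True converts the PyTorch checkpoint to ONNX on the fly;
    # an already exported directory could be passed instead (assumption).
    model = ORTModelForPix2Struct.from_pretrained(model_id, export=True)

    image = Image.open(image_path).convert("RGB")
    # For VQA checkpoints the processor takes the question together with the image.
    inputs = processor(images=image, text=question, return_tensors="pt")
    generated = model.generate(**inputs)
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```

A call would look like answer_question("invoice.png", "What is the total amount?"), where the file name and question are placeholders.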
System Info
Who can help?
Hi @fxmarty ! First of all, thank you for your work on implementing the pix2struct conversion here. I'm trying to run the conversion following the commands of the main page, but I am having some problems...
I have installed optimum from the repository as explained before, and to run the conversion I have tried the following commands:
...
These are the execution logs:
Is this the correct way to make the conversion to ONNX? Can you help me with this problem?
Thank you so much in advance! :)
Reproduction
Expected behavior
Model in ONNX