huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Implement ORTModelForZeroShotObjectDetection #1721


solomonmanuelraj commented 8 months ago

Feature request

Hi team,

ORT quantization tools are not available for zero-shot object detection models such as OWL-ViT (google/owlvit-base-patch32). Adding ORT quantization support for these models would make it possible to export quantized / fine-tuned OWL-ViT (zero-shot object detection) models to ONNX format.

Motivation

Enable exporting zero-shot object detection models (e.g. OWL-ViT) to ONNX format.

Your contribution

I can test and validate the results.

fxmarty commented 8 months ago

@solomonmanuelraj Could you explain what you would like to be supported?

optimum-cli export onnx -m google/owlvit-base-patch32 owlvit_onnx

and, e.g.,

optimum-cli onnxruntime quantize --onnx_model owlvit_onnx --output owlvit_onnx_quantized --avx512

should work. This uses dynamic quantization. The quality of the quantized model is not guaranteed, though; you would need to evaluate that yourself. For more custom usage, you would need to use the Python API: https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/quantization & https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/quantization & https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.QuantizationConfig
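For reference, a minimal sketch of dynamic quantization through the Python API (directory names are illustrative, reusing the owlvit_onnx output of the export command above):

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic quantization config targeting AVX-512 (match this to your hardware)
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)

# Load the exported ONNX model; file_name may be needed if the directory
# contains more than one .onnx file
quantizer = ORTQuantizer.from_pretrained("owlvit_onnx", file_name="model.onnx")

# Apply quantization and save the result
quantizer.quantize(save_dir="owlvit_onnx_quantized", quantization_config=qconfig)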

solomonmanuelraj commented 8 months ago

@fxmarty, thanks for your quick response. Yes, I used the commands above for dynamic quantization of the OWL-ViT model.

When I refer to the web page https://www.philschmid.de/optimizing-transformers-with-optimum (section 5, "Test inference with the quantized model"), it uses task-specific classes (e.g. ORTModelForSequenceClassification) to load the optimized model for inference.

For the zero-shot object detection task, there is no ORTModelForZeroShotObjectDetection class that could be used to load the quantized ONNX model for inference.

To evaluate performance and speed, evaluators are available for text-classification tasks:

from evaluate import evaluator
from datasets import load_dataset

eval = evaluator("text-classification")
eval_dataset = load_dataset("banking77", split="test")

results = eval.compute(
    model_or_pipeline=q8_clf,
    data=eval_dataset,
    metric="accuracy",
    input_column="text",
    label_column="label",
    label_mapping=model.config.label2id,
    strategy="simple",
)
print(results)

I would like to know whether a similar evaluator is available for the zero-shot object detection task.

Any pointers would be useful.

Thanks.

fxmarty commented 8 months ago

Hi @solomonmanuelraj,

First, while investigating this issue I found a bug in the ONNX export of OWL-ViT due to the use of NumPy in the modeling code, fixed in https://github.com/huggingface/transformers/pull/29326. Please install Transformers from source, as the fix is not yet in a release.
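For example:

pip install git+https://github.com/huggingface/transformers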

Then, here is an example of usage:

optimum-cli export onnx --model google/owlvit-base-patch32 --task zero-shot-object-detection owlvit_onnx

and

import requests
from PIL import Image
import torch
from optimum.onnxruntime import ORTModelForCustomTasks
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/owlvit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog", "me", "hey"]]
model = ORTModelForCustomTasks.from_pretrained("/path/to/owlvit_onnx")

inputs = processor(text=texts, images=image, return_tensors="pt")

outputs = model(**inputs)

# Target image sizes (height, width) to rescale box predictions [batch_size, 2]
target_sizes = torch.Tensor([image.size[::-1]])
# Convert outputs (bounding boxes and class logits) to final bounding boxes and scores
results = processor.post_process_object_detection(
    outputs=outputs, threshold=0.1, target_sizes=target_sizes
)

This uses https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModelForCustomTasks, which is able to handle ONNX models with arbitrary inputs/outputs.
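Here, results is a list with one dict per image (keyed by scores, labels and boxes, following the Transformers object-detection post-processing convention), so a minimal sketch for printing the detections could look like:

i = 0  # index of the only image in the batch
for score, label, box in zip(results[i]["scores"], results[i]["labels"], results[i]["boxes"]):
    box = [round(v, 2) for v in box.tolist()]
    print(f"Detected '{texts[i][label]}' with confidence {round(score.item(), 3)} at {box}")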