Closed darwinharianto closed 1 year ago
I see here that there was an issue with the export of `aten::broadcast_to`: https://github.com/huggingface/optimum/blob/8d97c6806d8b1bb97625f2387f747822b5fee68e/optimum/exporters/onnx/model_configs.py#L772
But it works with transformers.onnx, so I am not sure what the difference is here in Optimum. Any idea @michaelbenayoun @fxmarty ?
Sorry, but one more thing: how can I extend this so I can convert not only OwlViTModel, but OwlViTForObjectDetection too?
I tried to convert OwlViTForObjectDetection using the unmaintained transformers.onnx module; it works for OwlViTModel, but it doesn't work for OwlViTForObjectDetection.
It throws this error: `Exporting the operator 'aten::broadcast_to' to ONNX opset version 14 is not supported.`
Should I just wait until PyTorch supports this?
> Sorry, but one more thing: how can I extend this so I can convert not only OwlViTModel, but OwlViTForObjectDetection too?
Here is the guide for adding support for new architectures: https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute

So, if you want to add support for a new task, you'll need to register this task here.
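For reference, the guide boils down to defining an ONNX config class for the architecture (declaring its inputs and their dynamic axes) and registering the task with the TasksManager. The snippet below is only a self-contained sketch of the shape of that `inputs` property; the class and axis names are illustrative, and the real config in Optimum subclasses the library's own base classes rather than this standalone class.

```python
from collections import OrderedDict


# Illustrative sketch only: in Optimum, this would subclass one of the
# exporter base configs (e.g. a text-and-vision OnnxConfig). Here we just
# show the kind of `inputs` mapping such a config has to provide.
class OwlViTOnnxConfigSketch:
    @property
    def inputs(self) -> "OrderedDict[str, dict]":
        # Maps each model input to its dynamic axes: the dimensions that
        # may vary at inference time get a symbolic name.
        return OrderedDict(
            [
                ("input_ids", {0: "text_batch_size", 1: "sequence_length"}),
                ("pixel_values", {0: "image_batch_size", 1: "num_channels", 2: "height", 3: "width"}),
                ("attention_mask", {0: "text_batch_size", 1: "sequence_length"}),
            ]
        )


print(list(OwlViTOnnxConfigSketch().inputs))
```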
> I tried to convert OwlViTForObjectDetection using the unmaintained transformers.onnx module; it works for OwlViTModel, but it doesn't work for OwlViTForObjectDetection.
> It throws this error: `Exporting the operator 'aten::broadcast_to' to ONNX opset version 14 is not supported.`
> Should I just wait until PyTorch supports this?
Okay, so I guess that was the error I mentioned in my first message. If you want to add support for this task, there are two possible ways: wait for PyTorch to support the ONNX export of this operator, or rewrite the operations relying on torch.broadcast_to here and there. It is probably possible to find another way of doing the same thing that is compatible with ONNX export.

@regisss Using PyTorch nightly, the broadcast_to operation is now supported. I have a question about tasks: I see that we can specify a task in the OnnxConfig, such as
```python
CLIPOnnxConfig(config, task="zero-shot-image-classification")
# or
CLIPOnnxConfig(config, task="feature-extraction")
```
but both of these settings output the same results (`logits_per_image`, `image_embeds`, `text_embeds`, `logits_per_text`). Is this intended?
Regarding the OwlViT support, I made a pull request at https://github.com/huggingface/optimum/pull/1067, but for now I am using torch nightly.
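As a side note on the broadcast_to workaround discussed above: `torch.broadcast_to(x, shape)` can typically be rewritten as `x.expand(shape)`, which the ONNX exporter has long supported (it maps to the ONNX `Expand` op). A quick sanity check that the two produce the same result:

```python
import torch

# torch.broadcast_to and Tensor.expand produce the same broadcasted view
# for this pattern; expand is the ONNX-export-friendly spelling.
x = torch.arange(4).reshape(1, 4)

a = torch.broadcast_to(x, (3, 4))
b = x.expand(3, 4)

print(torch.equal(a, b))  # True
```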
Good news that it is now supported in PyTorch! That way, you can work on your PR and we will merge it when the next release of PyTorch is out :slightly_smiling_face:
Regarding your question, it is indeed expected, since in Transformers these two tasks are mapped to CLIPModel (see here for feature extraction and there for zero-shot classification). The logits enable you to perform zero-shot classification, and the embeddings to perform feature extraction.
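To make the distinction concrete: with the exported model, zero-shot classification is just a softmax over `logits_per_image` across the text prompts. A minimal sketch with made-up logits (the values here are invented for illustration, not real model output):

```python
import numpy as np

# Hypothetical logits_per_image from a CLIP ONNX run: 1 image, 3 text prompts.
logits_per_image = np.array([[21.3, 18.7, 12.1]], dtype=np.float32)

# Numerically stable softmax over the text-prompt axis.
exp = np.exp(logits_per_image - logits_per_image.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

print(probs.argmax(axis=-1))  # best-matching prompt per image: [0]
```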
Hello there! Is zero-shot object detection supported by this PR? I've been trying to convert the OwlViT model (for object detection) to ONNX without success. I see there is no ORTModelFor___ for object detection. I have also tried converting using transformers.onnx without success. Any tips? Thanks in advance!
Hi @Pedrohgv, yes, zero-shot object detection should be supported for OwlViT. For example:

```
optimum-cli export onnx --model google/owlvit-base-patch32 --task zero-shot-object-detection owlvit_onnx
```

should work.
@fxmarty Thank you for the reply. I successfully converted the model, but couldn't get it to run a sample. My code:

```python
import onnxruntime as ort
from transformers import AutoProcessor

checkpoint = "google/owlvit-base-patch32"
processor = AutoProcessor.from_pretrained(checkpoint)

# text_queries (list of prompts), image (PIL image) and PROJECT_FOLDER
# are defined elsewhere in my script.
np_inputs = processor(text=text_queries, images=image, return_tensors="np")

session = ort.InferenceSession(PROJECT_FOLDER + "owlvit_onnx/model.onnx")
out = session.run(["logits", "pred_boxes", "text_embeds", "image_embeds"], dict(np_inputs))
```
This is throwing the error:

```
RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/Reshape_3' Status Message: /Users/runner/work/1/s/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:41 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape &, onnxruntime::TensorShapeVector &, bool) gsl::narrow_cast(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{9,16}, requested shape:{2,4,16}
```
It seems to be related to some input being wrong, but I cannot figure out what. The preprocessing step is the same as for the HF model, the only difference being that I return "np" tensors instead of "pt" so they work with ONNX Runtime. Here are my input shapes:

```
input_ids: (9, 16)
attention_mask: (9, 16)
pixel_values: (1, 3, 768, 768)
```
Thanks in advance!
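For what it's worth, the error itself is a pure element-count mismatch, which suggests the number of text queries fed at inference does not match what the exported graph expects at that Reshape node. Reproducing the arithmetic with plain NumPy (shapes taken from the error message):

```python
import numpy as np

# 9 * 16 = 144 elements cannot be viewed as 2 * 4 * 16 = 128,
# so the Reshape node fails exactly as in the ONNX Runtime error.
x = np.zeros((9, 16))
try:
    x.reshape(2, 4, 16)
    reshape_ok = True
except ValueError:
    reshape_ok = False

print("reshape possible:", reshape_ok)  # reshape possible: False
```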
@Pedrohgv could you open an issue with a reproducible export + code?
### Feature request

The conversion is supported in transformers[onnx], but not yet supported in Optimum.

### Motivation

Convert an open-world vocabulary detection model to ONNX for faster inference.

### Your contribution

If there is a guideline on how to do it, I think I can help.