huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Clip Exporter Issue #882

Closed zeyaddeeb closed 1 year ago

zeyaddeeb commented 1 year ago

System Info

python: `3.11`
optimum version: `1.7.1`

Who can help?

@JingyaHuang, @echarlaix

Reproduction

Command:

optimum-cli export onnx --task zero-shot-object-detection --optimize O2 --model openai/clip-vit-base-patch32 .

Issue:

'Could not find the proper task name for AutoModelForZeroShotImageClassification.'

Expected behavior

The export to succeed.

xenova commented 1 year ago

I'm also having this issue. Additionally, when exporting with the default task, I get the following warning:

[W:onnxruntime:, execution_frame.cc:835 VerifyOutputSizes] Expected shape from model of {-1,2} does not match actual shape of {1,3} for output logits_per_image

fxmarty commented 1 year ago

Sorry for the issue. optimum-cli export onnx --model openai/clip-vit-base-patch32 --task default clip_onnx/ works fine for me on optimum 1.7.1.

The error message raised is not very informative; the task passed should be one of https://github.com/huggingface/optimum/blob/48967caae9b8cca2b132ced7387f45ee9458665a/optimum/exporters/tasks.py#L283-L285. I'll improve this.
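
For reference, a minimal sketch (assuming the TasksManager API in optimum/exporters/tasks.py, present in 1.7.x) of how to list the tasks optimum registers for a given model type:

# Hedged sketch: list the export tasks optimum knows for the "clip" model type.
# Assumes TasksManager.get_supported_tasks_for_model_type exists with this
# signature in your optimum version; check optimum/exporters/tasks.py if not.
from optimum.exporters.tasks import TasksManager

supported = TasksManager.get_supported_tasks_for_model_type("clip", exporter="onnx")
print(sorted(supported))  # task names accepted by --task for this model type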

Edit: that said, the test exporting CLIP with no task passed has started failing in our CI; I'll have a look at why.

xenova commented 1 year ago

> Sorry for the issue. optimum-cli export onnx --model openai/clip-vit-base-patch32 --task default clip_onnx/ works fine for me on optimum 1.7.1.

Whoops, I meant to say this happens when running the default model (it exports fine). I'm running it in ONNX Runtime Web, so I don't have a very simple way to show how to reproduce, unfortunately 😅 (but hopefully it's simple to account for other output dimensions)

fxmarty commented 1 year ago

@xenova Are you sure you are passing correctly shaped pixel_values? In pure PyTorch, the CLIP preprocessor appears to resize and crop pixel_values to 224 x 224.
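
As a quick sanity check, a sketch (assuming the transformers CLIPProcessor API; the zero image is just a stand-in) to confirm the preprocessor's output shape:

from PIL import Image
import numpy as np
from transformers import CLIPProcessor

# Whatever the input size, the processor resizes/center-crops to 224 x 224.
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
image = Image.fromarray(np.zeros((480, 640, 3), dtype=np.uint8))
inputs = processor(text=["a photo"], images=image, return_tensors="np", padding=True)
print(inputs["pixel_values"].shape)  # (1, 3, 224, 224)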

This works fine for me:

optimum-cli export onnx --model openai/clip-vit-base-patch32 --task default clip_onnx/

and

import onnxruntime as ort
import numpy as np

onnx_path = "/path/to/clip_onnx/model.onnx"
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Random dummy inputs, using the same batch size for the text and image inputs.
batch_size = 2
sequence_length = 50
num_channels = 3
height = 224
width = 224
inputs = {
    "input_ids": np.random.randint(0, high=100, size=(batch_size, sequence_length), dtype=np.int64),
    "pixel_values": np.random.uniform(low=-1, high=1, size=(batch_size, num_channels, height, width)).astype(np.float32),
    "attention_mask": np.random.randint(0, high=2, size=(batch_size, sequence_length), dtype=np.int64),
}

res = session.run(None, inputs)
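
With matching batch sizes this runs without shape warnings; logits_per_image should come back with shape (2, 2), one score per image/text pair.
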
xenova commented 1 year ago

Yes, I do believe I am using the correct dimensions: [screenshot]

Using your script above, though, I found the issue (using different batch sizes for images and text), which does seem to be a bug.

import onnxruntime as ort
import numpy as np

path = './models/onnx/quantized/openai/clip-vit-base-patch16/default/model.onnx'
session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])

# Deliberately different batch sizes for the text and image inputs.
text_batch_size = 3
sequence_length = 50

img_batch_size = 2
num_channels = 3
height = 224
width = 224

inputs = {
    "input_ids": np.random.randint(0, high=100, size=(text_batch_size, sequence_length), dtype=np.int64),
    "attention_mask": np.random.randint(0, high=2, size=(text_batch_size, sequence_length), dtype=np.int64),
    "pixel_values": np.random.uniform(low=-1, high=1, size=(img_batch_size, num_channels, height, width)).astype(np.float32),
}

res = session.run(None, inputs)

print(f'{res=}')

which gives the warning:

2023-03-15 16:34:32.2142309 [W:onnxruntime:, execution_frame.cc:835 onnxruntime::ExecutionFrame::VerifyOutputSizes] Expected shape from model of {-1,2} does not match actual shape of {2,3} for output logits_per_image

Note: the CLIP model supports different batch sizes for each modality (image and text), since in the end it essentially performs a Cartesian-product comparison, scoring every image against every text.
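
To make the shape concrete, a minimal numpy sketch of that comparison (the embedding dimension of 512 is just illustrative):

import numpy as np

# CLIP scores every image embedding against every text embedding, so
# logits_per_image naturally has shape (num_images, num_texts).
num_images, num_texts, dim = 2, 3, 512
image_embeds = np.random.randn(num_images, dim).astype(np.float32)
text_embeds = np.random.randn(num_texts, dim).astype(np.float32)

# Normalize, then take all pairwise dot products (cosine similarities).
image_embeds /= np.linalg.norm(image_embeds, axis=-1, keepdims=True)
text_embeds /= np.linalg.norm(text_embeds, axis=-1, keepdims=True)
logits_per_image = image_embeds @ text_embeds.T
print(logits_per_image.shape)  # (2, 3)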

fxmarty commented 1 year ago

Oh right, yes, there is a bug in the export then. Will fix.

fxmarty commented 1 year ago

Hi @xenova, this should be fixed in #884. Could you give it a try with the latest transformers release (4.27.0) and optimum main? Your sample script now works. The issue was that different axes shared the same dynamic axis name even though they may have different shapes.
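
To illustrate the root cause, a hedged sketch of a torch.onnx.export dynamic_axes mapping (the axis names here are illustrative, not necessarily the ones optimum emits): two dimensions that share a dynamic axis name are treated as equal by ONNX Runtime's shape checks, so the text and image batch axes need distinct names.

# Illustrative only: giving input_ids and pixel_values the *same* batch axis
# name would force their batch sizes to match; distinct names let them differ.
dynamic_axes = {
    "input_ids": {0: "text_batch_size", 1: "sequence_length"},
    "attention_mask": {0: "text_batch_size", 1: "sequence_length"},
    "pixel_values": {0: "image_batch_size"},
    "logits_per_image": {0: "image_batch_size", 1: "text_batch_size"},
}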

We'll do a release soon, as PyTorch 2.0 broke a few things here and there.