Closed zeyaddeeb closed 1 year ago
I'm also having this issue. Also, when exporting with default task, I get the following warning:
[W:onnxruntime:, execution_frame.cc:835 VerifyOutputSizes] Expected shape from model of {-1,2} does not match actual shape of {1,3} for output logits_per_image
Sorry for the issue. optimum-cli export onnx --model openai/clip-vit-base-patch32 --task default clip_onnx/
works fine for me on optimum 1.7.1.
The error message raised is not very informative; the task passed should be one of those listed at https://github.com/huggingface/optimum/blob/48967caae9b8cca2b132ced7387f45ee9458665a/optimum/exporters/tasks.py#L283-L285. I'll improve this.
Edit: that said, the test for CLIP with no task passed has started failing in our CI; I'll have a look at why.
Sorry for the issue.
optimum-cli export onnx --model openai/clip-vit-base-patch32 --task default clip_onnx/
works fine for me on optimum 1.7.1.
Whoops, I meant to say the issue occurs when running the exported default model (the export itself works fine). I'm running it in ONNX Runtime Web, so unfortunately I don't have a very simple way to show how to reproduce it 😅 (but hopefully it's simple to account for other output dimensions).
@xenova Are you sure you are passing pixel_values with the right shape? In pure PyTorch, the CLIP preprocessor crops pixel_values to 224 x 224.
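To illustrate the shape convention only (a sketch, not the real CLIP preprocessor; the post-resize size below is an assumed example):

```python
import numpy as np

# Sketch of the expected pixel_values shape: assume an input image already
# resized so its short side is 224, then center-crop to 224 x 224 and move
# channels first, giving (batch, channels, height, width).
h, w = 224, 298                               # illustrative post-resize size
image = np.zeros((h, w, 3), dtype=np.float32)
left = (w - 224) // 2
crop = image[:, left:left + 224, :]           # center crop to 224 x 224
pixel_values = crop.transpose(2, 0, 1)[None]  # add batch axis
print(pixel_values.shape)  # (1, 3, 224, 224)
```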
This works fine for me:
optimum-cli export onnx --model openai/clip-vit-base-patch32 --task default clip_onnx/
and
import onnxruntime as ort
import numpy as np

onnx_path = "/path/to/clip_onnx/model.onnx"
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# Random dummy inputs; note the text and image batch sizes match here.
batch_size = 2
sequence_length = 50
num_channels = 3
height = 224
width = 224
inputs = {
    "input_ids": np.random.randint(0, high=100, size=(batch_size, sequence_length), dtype=np.int64),
    "pixel_values": np.random.uniform(low=-1, high=1, size=(batch_size, num_channels, height, width)).astype(np.float32),
    "attention_mask": np.random.randint(0, high=2, size=(batch_size, sequence_length), dtype=np.int64),
}
res = session.run(None, inputs)
Yes, I do believe I am using the correct dimensions.
Using your script above, though, I found the cause: using different batch sizes for images and text, which does seem to be a bug.
import onnxruntime as ort
import numpy as np

path = './models/onnx/quantized/openai/clip-vit-base-patch16/default/model.onnx'
session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])

# Different batch sizes for text and images trigger the warning.
text_batch_size = 3
sequence_length = 50
img_batch_size = 2
num_channels = 3
height = 224
width = 224
inputs = {
    "input_ids": np.random.randint(0, high=100, size=(text_batch_size, sequence_length), dtype=np.int64),
    "attention_mask": np.random.randint(0, high=2, size=(text_batch_size, sequence_length), dtype=np.int64),
    "pixel_values": np.random.uniform(low=-1, high=1, size=(img_batch_size, num_channels, height, width)).astype(np.float32),
}
res = session.run(None, inputs)
print(f'{res=}')
which gives the warning:
2023-03-15 16:34:32.2142309 [W:onnxruntime:, execution_frame.cc:835 onnxruntime::ExecutionFrame::VerifyOutputSizes] Expected shape from model of {-1,2} does not match actual shape of {2,3} for output logits_per_image
Note: the CLIP model supports different batch sizes for each modality (image and text), since in the end it essentially computes a Cartesian-product comparison between the two.
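As a sketch of that Cartesian-product behavior (the embedding dimension and batch sizes below are illustrative, not CLIP's real values):

```python
import numpy as np

# CLIP compares every image embedding against every text embedding, so
# logits_per_image has shape (num_images, num_texts) and logits_per_text
# is its transpose. Sizes here are illustrative only.
img_batch, text_batch, dim = 2, 3, 8
image_embeds = np.random.randn(img_batch, dim).astype(np.float32)
text_embeds = np.random.randn(text_batch, dim).astype(np.float32)

# Normalize, then take the full pairwise (Cartesian) similarity matrix.
image_embeds /= np.linalg.norm(image_embeds, axis=-1, keepdims=True)
text_embeds /= np.linalg.norm(text_embeds, axis=-1, keepdims=True)
logits_per_image = image_embeds @ text_embeds.T  # shape (2, 3)
logits_per_text = logits_per_image.T             # shape (3, 2)

print(logits_per_image.shape, logits_per_text.shape)
```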
Oh right, yes there is a bug then in the export. Will fix.
Hi @xenova, this should be fixed in #884. Could you give it a try with the latest transformers release (4.27.0) and optimum main? Your sample script now works. The issue was axes sharing the same name even though they may have different shapes.
We'll do a release soon as PyTorch 2.0 broke a few things here and there.
System Info
Who can help?
@JingyaHuang, @echarlaix
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Command:
Issue:
Expected behavior
Export to be a success