susht3 opened this issue 1 year ago
cc @michaelbenayoun
Hi @susht3, do you mean that you want to export a CLIPTextModel and a CLIPVisionModel? We support the CLIP export in optimum:

```
optimum-cli export onnx -m openai/clip-vit-base-patch32 --task default clip
```

But as I understand it, you want to export two models?
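(For reference, the single exported model already exposes both embeddings as outputs. Below is a minimal sketch of checking that with onnxruntime; it assumes the CLI wrote the usual `clip/model.onnx` layout and the names shown in the comments are only examples, so verify them with `get_inputs()`/`get_outputs()`.)

```python
# Minimal sketch: inspect and run the ONNX file produced by optimum-cli.
# Assumes the export wrote clip/model.onnx (the usual layout); adjust the
# path to whatever output directory you passed to the CLI.
import onnxruntime as ort

sess = ort.InferenceSession("clip/model.onnx")
print([i.name for i in sess.get_inputs()])    # e.g. input_ids, pixel_values, attention_mask
print([o.name for o in sess.get_outputs()])   # e.g. logits_per_image, text_embeds, image_embeds
```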
Yes, I tried to convert it with transformers.onnx but failed. My code looks like this:

```python
model = CLIPModel.from_pretrained(model_path)
processor = CLIPProcessor.from_pretrained(model_path)
text = processor.tokenizer("[UNK]", return_tensors="np")
image = processor.feature_extractor(Image.open("CLIP.png"))
text_model = model.text_model
image_model = model.vision_model

onnx_inputs, onnx_outputs = export(
    preprocessor=processor.tokenizer,
    model=text_model,
    config=onnx_config,
    opset=10,
    output=onnx_model_path,
)
```
What kind of inputs do you want? Anyway, you should use optimum.exporters.onnx for this. You should be able to export the text model easily because we have a CLIPTextOnnxConfig. For the rest we have CLIPOnnxConfig as well.
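For example, something along these lines should export just the text tower. This is only a minimal sketch: `export` and `CLIPTextOnnxConfig` live in `optimum.exporters.onnx` at the time of writing, but check the exact import paths and signature against your installed optimum version.

```python
# Sketch: export only the CLIP text encoder via optimum.exporters.onnx.
# Import paths and the export() signature may differ between optimum
# versions; treat this as a starting point, not a verified recipe.
from pathlib import Path

from transformers import CLIPTextModel
from optimum.exporters.onnx import export
from optimum.exporters.onnx.model_configs import CLIPTextOnnxConfig

model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
onnx_config = CLIPTextOnnxConfig(model.config)

export(
    model=model,
    config=onnx_config,
    output=Path("clip_text_encoder.onnx"),
    opset=onnx_config.DEFAULT_ONNX_OPSET,
)
```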
Thanks, and which one is the CLIP visual ONNX config? I can't find it.
I think we do not have it, but you can make a PR and add it if you are interested!
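If someone does want to contribute it, a vision-side config would presumably mirror CLIPTextOnnxConfig. The class below is purely hypothetical (it does not exist in optimum) and only sketches what such a PR might look like; the class name and output names are assumptions, while the base classes are existing optimum utilities.

```python
# Hypothetical sketch of a CLIPVisionOnnxConfig that could be contributed
# to optimum. The class name and output names are assumptions; only
# VisionOnnxConfig, NormalizedVisionConfig and DummyVisionInputGenerator
# are existing optimum building blocks.
from typing import Dict

from optimum.exporters.onnx.config import VisionOnnxConfig
from optimum.utils import DummyVisionInputGenerator, NormalizedVisionConfig


class CLIPVisionOnnxConfig(VisionOnnxConfig):  # hypothetical, not in optimum
    NORMALIZED_CONFIG_CLASS = NormalizedVisionConfig
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyVisionInputGenerator,)

    @property
    def inputs(self) -> Dict[str, Dict[int, str]]:
        return {"pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}}

    @property
    def outputs(self) -> Dict[str, Dict[int, str]]:
        return {"last_hidden_state": {0: "batch_size"}, "pooler_output": {0: "batch_size"}}
```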
```python
# Export the two towers separately with plain torch.onnx.export.
# Note: this appears to follow the OpenAI-style CLIP API
# (model.visual, model.encode_image, model.encode_text),
# not the Hugging Face CLIPModel.
with torch.no_grad():
    image_features = model.encode_image(image)

torch.onnx.export(
    model.visual,
    image,
    "image_encoder.onnx",
    input_names=("images",),
    output_names=("image_features",),
    dynamic_axes={"images": {0: "num_image"}},
)

# text_features = model.encode_text(text)
text_features = model(text)  # forward() is assumed to return the text features here

torch.onnx.export(
    model,
    (text,),
    "text_encoder.onnx",
    input_names=("texts",),
    output_names=("text_features",),
    dynamic_axes={"texts": {0: "num_text"}},
)
```
With code like this, you get the image encoder and the text encoder as separate ONNX models.
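And a quick sketch of running the two exported files with onnxruntime, assuming `image_np` and `text_np` are the same preprocessed tensors as above converted to numpy arrays:

```python
# Run the two exported encoders with onnxruntime. The input/output names
# match the ones used in torch.onnx.export above; image_np / text_np are
# assumed to be the preprocessed inputs converted to numpy.
import onnxruntime as ort

image_sess = ort.InferenceSession("image_encoder.onnx")
text_sess = ort.InferenceSession("text_encoder.onnx")

(image_features,) = image_sess.run(["image_features"], {"images": image_np})
(text_features,) = text_sess.run(["text_features"], {"texts": text_np})
```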
Hi, could you please give some more hints on how to specifically export the CLIP text model using optimum.exporters.onnx?
Maybe @mht-sharma ?
I tried with this code, but it doesn't work:

```python
from PIL import Image
import requests
import torch
from transformers import CLIPProcessor, CLIPModel
import optimum.exporters.onnx

model = CLIPModel.from_pretrained("wkcn/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M")
processor = CLIPProcessor.from_pretrained("wkcn/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text, images=image, return_tensors="pt", padding=True)

vision_arch = "tinyVit"
image = inputs.data["pixel_values"]
visualEmbedding = model.vision_model
visualEmbedding.eval()

torch.onnx.export(
    visualEmbedding,                # model being run
    image,                          # model input
    f"models/{vision_arch}.onnx",   # output model location
    input_names=["modelInput"],     # input name
    output_names=["modelOutput"],   # output name
)
print(f"Model saved as {vision_arch}.onnx")
```
I get the following error:

```
z_(): incompatible function arguments. The following argument types are supported:
    1. (self: torch._C.Node, arg0: str, arg1: torch.Tensor) -> torch._C.Node

Invoked with: %258 : Tensor = onnx::Constant(), scope: transformers.models.clip.modeling_clip.CLIPVisionTransformer::/transformers.models.clip.modeling_clip.CLIPEncoder::encoder/transformers.models.clip.modeling_clip.CLIPEncoderLayer::layers.0/transformers.models.clip.modeling_clip.CLIPSdpaAttention::self_attn, 'value', 0.125
(Occurred when translating scaled_dot_product_attention).
```
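The traceback points at the ONNX translation of scaled_dot_product_attention (the model loads with CLIPSdpaAttention by default on recent transformers versions). One commonly suggested workaround, untested here, is to load the model with eager attention so tracing avoids that op:

```python
# Possible workaround (an assumption, not verified for this exact setup):
# force eager attention so the export does not trace through
# scaled_dot_product_attention. Requires a transformers version that
# accepts attn_implementation in from_pretrained.
model = CLIPModel.from_pretrained(
    "wkcn/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M",
    attn_implementation="eager",
)
```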
Can someone explain how to use optimum.exporters.onnx to export the vision transformer and the text transformer separately?
Model description
I want to export CLIP as two ONNX models, a text encoder and an image encoder, but it seems the export can only convert the whole model. How can I separate CLIP into two ONNX models?
Open source status
Provide useful links for the implementation
No response