
Export CLIP into separate text encoder and image encoder ONNX models #22221

Open · susht3 opened this issue 1 year ago

susht3 commented 1 year ago

Model description

I want to export CLIP as two separate ONNX models, a text encoder and an image encoder, but it seems the exporter can only convert the whole model. How can I split CLIP into two ONNX models?

Open source status

Provide useful links for the implementation

No response

amyeroberts commented 1 year ago

cc @michaelbenayoun

michaelbenayoun commented 1 year ago

Hi @susht3, you mean that you want to export a CLIPTextModel and a CLIPVisionModel?

We support the CLIP export in optimum:

optimum-cli export onnx -m openai/clip-vit-base-patch32 --task default clip

But as I understand it, you want to export two separate models?
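
For context, a rough sketch (untested) of running the single ONNX file this command produces with onnxruntime; the input names are what optimum's CLIPOnnxConfig declares for the combined model, and clip/model.onnx assumes the command's output directory. Both towers end up in one graph, which is what this issue wants to avoid:

import numpy as np
import onnxruntime as ort
from PIL import Image
from transformers import CLIPProcessor

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = processor(text=["a photo of a cat"], images=Image.open("CLIP.png"),
                   return_tensors="np", padding=True)

session = ort.InferenceSession("clip/model.onnx")
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
    "pixel_values": inputs["pixel_values"].astype(np.float32),
})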

susht3 commented 1 year ago

> Hi @susht3, you mean that you want to export a CLIPTextModel and a CLIPVisionModel?
>
> We support the CLIP export in optimum:
>
> optimum-cli export onnx -m openai/clip-vit-base-patch32 --task default clip
>
> But as I understand it, you want to export two separate models?

Yes. I tried to convert it with transformers.onnx but failed; my code looks like this:

from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from transformers.onnx import export

model = CLIPModel.from_pretrained(model_path)
processor = CLIPProcessor.from_pretrained(model_path)

text = processor.tokenizer("[UNK]", return_tensors="np")
image = processor.feature_extractor(Image.open("CLIP.png"))

text_model = model.text_model
image_model = model.vision_model

# model_path, onnx_config, and onnx_model_path are defined elsewhere in my script
onnx_inputs, onnx_outputs = export(
    preprocessor=processor.tokenizer,
    model=text_model,
    config=onnx_config,
    opset=10,
    output=onnx_model_path,
)

michaelbenayoun commented 1 year ago

What kind of inputs do you want?

Anyway, you should use optimum.exporters.onnx for this. You should be able to export the text model easily, because we have a CLIPTextOnnxConfig.

For the rest, we have CLIPOnnxConfig as well.
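
For reference, a minimal sketch of exporting just the text tower this way (untested; it assumes export and CLIPTextOnnxConfig are importable from where recent optimum versions keep them, and the opset and output path are arbitrary choices):

from pathlib import Path

from transformers import CLIPTextModel
from optimum.exporters.onnx import export
from optimum.exporters.onnx.model_configs import CLIPTextOnnxConfig

# Load only the text tower of CLIP
model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

# Build the ONNX export config from the text model's config
onnx_config = CLIPTextOnnxConfig(model.config)

onnx_inputs, onnx_outputs = export(
    model=model,
    config=onnx_config,
    output=Path("clip_text_encoder.onnx"),
    opset=14,
)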

susht3 commented 1 year ago

> CLIPTextOnnxConfig

Thanks. And which one is the CLIP vision ONNX config? I can't find it.

michaelbenayoun commented 1 year ago

I think we do not have it, but you can make a PR and add it if you are interested!
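
For anyone interested in that PR, here is a hypothetical sketch of what such a config might look like, modeled on optimum's existing vision configs (the class name, base class, and axis names are assumptions, not a confirmed optimum API):

from optimum.exporters.onnx.config import VisionOnnxConfig
from optimum.utils import NormalizedVisionConfig

# Hypothetical ONNX export config for the CLIP vision tower
class CLIPVisionModelOnnxConfig(VisionOnnxConfig):
    # Maps CLIP vision config attribute names to optimum's normalized names
    NORMALIZED_CONFIG_CLASS = NormalizedVisionConfig

    @property
    def inputs(self):
        return {"pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}}

    @property
    def outputs(self):
        return {"last_hidden_state": {0: "batch_size"}, "pooler_output": {0: "batch_size"}}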

YHD23 commented 8 months ago

# This snippet uses the original OpenAI CLIP package ("import clip"),
# not transformers, so `model` comes from clip.load().
import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")
model.eval()

image = preprocess(Image.open("CLIP.png")).unsqueeze(0)  # placeholder image
text = clip.tokenize(["a diagram"])  # placeholder text

with torch.no_grad():
    image_features = model.encode_image(image)

# The image encoder is already a submodule, so export it directly.
torch.onnx.export(model.visual, image, "image_encoder.onnx",
                  input_names=("images", ),
                  output_names=("image_features", ),
                  dynamic_axes={"images": {0: "num_image"}})

# encode_text is a method rather than a submodule, so wrap it in an
# nn.Module to trace only the text path (the model's own forward
# expects both image and text).
class TextEncoder(torch.nn.Module):
    def __init__(self, clip_model):
        super().__init__()
        self.clip_model = clip_model

    def forward(self, text):
        return self.clip_model.encode_text(text)

with torch.no_grad():
    text_features = model.encode_text(text)

torch.onnx.export(TextEncoder(model), (text, ),
                  "text_encoder.onnx",
                  input_names=("texts", ),
                  output_names=("text_features", ),
                  dynamic_axes={"texts": {0: "num_text"}})

Coding like this, you can get the image encoder and the text encoder ONNX models respectively.
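
As a quick sanity check, here is a sketch (untested) of running the exported text encoder with onnxruntime; the "texts" input name matches the input_names chosen above:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("text_encoder.onnx")
text_np = text.numpy().astype(np.int64)  # `text` from the snippet above
(text_features_onnx,) = session.run(None, {"texts": text_np})
print(text_features_onnx.shape)
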
Gforky commented 1 month ago

> What kind of inputs do you want?
>
> Anyway, you should use optimum.exporters.onnx for this. You should be able to export the text model easily, because we have a CLIPTextOnnxConfig.
>
> For the rest, we have CLIPOnnxConfig as well.

Hi, could you please give more hints on how to export just the CLIP text model using optimum.exporters.onnx?

michaelbenayoun commented 1 month ago

Maybe @mht-sharma ?