microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Feature Request] Support for Florence-2 model family #21118

Open theolivenbaum opened 2 months ago

theolivenbaum commented 2 months ago

Describe the feature request

As per https://huggingface.co/microsoft/Florence-2-large-ft/discussions/7, it seems like the model type is not yet supported by the converter:

Can we get an ONNX version of this model for use on Windows .NET with onnxruntime?

I tried to convert it, but the export failed:

optimum-cli export onnx --trust-remote-code --model microsoft/Florence-2-large-ft ./models/Florence-2-large-ft_onnx/

ValueError: Unrecognized configuration class <class 'transformers_modules.microsoft.Florence-2-large-ft.f3c2bbf1d042a8976e0c43b3a3ead3f53a8dad88.configuration_florence2.Florence2Config'> for this kind of AutoModel: AutoModelForVision2Seq. Model type should be one of BlipConfig, Blip2Config, GitConfig, Idefics2Config, InstructBlipConfig, Kosmos2Config, LlavaConfig, LlavaNextConfig, PaliGemmaConfig, Pix2StructConfig, VideoLlavaConfig, VipLlavaConfig, VisionEncoderDecoderConfig.
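The message above reads like a failed lookup: the exporter asks AutoModelForVision2Seq for a model class, and Florence2Config is not among the configuration classes it knows. A minimal sketch of that kind of lookup follows. It is a simplified illustration, not the transformers source; the registry, the stand-in classes, and `resolve_model_class` are all hypothetical names.

```python
# Simplified illustration only; this is not the transformers source code.
# All names below are hypothetical stand-ins.


class BlipConfig:
    """Stand-in for a configuration class the auto-class already knows."""


class Florence2Config:
    """Stand-in for a custom config loaded via --trust-remote-code."""


# Registry mapping known config classes to a model class (placeholder strings).
VISION2SEQ_REGISTRY = {BlipConfig: "model class registered for BlipConfig"}


def resolve_model_class(config):
    """Return the registered model class for `config`, or raise ValueError."""
    config_cls = type(config)
    if config_cls not in VISION2SEQ_REGISTRY:
        supported = ", ".join(cls.__name__ for cls in VISION2SEQ_REGISTRY)
        raise ValueError(
            f"Unrecognized configuration class {config_cls!r} "
            f"for this kind of AutoModel. "
            f"Model type should be one of {supported}."
        )
    return VISION2SEQ_REGISTRY[config_cls]


# An unregistered config class fails the lookup, as in the report above.
try:
    resolve_model_class(Florence2Config())
except ValueError as err:
    message = str(err)
    print(message)
```

Under this reading, the export fails before any ONNX conversion starts, because no model class is registered for the custom configuration.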

Describe scenario use case

The new Florence-2 model family should be supported by onnxruntime.

shubham0204 commented 2 months ago

@theolivenbaum Xenova from Hugging Face has uploaded ONNX models for Florence-2 to the onnx-community repo.

thalapandi commented 1 month ago

Is there any Python inference code available for Florence-2-large-ft using the ONNX model?

barbolo commented 2 weeks ago

From what I understand, Florence-2 has multiple sets of ONNX weights, and which one is used might depend on the task (Caption, OCR, ...). I believe it's currently not possible to have a single ONNX weights + model file that can perform multiple tasks, right?

I believe ONNX Runtime optimizes the inference graph and might optimize certain CPU operations, so it wouldn't be a good fit for a multi-purpose model. Am I right?
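For context on the question above: Florence-2 is documented as a prompt-based model, where the task is selected by a task token such as <CAPTION> or <OCR> in the text prompt. If the separate ONNX files correspond to model components rather than to tasks (an assumption, not verified here against the onnx-community export), one set of weights could serve several tasks. The sketch below shows that shape in Python with a plain greedy decoding loop. The `encode` and `decode_step` callables are hypothetical stand-ins for whatever wraps the exported graphs, and the toy functions exist only so the sketch runs without model files.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins: in a real pipeline each callable would wrap one
# exported ONNX graph. These signatures are assumptions for illustration.
EncodeFn = Callable[[Sequence[float], str], List[float]]
DecodeStepFn = Callable[[List[float], List[int]], int]


def generate(
    image: Sequence[float],
    task_prompt: str,
    encode: EncodeFn,
    decode_step: DecodeStepFn,
    eos_id: int,
    max_new_tokens: int = 32,
) -> List[int]:
    """Greedy decoding; the task is chosen by `task_prompt`, not by the weights."""
    features = encode(image, task_prompt)  # encoder runs once per request
    tokens: List[int] = []
    for _ in range(max_new_tokens):
        next_id = decode_step(features, tokens)  # one decoder step
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens


# Toy stand-ins so the sketch runs without any model files.
def toy_encode(image: Sequence[float], task_prompt: str) -> List[float]:
    # Mix the prompt into the features so the output depends on the task.
    return [sum(image), float(len(task_prompt))]


def toy_decode_step(features: List[float], tokens: List[int]) -> int:
    # Emit len(task_prompt) tokens counting up from 1, then the EOS id 0.
    limit = int(features[1])
    return len(tokens) + 1 if len(tokens) < limit else 0


image = [0.1, 0.2]
# The same two callables serve both tasks; only the prompt changes.
caption_ids = generate(image, "<CAPTION>", toy_encode, toy_decode_step, eos_id=0)
ocr_ids = generate(image, "<OCR>", toy_encode, toy_decode_step, eos_id=0)
print(caption_ids, ocr_ids)
```

The same callables are reused for both calls, which is the point of the sketch. On the second question, ONNX Runtime's graph optimizations are intended to preserve the outputs of the graph, so they should not by themselves tie a model to a single task.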

theolivenbaum commented 2 weeks ago

If anyone is interested: we reimplemented the Florence-2 logic in C# and released it here. It is also available, ready to use, on NuGet.