microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.69k stars 2.93k forks

[Feature Request] Support for Florence-2 model family #21118

Open theolivenbaum opened 4 months ago

theolivenbaum commented 4 months ago

Describe the feature request

As per https://huggingface.co/microsoft/Florence-2-large-ft/discussions/7, it seems like the model type is not yet supported by the converter:

Can we get an ONNX version of this model for use on Windows .NET with onnxruntime?

I tried to convert it, but the export failed:

optimum-cli export onnx --trust-remote-code --model microsoft/Florence-2-large-ft ./models/Florence-2-large-ft_onnx/

ValueError: Unrecognized configuration class <class 'transformers_modules.microsoft.Florence-2-large-ft.f3c2bbf1d042a8976e0c43b3a3ead3f53a8dad88.configuration_florence2.Florence2Config'> for this kind of AutoModel: AutoModelForVision2Seq. Model type should be one of BlipConfig, Blip2Config, GitConfig, Idefics2Config, InstructBlipConfig, Kosmos2Config, LlavaConfig, LlavaNextConfig, PaliGemmaConfig, Pix2StructConfig, VideoLlavaConfig, VipLlavaConfig, VisionEncoderDecoderConfig.

Describe scenario use case

The new Florence-2 model family should be supported by onnxruntime.

shubham0204 commented 4 months ago

@theolivenbaum Xenova from Hugging Face has uploaded ONNX models for Florence-2 to the onnx-community repo.

thalapandi commented 3 months ago

Is there any Python inference code available for Florence-2-large-ft using the ONNX model?
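
As a possible starting point for the question above, here is a minimal sketch of loading an exported Florence-2 model with the onnxruntime Python API. It assumes the export is split into several component files; the names in `COMPONENTS` are an assumption and should be checked against the files actually present in the download. The helper names (`component_paths`, `load_sessions`, `describe`) are hypothetical. This only opens the sessions and lists their tensor names; image preprocessing, tokenization, and the decoding loop are not shown.

```python
from pathlib import Path

# Assumed component file names (not confirmed in this thread); verify them
# against the files in the model directory you downloaded.
COMPONENTS = ("vision_encoder", "embed_tokens", "encoder_model", "decoder_model_merged")


def component_paths(model_dir):
    """Map each assumed component name to its expected .onnx path."""
    root = Path(model_dir)
    return {name: root / f"{name}.onnx" for name in COMPONENTS}


def load_sessions(model_dir, providers=("CPUExecutionProvider",)):
    """Create one onnxruntime.InferenceSession per component file."""
    import onnxruntime as ort  # lazy import: the path helper works without it

    sessions = {}
    for name, path in component_paths(model_dir).items():
        if not path.is_file():
            raise FileNotFoundError(f"missing ONNX component: {path}")
        sessions[name] = ort.InferenceSession(str(path), providers=list(providers))
    return sessions


def describe(sessions):
    """Return input and output tensor names per session, to see how the parts connect."""
    return {
        name: {
            "inputs": [i.name for i in session.get_inputs()],
            "outputs": [o.name for o in session.get_outputs()],
        }
        for name, session in sessions.items()
    }
```

Calling `describe(load_sessions("./models/Florence-2-large-ft_onnx"))` should show which outputs of one component feed the inputs of the next, which is the information needed before writing the full pipeline.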

barbolo commented 3 months ago

From what I understand, there are multiple ONNX weights that might be used depending on the task (Caption, OCR, ...) of Florence-2. I believe it's impossible at this moment to have a single ONNX weights + model file that would be able to perform multiple tasks, right?

I believe ONNX runtime optimizes the inference graph and might optimize certain CPU operations, so it wouldn't be a good fit for a multi purpose model. Am I right?
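
On the multi-task question: in the Hugging Face Florence-2 processor, the task is selected by a prompt token such as `<CAPTION>` or `<OCR>` rather than by separate weights, which suggests that one exported set of model files could serve several tasks. A minimal sketch of that dispatch follows. `run_model` is a hypothetical stand-in for whatever inference pipeline is used, and the task list is only a subset.

```python
# A subset of the task tokens used by the Hugging Face Florence-2 processor.
TASKS = {"<CAPTION>", "<DETAILED_CAPTION>", "<OD>", "<OCR>"}


def build_prompt(task, text_input=""):
    """Build the text prompt for a task; some tasks append extra text after the token."""
    if task not in TASKS:
        raise ValueError(f"unknown task token: {task}")
    return task + text_input


def run_task(run_model, image, task, text_input=""):
    """Run one task through the same model; only the prompt changes.

    `run_model(image, prompt)` is a hypothetical callable wrapping the
    actual inference code.
    """
    return run_model(image, build_prompt(task, text_input))
```

Under this assumption, switching from captioning to OCR means passing a different token to the same sessions, not loading a different model.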

theolivenbaum commented 3 months ago

If anyone is interested: we reimplemented the florence-2 logic in C# and released it here, also available ready to use on nuget.

Source82 commented 2 months ago

> If anyone is interested: we reimplemented the florence-2 logic in C# and released it here, also available ready to use on nuget.

Could you please share how you created the ONNX files? I would like to try this with a custom version.

tgalery commented 1 month ago

> From what I understand, there are multiple ONNX weights that might be used depending on the task (Caption, OCR, ...) of Florence-2. I believe it's impossible at this moment to have a single ONNX weights + model file that would be able to perform multiple tasks, right?
>
> I believe ONNX runtime optimizes the inference graph and might optimize certain CPU operations, so it wouldn't be a good fit for a multi purpose model. Am I right?

Any update on this?