microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

Can I run inference on Phi-3-vision with a batch? #720

Closed: 2U1 closed this issue 1 month ago

2U1 commented 2 months ago

Thanks for the conversion code for Phi-3-vision. I'm building an app that serves concurrent requests and needs continuous batching. Can I run inference on Phi-3-vision with a batch size larger than 1 (I mean in the ONNX model)?

There are some examples for LLMs but none for VLMs, so I'm not sure how to do this.

baijumeswani commented 2 months ago

Currently, the model does not support batching. The ONNX model used behind the scenes is optimized to work with batch size 1 and will not work for batch sizes greater than 1.
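For reference, the supported flow processes one prompt/image pair per request. This is a minimal sketch based on the phi3v.py example in this repo; the API names (`create_multimodal_processor`, `og.Images.open`, `compute_logits`, etc.) assume the onnxruntime-genai 0.3-era Python bindings, and `model_path` / `image_path` are placeholders:

```python
import onnxruntime_genai as og

# Placeholder paths: point these at your exported Phi-3-vision ONNX model and an input image.
model_path = "./phi-3-vision"
image_path = "./example.png"

model = og.Model(model_path)
processor = model.create_multimodal_processor()
tokenizer_stream = processor.create_stream()

# Batch size is effectively fixed at 1: one prompt and one image per request.
prompt = "<|user|>\n<|image_1|>\nDescribe the image.<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=og.Images.open(image_path))

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=3072)

# Token-by-token generation loop, streaming the decoded output.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```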

Adding batch-size support for such models is on our roadmap, but I don't know when we will be able to prioritize it. It depends heavily on whether we can make the ONNX model support batch sizes greater than 1.
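Until batch sizes greater than 1 are supported, one workaround for serving concurrent requests is to serialize them through a queue in front of a single batch-size-1 generation loop (or one loop per loaded model instance). A minimal sketch under the same 0.3-era API assumptions as above; `run_single` is a hypothetical helper, not part of the library:

```python
import queue
import threading

import onnxruntime_genai as og

model = og.Model("./phi-3-vision")  # placeholder path
processor = model.create_multimodal_processor()

def run_single(prompt: str, image_path: str) -> str:
    # Hypothetical helper: batch-size-1 generation, as in the sketch above,
    # returning the full generated text instead of streaming it.
    inputs = processor(prompt, images=og.Images.open(image_path))
    params = og.GeneratorParams(model)
    params.set_inputs(inputs)
    params.set_search_options(max_length=3072)
    generator = og.Generator(model, params)
    stream = processor.create_stream()
    pieces = []
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        pieces.append(stream.decode(generator.get_next_tokens()[0]))
    return "".join(pieces)

request_q: queue.Queue = queue.Queue()

def worker() -> None:
    # Requests run one at a time because the underlying ONNX model
    # only accepts batch size 1.
    while True:
        prompt, image_path, reply_q = request_q.get()
        reply_q.put(run_single(prompt, image_path))
        request_q.task_done()

threading.Thread(target=worker, daemon=True).start()

# Submit a request from any thread and block for the result.
reply: queue.Queue = queue.Queue()
request_q.put(("<|user|>\n<|image_1|>\nDescribe the image.<|end|>\n<|assistant|>\n",
               "./example.png", reply))
print(reply.get())
```

This gives you fair FIFO handling of concurrent callers, but throughput is still one request at a time; it is a stopgap, not a substitute for real continuous batching.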