Open VietDunghacker opened 3 months ago
Using `infer_backend vllm` allows for batch inference.
The `inference_vllm` function can take a `request_list` as input.
Thank you.
@Jintao-Huang vllm is great, but unfortunately it does not support all the models in this repo. For instance, Phi-3 Vision is supported in their GitHub repo but not in the official pip release. I really think it would be helpful if this feature were implemented natively in swift instead of relying on vllm.
Thanks for your suggestion! We have added PyTorch-native batch inference to our todo list; this requirement will be accomplished in one sprint.
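While waiting for native support, here is a rough sketch of what PyTorch-native batch inference looks like with plain Hugging Face transformers (the model name is only an illustrative placeholder; substitute your own):

```python
# Sketch: batched generation with transformers, as a stopgap
# until swift supports batch inference without vllm.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sshleifer/tiny-gpt2"  # placeholder; use your real model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Left padding so generated tokens align at the end of each sequence.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

prompts = ["Hello, my name is", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=8,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for text in texts:
    print(text)
```

The key detail is left padding: with right padding, the model would generate after the pad tokens and the continuations would be misaligned.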
How do I perform batch inference with swift? I don't see it mentioned anywhere in the docs, and I cannot find it in the code either.