modelscope / swift

ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/
Apache License 2.0

Batch inference #1278

[Open] VietDunghacker opened this issue 3 weeks ago

VietDunghacker commented 3 weeks ago

How do I perform batch inference with swift? I don't see it mentioned anywhere in the docs, and I cannot find it in the code either.

Jintao-Huang commented 3 weeks ago

Setting the inference backend to vLLM (`--infer_backend vllm`) enables batch inference.

https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.md

The "inference_vllm" can take a "request_list" as input.

VietDunghacker commented 3 weeks ago

Thank you.

VietDunghacker commented 3 weeks ago

@Jintao-Huang vLLM is great, but unfortunately it does not support all the models in this repo. For instance, Phi-3 Vision is supported in the vLLM GitHub repo but not in the official pip release. I really think it would be helpful if batch inference were implemented natively in swift instead of relying on vLLM.

tastelikefeet commented 2 weeks ago

Thanks for your suggestion! We have added PyTorch-native batch inference to our todo list. This requirement will be accomplished within one sprint.
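Until native support lands, a generic stopgap is batched `generate` with plain transformers. This is a sketch, not ms-swift's API: the model id is only an example, and it bypasses swift's chat templating entirely. The key detail is left padding, which decoder-only models need so every prompt in the batch ends at the same position:

```python
# Generic batched generation with plain transformers, as a stopgap until
# swift ships native batch inference. The model id is an example only.

def chunk(seq, size):
    """Split a list of queries into fixed-size batches."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def run_native_batch(queries, model_id="Qwen/Qwen2-0.5B-Instruct", batch_size=8):
    import torch  # heavy imports kept local so the sketch imports cleanly
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Decoder-only models need LEFT padding so all prompts in a batch
    # end at the same position before generation starts.
    tok = AutoTokenizer.from_pretrained(model_id, padding_side="left")
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    replies = []
    for batch in chunk(queries, batch_size):
        inputs = tok(batch, return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=64)
        # Strip the prompt tokens and decode only the newly generated part.
        new_tokens = out[:, inputs["input_ids"].shape[1]:]
        replies.extend(tok.batch_decode(new_tokens, skip_special_tokens=True))
    return replies
```

This trades vLLM's continuous batching for simple fixed-size batches, so throughput is lower, but it works for any model transformers can load.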