Open VietDunghacker opened 3 months ago
Using `infer_backend vllm` allows for batch inference.
The `inference_vllm` function can take a `request_list` as input.
Thank you.
@Jintao-Huang vllm is great, but unfortunately it does not support all the models in this repo. For instance, Phi-3 Vision is supported in their GitHub repo but not in the official pip release. I really think it would be helpful if this feature were implemented natively in swift instead of relying on vllm.
Thanks for your suggestion! We have added PyTorch-native batch inference to our todo list; this requirement will be accomplished in one sprint.
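While waiting for native support, here is a rough sketch of what PyTorch-native batch inference looks like with plain Hugging Face transformers (the model name is only an illustrative placeholder; substitute your own):

```python
# Sketch: batched generation with transformers, as a stopgap
# until swift supports batch inference without vllm.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sshleifer/tiny-gpt2"  # placeholder; use your real model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Left padding so generated tokens align at the end of each sequence.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

prompts = ["Hello, my name is", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=8,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for text in texts:
    print(text)
```

The key detail is left padding: with right padding, the model would generate after the pad tokens and the continuations would be misaligned.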
How do I perform batch inference with swift? I don't see it mentioned anywhere in the docs, and I cannot find it in the code either.