Description
Update the vLLM integration to use vLLM 0.6.2.
We need to change the following:
ipex-llm/python/llm/example/GPU/vLLM-Serving
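For context, below is a minimal offline-inference sketch against the upstream vLLM 0.6.2 Python API. The model name and sampling settings are placeholders, and the XPU/ipex-llm specific engine arguments used by the vLLM-Serving example are omitted; this is only an illustration of the API surface the example targets, not the example itself.

```python
# Minimal offline-inference sketch with the upstream vLLM 0.6.2 API.
# Model name and sampling settings are placeholders, not values from this PR.
from vllm import LLM, SamplingParams

prompts = ["What is IPEX-LLM?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# The XPU-specific engine arguments used by the vLLM-Serving example are
# omitted here; refer to the example under vLLM-Serving for those.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # placeholder model name

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```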
1. Why the change?
2. User API changes
3. Summary of the change
4. How to test?
[x] Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.
[ ] Application test (see the sketch below)
[ ] Document test
[ ] ...
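For the application test, one possible smoke test is to send a completion request to the running service and check the response. The host, port, and model name below are assumptions for illustration, not values from this PR; adjust them to however the updated vLLM-Serving example is launched (it exposes an OpenAI-compatible endpoint).

```python
# Hypothetical smoke test against an OpenAI-compatible completions endpoint.
# Host, port, and model name are assumptions; change them to match the
# actual launch configuration of the updated vLLM-Serving example.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "prompt": "What is IPEX-LLM?",
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```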
5. Known issues
[x] Sometimes this fails on initial start-up with a timeout error...