vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Roadmap] vLLM Roadmap Q1 2024 #2681

Closed. zhuohan123 closed this issue 7 months ago

zhuohan123 commented 9 months ago

This document includes the features in vLLM's roadmap for Q1 2024. Please feel free to discuss and contribute to the specific features at related RFC/Issues/PRs and add anything else you'd like to talk about in this issue.

In the future, we will publish our roadmap quarterly and deprecate our old roadmap (#244).

sandangel commented 9 months ago

Is it possible to support MLX for running inference on Mac devices? That would simplify local development as well as running in the cloud.

AguirreNicolas commented 9 months ago

As mentioned in #2643, it would be awesome to have vLLM /completions & /chat/completions endpoints both supporting logprobs to run lm-eval-harness.
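
For context, here is a minimal sketch of the kind of request lm-eval-harness needs, written against the OpenAI-compatible server. The model name, port, and the assumption that `echo`/`logprobs` are honoured on the /completions route are illustrative; supporting exactly this is what is being requested, not confirmed current behaviour.

```python
# Sketch: per-token logprobs from vLLM's OpenAI-compatible server, the pattern
# lm-eval-harness uses for loglikelihood scoring. Assumes a server started with
# something like:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-v0.1
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="mistralai/Mistral-7B-v0.1",   # illustrative model name
    prompt="The capital of France is",
    max_tokens=1,
    echo=True,       # also return logprobs for the prompt tokens
    logprobs=1,      # top-1 logprob per generated/echoed token
    temperature=0.0,
)
print(resp.choices[0].logprobs.token_logprobs)
```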

PeterXiaTian commented 9 months ago

Please pay attention to "Evaluation of Accelerated and Non-Accelerated Large Model Output"; it is very important, and please make sure the outputs always match.

jrruethe commented 9 months ago

> As mentioned in #2643, it would be awesome to have vLLM /completions & /chat/completions endpoints both supporting logprobs to run lm-eval-harness.

Agree 100%, the ability to use lm-eval-harness is very much needed

casper-hansen commented 9 months ago

#2767 I suggest adding this to the roadmap as it's one of the more straightforward optimizations (someone has already done the optimization work).

jalotra commented 9 months ago

https://github.com/vllm-project/vllm/issues/2573 discusses optimizing the performance of the API server.

cyc00518 commented 8 months ago

Please support the ARM aarch64 architecture.

Tint0ri commented 8 months ago

https://github.com/vllm-project/vllm/issues/1253

Please consider supporting StreamingLLM.

kanseaveg commented 8 months ago

Any update on PEFT?

Please consider supporting Hugging Face PEFT, thank you. https://github.com/vllm-project/vllm/issues/1129

ekazakos commented 8 months ago

Would you consider adding support for earlier ROCm versions, e.g. 5.6.1? Thank you!

pabl-o-ce commented 8 months ago

If possible, EXL2 support would be appreciated, thank you <3

hmellor commented 8 months ago

#97 should be added to the automating the release process section.

jrruethe commented 8 months ago

Also, the ability to use Guidance/Outlines via logit_bias! And +1 to EXL2 support
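
For reference, a minimal sketch of the per-request token biasing that Guidance/Outlines-style tooling relies on, mirroring the OpenAI `logit_bias` parameter. The model name and token IDs below are placeholders, and whether the vLLM server honours `logit_bias` is exactly the capability being requested here.

```python
# Sketch: steering generation with logit_bias on the OpenAI-compatible endpoint.
# Token IDs 9820 and 2360 are stand-ins for whatever tokens a constrained-decoding
# wrapper (Guidance/Outlines) wants to allow at the current step.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",   # illustrative model name
    messages=[{"role": "user", "content": "Answer yes or no: is 7 prime?"}],
    max_tokens=1,
    logit_bias={"9820": 100, "2360": 100},        # strongly favour the allowed tokens
)
print(resp.choices[0].message.content)
```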

busishengui commented 8 months ago

Please support W8A8 (8-bit weight and 8-bit activation quantization).

simon-mo commented 7 months ago

Let's migrate our discussion to #3861.