Closed by zhuohan123 7 months ago
Is it possible to support MLX for running inference on Mac devices? That would simplify local development as well as running in the cloud.
As mentioned in #2643, it would be awesome to have the vLLM /completions & /chat/completions endpoints both support logprobs, so that lm-eval-harness can be run against them.
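For illustration, a minimal sketch of what such a request could look like against an OpenAI-compatible server; the host/port, model name, and exact field support are assumptions here, not the final vLLM behavior:

```python
# Hypothetical sketch: asking an OpenAI-style /v1/completions endpoint for
# per-token logprobs, which is what lm-eval-harness consumes.
# Assumes a local server is already running; host/port and model name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "my-model",               # placeholder model name
        "prompt": "The capital of France is",
        "max_tokens": 1,
        "logprobs": 5,                     # return top-5 candidate logprobs per position
        "echo": True,                      # also return logprobs for the prompt tokens
    },
)
print(resp.json()["choices"][0]["logprobs"])
```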
Please pay attention to "Evaluation of Accelerated and Non-Accelerated Large Model Output"; it is very important to make sure the accelerated and non-accelerated outputs are always the same.
Agree 100%, the ability to use lm-eval-harness is very much needed.
https://github.com/vllm-project/vllm/issues/2573 ("Optimize the performance of the API server") talks about optimizing the API server.
Please support the ARM aarch64 architecture.
https://github.com/vllm-project/vllm/issues/1253
Please consider supporting StreamingLLM.
Any update on PEFT?
Please consider supporting Hugging Face PEFT, thank you. https://github.com/vllm-project/vllm/issues/1129
Would you consider adding support for earlier ROCm versions, e.g. 5.6.1? Thank you!
If possible, EXL2 support would be appreciated, thank you <3
Also, the ability to use Guidance/Outlines via logit_bias! And +1 to EXL2 support. A sketch of that usage follows below.
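As a rough sketch of what this would enable, here is an OpenAI-style request using logit_bias to steer generation, which is the mechanism tools like Guidance/Outlines can build on; the server address, model name, and token ids are placeholders:

```python
# Hypothetical sketch: constraining output via logit_bias (OpenAI-style schema).
# Token ids depend on the served model's tokenizer; the ones below are placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "my-model",                                       # placeholder
        "messages": [{"role": "user", "content": "Answer yes or no: is the sky blue?"}],
        "max_tokens": 1,
        # token-id -> bias in [-100, 100]; +100 effectively forces a token, -100 bans it.
        "logit_bias": {"9891": 100, "2201": 100},                  # ids for " yes"/" no" (placeholders)
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```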
Please support W8A8 quantization.
Let's migrate our discussion to #3861
This document includes the features in vLLM's roadmap for Q1 2024. Please feel free to discuss and contribute to the specific features at related RFC/Issues/PRs and add anything else you'd like to talk about in this issue.
In the future, we will publish our roadmap quarterly and deprecate our old roadmap (#244).
torch.compile support
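For reference only, this is what torch.compile looks like in plain PyTorch 2.x; how (or whether) vLLM wires this into its model runner is exactly what this roadmap item is about:

```python
# Illustrative sketch of torch.compile, not vLLM's integration.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024))
compiled = torch.compile(model)        # graph capture + codegen happens lazily on first call
x = torch.randn(8, 1024)
with torch.no_grad():
    y = compiled(x)                    # subsequent calls reuse the compiled graph
print(y.shape)
```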