vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
30.64k stars · 4.65k forks
Issues (newest first)
#10572 [Model] Fix Baichuan BNB online quantization · CNTRYROA · opened 4 hours ago · 1 comment
#10570 [V1] Refactor model executable interface for multimodal models · ywang96 · opened 4 hours ago · 1 comment
#10569 [Bug]: llama-3.2-11B-vision run in vllm==0.6.3 OOM error (L20) · Jamrainbow · opened 4 hours ago · 2 comments
#10568 [Bugfix] Fix Baichuan BNB online quantization · CNTRYROA · closed 4 hours ago · 1 comment
#10567 [Bugfix] 500 Internal Server Error when tool_choice is incorrect · shenoyvvarun · opened 4 hours ago · 1 comment
#10566 [Doc]: Docker+vllm+fastchat deploys multimodal large model Qwen2-vl-7b-instruct · Aanlifire · opened 5 hours ago · 0 comments
#10565 [Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU) · SanjuCSudhakaran · opened 5 hours ago · 2 comments
#10564 [V1] EngineCore supports profiling · Abatom · opened 6 hours ago · 1 comment
#10563 [Minor] Fix line-too-long · WoosukKwon · closed 8 hours ago · 1 comment
#10562 [Feature]: How to run speculative models with tensor parallelism? · cxxuser · opened 8 hours ago · 1 comment
#10561 [Model] Added GLM-4 series model support vllm==0.6.4 · sixsixcoder · opened 9 hours ago · 1 comment
#10560 add cleaned multi memory support · ClarkChin08 · closed 10 hours ago · 1 comment
#10559 [Bug]: Speculative Decoding without enabling eager mode returns gibberish output after some tokens · andoorve · opened 11 hours ago · 0 comments
#10558 [torch.compile] support all attention backends · youkaichao · opened 11 hours ago · 1 comment
#10557 [Benchmark] Benchmark structured output with datasets · xuechendi · opened 11 hours ago · 1 comment
#10556 [Usage]: Docker w/ CPU fails when defining VLLM_CPU_OMP_THREADS_BIND · ccruttjr · opened 12 hours ago · 5 comments
#10555 [platforms] absorb worker cls difference into platforms folder · youkaichao · closed 7 hours ago · 7 comments
#10554 [Docs] Add dedicated tool calling page to docs · mgoin · opened 13 hours ago · 2 comments
#10553 [misc] improve error message · youkaichao · closed 14 hours ago · 1 comment
#10552 [9/N] torch.compile LLM usage · youkaichao · closed 8 hours ago · 1 comment
#10551 Remove token-adding chat embedding params · noamgat · closed 4 hours ago · 3 comments
#10550 Add small example to metrics.rst · mgoin · closed 12 hours ago · 1 comment
#10549 support bitsandbytes quantization with qwen model · zixuanzhang226 · opened 16 hours ago · 1 comment
#10548 [Usage]: Can we extend the context length of gemma2 model or other models? · hahmad2008 · opened 16 hours ago · 2 comments
#10547 [Benchmark] Add new H100 machine · simon-mo · closed 9 hours ago · 1 comment
#10546 [Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server · angkywilliam · opened 16 hours ago · 6 comments
#10545 [Minor] Revert change in offline inference example · WoosukKwon · closed 14 hours ago · 1 comment
#10544 Update default max_num_batch_tokens for chunked prefill to 2048 · mgoin · opened 19 hours ago · 2 comments
#10543 [do-not-merge] Ibm 20241121 · fialhocoelho · closed 20 hours ago · 2 comments
#10542 [Distributed] Tensor Parallel RMSNorm · tlrmchlsmth · opened 20 hours ago · 2 comments
#10541 [Bugfix][Hardware][CPU] Fix `multi_modal_kwargs` broadcast for CPU tensor parallel · Isotr0py · opened 20 hours ago · 2 comments
#10540 [Installation]: can't get the cu118 version of vllm 0.6.3 by https://github.com/vllm-project/vllm/releases/download/v0.6.3/vllm-0.6.3+cu118-cp310-cp310-manylinux1_x86_64.whl · mayfool · opened 21 hours ago · 0 comments
#10539 [Feature]: Support for Registering Model-Specific Default Sampling Parameters · yansh97 · opened 21 hours ago · 1 comment
#10538 For ppc64le, disabled tests for now and addressed space issues · npanpaliya · opened 22 hours ago · 2 comments
#10537 [Usage]: How to use ROPE scaling for llama3.1 and gemma2? · hahmad2008 · opened 22 hours ago · 2 comments
#10536 [Bugfix] Allow token ID-only inputs in Qwen2-Audio · DarkLight1337 · closed 17 hours ago · 1 comment
#10535 [CI][Installation] Avoid uploading CUDA 11.8 wheel · cermeng · closed 15 hours ago · 6 comments
#10534 [Usage]: Fail to load params.json · dequeueing · opened 23 hours ago · 3 comments
#10533 [Bug]: vllm failed to run two instance with one gpu · pandada8 · closed 23 hours ago · 3 comments
#10532 Add Sageattention backend · flozi00 · opened 1 day ago · 3 comments
#10531 [Bug]: Authorization ignored when root_path is set · OskarLiew · opened 1 day ago · 3 comments
#10530 [Misc] Suppress duplicated logging regarding multimodal input pipeline · ywang96 · closed 18 hours ago · 5 comments
#10529 [8/N] enable cli flag without a space · youkaichao · closed 15 hours ago · 2 comments
#10528 [V1] Fix Compilation config & Enable CUDA graph by default · WoosukKwon · closed 15 hours ago · 1 comment
#10527 [Usage]: Optimizing TTFT for Qwen2.5-72B Model Deployment on A800 GPUs for RAG Application · zhanghx0905 · opened 1 day ago · 2 comments
#10526 [Feature]: Additional possible value for `tool_choice`: `required` · fahadh4ilyas · opened 1 day ago · 1 comment
#10525 [Bug]: Gemma2 becomes a fool · Foreist · opened 1 day ago · 5 comments
#10524 fix the issue that len(tokenizer(prompt)["input_ids"]) > prompt_len · sywangyi · closed 1 day ago · 7 comments
#10523 [Bug]: torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, unhandled system error (run with NCCL_DEBUG=INFO for details), NCCL version 2.18.1 · QualityGN · opened 1 day ago · 0 comments
#10522 [Kernel] Register punica ops directly · jeejeelee · closed 18 hours ago · 2 comments