Open sh1ng opened 1 year ago
vLLM now uses CUDA 12.
Also, I can't confirm your results on an RTX 3090.
mlc-llm
```
Statistics:
----------- prefill -----------
throughput: 218.2 tok/s
total tokens: 7 tok
total time: 0.0 s
------------ decode ------------
throughput: 170.7 tok/s
total tokens: 256 tok
total time: 1.5 s
```
vLLM (when using a 4-bit AWQ model)
```
Avg latency: 1.4600699121753375 seconds
Speed: 175.33 tok/s
Speed: 0.00570 s/tok
```
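For reference, these throughput figures are just tokens divided by wall time, and the two vLLM "Speed" lines are reciprocals of each other. A quick sanity check in Python (using the values reported above) shows the numbers are internally consistent:

```python
# Decode throughput = total tokens / total time (mlc-llm decode figures above)
decode_tok_per_s = 256 / 1.5
print(f"mlc-llm decode: {decode_tok_per_s:.1f} tok/s")  # ~170.7 tok/s, matching the report

# vLLM reports speed both as tok/s and as its reciprocal, s/tok
vllm_tok_per_s = 175.33
print(f"vLLM per-token latency: {1 / vllm_tok_per_s:.5f} s/tok")  # ~0.00570 s/tok
```

So on this card the two engines are within a few percent of each other at decode time, despite the very different reporting formats.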