ninehills / llm-inference-benchmark

LLM Inference benchmark
MIT License

Why is the inference FTL@1 longer after quantization with the vLLM framework? #1

Open luhairong11 opened 1 month ago

luhairong11 commented 1 month ago

[two screenshots of benchmark results attached]

ninehills commented 1 month ago

vLLM has already fixed this issue.

I will retest soon.
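
For context, FTL@1 here refers to first-token latency at concurrency 1. Below is a minimal, hypothetical sketch of how such a number can be measured against a running vLLM OpenAI-compatible server by timing the first streamed chunk; the server URL and model name are placeholders, not values from this issue.

```python
# Hypothetical sketch: measure first-token latency (FTL) against a vLLM
# OpenAI-compatible server. The URL and model name below are placeholders.
import time

import requests


def measure_ftl(prompt: str,
                url: str = "http://localhost:8000/v1/completions",
                model: str = "my-model") -> float:
    """Return seconds from sending the request until the first token arrives."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 64,
        "stream": True,  # stream so the first SSE chunk marks the first token
    }
    start = time.monotonic()
    with requests.post(url, json=payload, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            data = line.decode("utf-8")
            if data.startswith("data: ") and data != "data: [DONE]":
                # The first data chunk carries the first generated token.
                return time.monotonic() - start
    raise RuntimeError("no tokens received")


if __name__ == "__main__":
    print(f"FTL@1: {measure_ftl('Hello, world') * 1000:.1f} ms")
```

A single sequential request like this approximates FTL@1; averaging over several warm requests gives a more stable figure.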