neuralmagic/nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://nm-vllm.readthedocs.io
License: Other · 251 stars · 10 forks
Compressed tensors fp8 #358
Closed
robertgshaw2-neuralmagic closed this 4 months ago
robertgshaw2-neuralmagic commented 4 months ago
draft
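
For context, a minimal sketch of how an fp8 checkpoint might be loaded through the vLLM Python API. The model name below is a placeholder, not a checkpoint referenced by this PR, and the exact compressed-tensors fp8 plumbing added here may differ from plain `quantization="fp8"`.

```python
# Minimal sketch, assuming the standard vLLM Python API.
# "org/model-fp8" is a placeholder model name, not one referenced by this PR.
from vllm import LLM, SamplingParams

# For a pre-quantized checkpoint, the quantization config is typically read
# from the model's config files; quantization="fp8" can also be passed
# explicitly to request fp8 weight quantization.
llm = LLM(model="org/model-fp8", quantization="fp8")

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What is FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```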