neuralmagic/nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://nm-vllm.readthedocs.io
Refactor moe #347
Closed
robertgshaw2-neuralmagic closed this 4 months ago
robertgshaw2-neuralmagic commented 4 months ago
Draft