neuralmagic / nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://nm-vllm.readthedocs.io

Upstream sync 2024 07 07 #366

Closed robertgshaw2-neuralmagic closed 4 months ago

robertgshaw2-neuralmagic commented 4 months ago

Upstream sync 2024 07 07 (https://github.com/neuralmagic/nm-vllm/pull/355), tied to v0.5.1 of upstream. Release candidate.

SUMMARY:

COMPARE vs UPSTREAM: https://github.com/neuralmagic/nm-vllm/compare/upstream-sync-2024-07-07..79d406e9183aa12cdef6f1876eb9a15385662587

robertgshaw2-neuralmagic commented 4 months ago

LGTM! We should have a dictionary of fixed search terms like RE-ENABLE to make it easier to keep track of changes.

We are going to add pytest marks.
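
A minimal sketch of what a dictionary of fixed search terms could look like, assuming the markers are left as plain comments in the source tree. Only `RE-ENABLE` comes from this thread; the `UPSTREAM-SYNC` entry, the descriptions, and the helper names `find_markers` and `scan_tree` are hypothetical and are not part of nm-vllm.

```python
import re
from pathlib import Path

# Hypothetical marker dictionary. Only RE-ENABLE is mentioned in the thread;
# the second entry and both descriptions are illustrative assumptions.
SYNC_MARKERS = {
    "RE-ENABLE": "test or feature disabled during a sync that should be turned back on",
    "UPSTREAM-SYNC": "change made only to reconcile with upstream",
}


def find_markers(text, markers=SYNC_MARKERS):
    """Return (line_number, marker, stripped_line) for every marker found in text."""
    pattern = re.compile("|".join(re.escape(m) for m in markers))
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(0), line.strip()))
    return hits


def scan_tree(root, suffixes=(".py",)):
    """Map each matching file under root to its marker hits, skipping clean files."""
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = find_markers(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Running `scan_tree("tests")` after each sync would give one list of every spot still carrying a marker, which may be easier to review than ad hoc searches with differently spelled terms.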