vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[RFC]: quant llm from alpindale #8716

Open · flozi00 opened 1 week ago

flozi00 commented 1 week ago

Motivation.

Higher throughput and memory savings are always cool 😎

I think this could be integrated very easily. What do you think about its design?

Proposed Change.

https://github.com/PygmalionAI/aphrodite-engine/commit/73177656ed75ec880a409640ef2b9a8043cf96a8
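
To make the proposal a bit more concrete, here is a rough sketch of what this could look like from the user side, assuming the quant_llm kernels from the linked commit get wired into vLLM's existing `quantization` argument. The method name `"quant_llm"` is taken from aphrodite-engine and is only an assumption; the final name would be decided in review:

```python
# Hypothetical usage sketch: assumes the quant_llm kernels from the linked
# aphrodite-engine commit are registered as a vLLM quantization method
# named "quant_llm" (name taken from aphrodite, not confirmed for vLLM).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    quantization="quant_llm",  # assumed method name, see linked commit
)

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```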

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response

flozi00 commented 5 days ago

https://github.com/vllm-project/vllm/pull/8751