vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[RFC]: quant llm from alpindale #8716

Open · flozi00 opened 1 week ago

flozi00 commented 1 week ago

Motivation.

Higher throughput and memory savings are always cool 😎

I think this could be integrated very easily. What do you think about its design?

Proposed Change.

https://github.com/PygmalionAI/aphrodite-engine/commit/73177656ed75ec880a409640ef2b9a8043cf96a8
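
To make the proposal a bit more concrete, here is a rough sketch of what this could look like from the user side, assuming the quant_llm kernels from the linked commit get wired into vLLM's existing `quantization` argument. The method name `"quant_llm"` is taken from aphrodite-engine and is only an assumption; the final name would be decided in review:

```python
# Hypothetical usage sketch: assumes the quant_llm kernels from the linked
# aphrodite-engine commit are registered as a vLLM quantization method
# named "quant_llm" (name taken from aphrodite, not confirmed for vLLM).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    quantization="quant_llm",  # assumed method name, see linked commit
)

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```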

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response

flozi00 commented 5 days ago

https://github.com/vllm-project/vllm/pull/8751