huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0

VLLM Supported? #220

Open RanchiZhao opened 1 week ago

RanchiZhao commented 1 week ago

I wonder, is a quanto-quantized model usable with vLLM?

dacorvo commented 4 days ago

quanto models are pytorch models, so I don't see a reason why they would not be compatible with any other tool that runs pytorch models. However, since vLLM modifies the graph of transformers models, the model would need to be quantized inside vLLM, not inside transformers.
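For context, here is a minimal sketch of the standard transformers-side quanto workflow that the comment refers to (the model name and dtype choices are illustrative; per the comment above, this quantization step would not carry over to vLLM, which rebuilds the model graph itself):

```python
# Minimal sketch: quantizing a transformers model with optimum-quanto.
# The checkpoint and dtype below are illustrative choices, not requirements.
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint8

# Load a plain transformers model.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Replace eligible modules with quantized equivalents (weights-only here).
quantize(model, weights=qint8)

# Freeze to materialize the quantized weights in place of the float ones.
freeze(model)

# The result is still a regular pytorch model, which is the point made
# above: quanto does not change the model's interface, but vLLM would
# still need to apply the quantization on its own rewritten graph.
```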