Open · RanchiZhao opened 1 week ago
quanto models are PyTorch models, so I don't see a reason why they would not be compatible with any other tool running PyTorch models. However, since vLLM modifies the graph of transformers models, the model should be quantized inside vLLM, not inside transformers.
I wonder, is a quanto-quantized model usable with vLLM?
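
For context, this is roughly what "quantized inside transformers" means here: a minimal sketch using the optimum-quanto API (the model name and the qint8 weight type are illustrative, not taken from this thread):

```python
# Minimal sketch: quantizing a transformers model with optimum-quanto.
# The model id and qint8 choice are illustrative assumptions.
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint8

model_id = "facebook/opt-125m"  # hypothetical example model
model = AutoModelForCausalLM.from_pretrained(model_id)

# Replace eligible linear layers with quanto's quantized equivalents.
quantize(model, weights=qint8)
# Materialize the quantized weights so the model can be saved or served.
freeze(model)

# The result is still a regular PyTorch nn.Module, but its graph now contains
# quanto modules, which vLLM's own model definitions do not expect.
```

The sketch is only meant to show why the answer above distinguishes the two paths: the quantization happens on the transformers-side module graph, whereas vLLM rebuilds the model with its own graph, so it would need to apply the quantization itself.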