mit-han-lab / qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Apache License 2.0

Is 8bit supported? #2

Open nivibilla opened 1 month ago

nivibilla commented 1 month ago

Does this also work for 8-bit models, or only 4-bit ones?

kentang-mit commented 1 month ago

Yes. The W8A8KV8/KV4 runtime is implemented in this repo. We are also working on a model converter that converts LMQuant-W8A8 checkpoints to the QServe format.

nivibilla commented 1 month ago

Amazing, thank you! I'll wait for that.

ys-2020 commented 1 month ago

Hi @nivibilla, we have prepared the scripts for W8A8 inference. Please refer to #4. Thanks!