nivibilla opened this issue 1 month ago (status: Open)
Yes. The W8A8KV8/KV4 runtime is implemented in this repo. We are also working on the model converter to convert LMQuant-W8A8 checkpoints to the QServe format.
Amazing thank you! Will wait for that.
Hi @nivibilla, we have prepared the scripts for W8A8 inference. Please refer to #4. Thanks!
Does this also work for 8-bit models, or only 4-bit?