Open CharlesRiggins opened 2 months ago
@mgoin @horheynm - could you provide an example of this?
Hi @CharlesRiggins
Thank you for using llm-compressor. We are working on this feature in the current sprint, so you will be able to do this very shortly. Please give us a couple of days and we will get back to you with an example script to try out!
@horheynm do we have an update on the status of this?
I want to quantize the KV cache to FP8 E4M3 on top of GPTQ. Is it possible to do it with llm-compressor?