Open AndreaChiChengdu opened 2 months ago
Both the inference kernel and the convert script already support mixed-precision quantization by detecting the bit width of each layer. However, I'm not aware of any tool that generates a mixed-precision GPTQ model. If such a model exists, T-MAC can support it.
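For reference, here is a minimal sketch of how per-layer bit widths can be inferred during conversion, assuming the common AutoGPTQ packing convention (int32-packed `qweight`, per-layer `g_idx`). The helper name `detect_layer_bits` is illustrative, not T-MAC's actual convert-script API:

```python
import torch

def detect_layer_bits(state_dict):
    """Infer per-layer bit width from packed GPTQ tensors (AutoGPTQ layout)."""
    bits = {}
    for name, qweight in state_dict.items():
        if not name.endswith(".qweight"):
            continue
        layer = name[: -len(".qweight")]
        # AutoGPTQ packs qweight as int32 with shape
        # (in_features // 32 * bits, out_features), and g_idx has
        # shape (in_features,), so bits falls out of the ratio.
        in_features = state_dict[layer + ".g_idx"].shape[0]
        bits[layer] = qweight.shape[0] * 32 // in_features
    return bits

# Usage: each layer can then be routed to a kernel tuned for its bit width.
# state_dict = torch.load("gptq-model.bin", map_location="cpu")
# print(detect_layer_bits(state_dict))  # e.g. {"model.layers.0.self_attn.q_proj": 2, ...}
```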
I have seen that T-MAC tunes kernels per shape, bit width, and so on, but a compiled llama.cpp kernel only supports one bit width and one network. How can a mixed network be supported? (For example, a tuned and compiled llama.cpp can run a 2-bit BitNet, but running other bit widths or networks, such as a 4-bit BitNet or a 2-bit Llama-2, produces an error.)
For example, a model whose weights mix the I2, I3, and I4 quantization types. I checked the documentation and scripts, and it seems this is not supported yet? Thanks!
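To illustrate the limitation I'm hitting, here is a rough sketch of how I understand per-configuration kernel dispatch, assuming a hypothetical lookup table keyed on (bits, shape). All names here are illustrative and not T-MAC's actual dispatch mechanism:

```python
# Kernels are tuned and compiled per (bits, out_features, in_features),
# so a binary tuned for 2-bit BitNet shapes has no entry for a 4-bit
# model or for a different architecture's shapes.
COMPILED_KERNELS = {
    (2, 2048, 2048): "qgemm_lut_2bit_2048x2048",  # hypothetical symbol names
    (2, 5504, 2048): "qgemm_lut_2bit_5504x2048",
}

def lookup_kernel(bits: int, m: int, k: int) -> str:
    key = (bits, m, k)
    if key not in COMPILED_KERNELS:
        # This mirrors the error described above: a 4-bit BitNet or a
        # 2-bit Llama-2 finds no kernel in a binary tuned for 2-bit BitNet.
        raise KeyError(
            f"no kernel compiled for bits={bits}, shape=({m}, {k}); "
            "re-run tuning/compilation for this configuration"
        )
    return COMPILED_KERNELS[key]
```

If this understanding is right, a mixed-precision model would need the tuning step to emit kernels for every (bits, shape) pair that appears in the network.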