microsoft / T-MAC

Low-bit LLM inference on CPU with lookup table
MIT License

how to add new models and their kernels? #21

Closed · jason-zou closed this 1 week ago

kaleid-liner commented 3 weeks ago

If you are using a model in GPTQ format, you can specify `-m gptq-auto` to automatically detect the kernel shapes and other quantization configurations. Check the usage section for more details.
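As a sketch of how this might look on the command line (the `tools/run_pipeline.py` entry point and the `-o`/`-m` flags are assumptions based on T-MAC's usage docs; the checkpoint path is a placeholder):

```sh
# Hypothetical invocation: point the pipeline at a local GPTQ checkpoint
# directory and let `-m gptq-auto` infer the kernel shapes and quantization
# configuration instead of naming a preset model type.
python tools/run_pipeline.py \
    -o /path/to/your-gptq-model \
    -m gptq-auto
```

The key point from the comment above is that with `gptq-auto` you do not need to register a new model type by hand; detection is driven by the quantization metadata stored in the GPTQ checkpoint.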