microsoft / T-MAC

Low-bit LLM inference on CPU with lookup table
MIT License
588 stars 44 forks

Merge latest llama.cpp with OpenMP for better multi-threading performance and more models such as qwen2. #54

Closed by kaleid-liner 1 month ago