microsoft / T-MAC

Low-bit LLM inference on CPU with lookup table

MIT License

588 stars, 44 forks

Merge latest llama.cpp with OpenMP for better multi-threading performance and more models such as qwen2. #54

Closed kaleid-liner closed 1 month ago