mit-han-lab/TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License · 647 stars · 62 forks
Issue #14: Accelerate inference on Intel and M1 with int8 activation quantization
Closed by meenchen 1 year ago