mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License
624 stars · 58 forks

Python extension for Metal kernels #77

Open casper-hansen opened 7 months ago

casper-hansen commented 7 months ago

Dear authors of TinyChatEngine, I was looking at your code here and would love to discuss how we can hook the Metal and AVX kernels into Python so that they can be reused in other frameworks like AutoAWQ. I am most interested in the W4A16 kernels, since that is what AutoAWQ focuses on for now.
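For context, here is a minimal pure-Python sketch of what a W4A16 kernel computes: weights stored as packed 4-bit integers, dequantized with a scale and zero point, then multiplied against fp16 activations. All function names and the nibble packing layout below are illustrative assumptions, not TinyChatEngine's or AutoAWQ's actual format; a native Metal/AVX extension would accelerate exactly this computation.

```python
# Reference (not optimized) model of a W4A16 matvec. Packing layout
# (low nibble first) and single quantization group are assumptions.

def pack_int4(values):
    """Pack pairs of 4-bit unsigned ints (0..15) into bytes, low nibble first."""
    assert len(values) % 2 == 0
    return bytes((values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
                 for i in range(0, len(values), 2))

def unpack_int4(packed, n):
    """Inverse of pack_int4: recover the first n 4-bit values."""
    out = []
    for b in packed:
        out.append(b & 0xF)
        out.append(b >> 4)
    return out[:n]

def dequantize(packed, scale, zero, n):
    """w = (q - zero) * scale; one scale/zero group for simplicity."""
    return [(q - zero) * scale for q in unpack_int4(packed, n)]

def w4a16_matvec(packed_rows, scale, zero, x):
    """y = W @ x, with each row of W stored as packed int4."""
    n = len(x)
    return [sum(w * a for w, a in zip(dequantize(row, scale, zero, n), x))
            for row in packed_rows]
```

For example, packing the row `[1, 2, 3, 4]` with `scale=0.5` and `zero=0` yields the dequantized weights `[0.5, 1.0, 1.5, 2.0]`, so a matvec against an all-ones activation vector returns `[5.0]`.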

What do you think the next steps are to create a Python extension, and would you be willing to give me some pointers on how to do it in AutoAWQ? https://github.com/casper-hansen/AutoAWQ