mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License

OpenCL support #123

Open leviathanch opened 12 months ago

leviathanch commented 12 months ago

Hi, the inference engine seems to be implemented specifically for CUDA, which isn't very helpful on my laptop, which has a Radeon GPU that I want to use through its OpenCL Mesa interface.
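For context on what a port would involve: the repo's fast kernels (e.g. W4A16 dequantization and GEMM) are written in CUDA, so OpenCL support would mean rewriting them as OpenCL kernels plus host-side plumbing. Below is a minimal sketch of what a dequantization kernel could look like via pyopencl. The packing layout (8 nibbles per 32-bit word, group-wise float scales and zero-points) is an assumption for illustration only and is not the repo's actual AWQ kernel format.

```python
# Hypothetical sketch of 4-bit weight dequantization on OpenCL via pyopencl.
# The packing scheme below is an illustrative assumption, not llm-awq's format.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void dequant_w4(__global const uint  *qweight,  // packed 4-bit weights
                         __global const float *scales,   // one scale per group
                         __global const float *zeros,    // one zero-point per group
                         __global float *out,
                         const uint group_size)
{
    uint i = get_global_id(0);               // index of the unpacked weight
    uint word = qweight[i / 8];               // 8 nibbles per 32-bit word
    uint q = (word >> ((i % 8) * 4)) & 0xFu;  // extract the 4-bit value
    uint g = i / group_size;                  // group-wise quantization params
    out[i] = ((float)q - zeros[g]) * scales[g];
}
"""

def dequantize_w4(qweight, scales, zeros, group_size, n):
    """Unpack n 4-bit weights to float32 on any OpenCL device (e.g. Mesa/rusticl)."""
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, KERNEL_SRC).build()
    mf = cl.mem_flags
    qw_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=qweight)
    sc_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=scales)
    zp_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=zeros)
    out = np.empty(n, dtype=np.float32)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)
    prg.dequant_w4(queue, (n,), None, qw_buf, sc_buf, zp_buf, out_buf,
                   np.uint32(group_size))
    cl.enqueue_copy(queue, out, out_buf)
    return out
```

The harder part in practice is not the dequantization itself but reproducing the fused dequant-plus-GEMM kernels with competitive performance, since the CUDA versions rely on warp-level primitives that have no direct OpenCL equivalent.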