intel / auto-round

Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
https://arxiv.org/abs/2309.05516
Apache License 2.0

Request for Apple Metal Device Support #200

Closed: PabloButron closed this issue 5 days ago

PabloButron commented 1 month ago

Hi AutoRound Team,

Firstly, thank you for your fantastic work on AutoRound. It has been incredibly useful for model quantization.

I am reaching out to inquire about the possibility of adding support for Apple Metal devices. Given the increasing use of Apple Silicon in both personal and professional contexts, having native support for Metal would be highly beneficial for a broader range of users. This would allow those using macOS and iOS devices to take full advantage of AutoRound's capabilities without needing additional hardware.

Specifics:

Use case: Quantizing large language models and other AI models with AutoRound on Apple devices with M1, M2, or later chips.

Potential benefits:

- Enhanced performance on macOS and iOS platforms.
- Wider adoption of AutoRound in the Apple developer community.
- Increased flexibility for developers who use Apple hardware for their machine learning tasks.

Current limitations: Lack of Metal support restricts AutoRound primarily to Nvidia GPUs and CUDA environments, as noted in issues such as #161 and the Acceleration Documentation.

Questions:

- Are there any plans to support Apple Metal in the near future?
- If not, what would be the best way to contribute towards adding this support?

Thank you for considering this request. Looking forward to your response.

Best regards, Pablo

wenhuach21 commented 1 month ago

Hi @PabloButron,

Thank you for your interest in AutoRound. Currently, we have no plans to support Apple devices, as we are a small team and lack the necessary hardware. However, we are open to collaboration and would welcome a pull request from you if you are interested in contributing.

To support this, we could approach it in two steps:

1. Deploy a Quantized Model on Apple Devices: We would need to export or repack the quantized model into a format Apple devices can run. This should be feasible, since the Apple stack already supports quantized models, as detailed in the Core ML Tools documentation (see the sketch below).
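To make the target concrete, here is a minimal sketch using Core ML Tools' built-in data-free weight quantization on an already converted model. Note this is Core ML Tools' own quantization path, shown only to illustrate that the Apple stack can consume quantized weights; it is not AutoRound's export or repacking logic, and the model path is hypothetical.

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

# Load an already-converted Core ML model (path is hypothetical).
mlmodel = ct.models.MLModel("llm.mlpackage")

# Configure symmetric 8-bit linear weight quantization for all supported ops.
op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric")
config = cto.OptimizationConfig(global_config=op_config)

# Quantize the weights data-free and save the repacked model.
quantized = cto.linear_quantize_weights(mlmodel, config=config)
quantized.save("llm_int8.mlpackage")
```

An AutoRound integration would instead repack AutoRound's tuned low-bit weights into this format rather than re-quantizing from scratch, but the deployment container would be the same.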

2. Run the Quantization Process on Apple Devices: Since we are not familiar with the Apple ecosystem, this may be challenging for us, though perhaps not for experts like you. Separately, we are exploring fast configurations that reduce quantization time and resource usage on client devices, which could make this step more practical (a rough sketch of the device plumbing follows).
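On the PyTorch side, the basic prerequisite for this step is routing tensors to Metal via the MPS backend. A minimal sketch of what that device selection might look like, assuming AutoRound's tuning loop runs on standard PyTorch ops (its actual device handling may differ):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, fall back to Apple's Metal (MPS) backend, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # MPS is PyTorch's Metal Performance Shaders backend on Apple Silicon.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
layer = torch.nn.Linear(4096, 4096).to(device)  # toy stand-in for an LLM layer
x = torch.randn(1, 4096, device=device)
y = layer(x)  # executes on Metal when MPS is available
```

The harder part in practice is that any custom CUDA kernels in the quantization or packing path would need MPS-compatible fallbacks, which is where a contributor familiar with the Apple ecosystem could help most.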

We appreciate your interest and look forward to the possibility of working together.

Best regards, Wenhua

wenhuach21 commented 5 days ago

Please feel free to reopen this issue if you need further discussion.