Open JoelNiklaus opened 3 days ago
Issue encountered
Currently, inference of open models on my Mac is quite slow, since vllm does not support mps.
Solution/Feature
Llama.cpp does support mps and would significantly speed up local evaluation of open models.
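To illustrate what I have in mind, here is a rough sketch using the llama-cpp-python bindings with Metal offload. This is not an existing integration, just an assumption of how a llama.cpp-backed model could be run locally; the model path is a placeholder.

```python
# Sketch only: run a local GGUF model through llama-cpp-python, which uses
# Metal (the Apple GPU) when the library is built with Metal support.
from llama_cpp import Llama

llm = Llama(
    model_path="models/model.Q4_K_M.gguf",  # placeholder path to a local GGUF file
    n_gpu_layers=-1,                         # offload all layers to the Apple GPU
    n_ctx=4096,
)

out = llm("Question: What is the capital of France?\nAnswer:", max_tokens=16)
print(out["choices"][0]["text"])
```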
Possible alternatives
Allowing the mps device to be used with the other model-loading backends would also work.
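For the alternative, a minimal sketch of what I mean by running a model directly on the mps device with plain transformers (the model name and prompt are only placeholders):

```python
# Sketch only: load a causal LM with transformers and place it on the
# PyTorch "mps" device when it is available, falling back to CPU otherwise.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "mps" if torch.backends.mps.is_available() else "cpu"

name = "gpt2"  # placeholder; any causal LM from the Hub works the same way
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).to(device)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```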
Hi! Feel free to open a PR for this if you need it fast, as our roadmap for EOY is full :)
Sounds good. I might do it at some point; for now it is not a priority for me.
would be an awesome feature IMO! cc @gary149