Closed EnderRobber101 closed 3 weeks ago
Have you considered using CoreML to access the Neural Engine? I thought it would boost speed. Thanks!
llama.cpp currently uses only Metal, and LLMFarm is based entirely on llama.cpp. The reasons why the Neural Engine is not used can be found here.
llama.cpp