huggingface / swift-chat

Mac app to demonstrate swift-transformers
Apache License 2.0

Generation speed issue #26

Open · eagle705 opened this issue 3 weeks ago

eagle705 commented 3 weeks ago

I loaded the llama2 model successfully, like in the example, but text generation is really slow.


[1] I'm not sure whether it uses MPS to accelerate generation. How can I confirm that?
[2] Is there a smaller LLM than 7B?

Here is my env

eemilk commented 3 weeks ago

There are 1B and 3B OpenELM models converted to Core ML:
https://huggingface.co/corenet-community/coreml-OpenELM-1_1B-Instruct
https://huggingface.co/corenet-community/coreml-OpenELM-3B-Instruct

You can also try upgrading to macOS 15 Sequoia; it includes a lot of performance optimizations for on-device LLMs.
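Regarding [1]: since swift-chat loads Core ML checkpoints (like the converted models above), acceleration is governed by Core ML's compute-unit selection (GPU and/or Neural Engine) rather than MPS directly. One way to check is to load the same model with different MLComputeUnits settings and compare generation time, or profile a run with Xcode's Core ML Instruments template. Below is a minimal sketch using plain Core ML APIs; the loadModel helper and the model path are illustrative and not swift-chat's actual loading code.

```swift
import CoreML

// Illustrative helper: load a compiled Core ML model (.mlmodelc) with explicit
// compute units so you can compare CPU-only vs. CPU+GPU vs. all (incl. Neural Engine).
func loadModel(at url: URL, computeUnits: MLComputeUnits) throws -> MLModel {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = computeUnits   // .cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine, .all
    return try MLModel(contentsOf: url, configuration: configuration)
}

// Example usage: run the same prompt under different compute-unit settings and
// time it. If .cpuAndGPU / .all is not noticeably faster than .cpuOnly, the
// GPU/ANE is probably not being used for the heavy layers.
let modelURL = URL(fileURLWithPath: "/path/to/Llama-2-7b-chat.mlmodelc") // assumed path
let gpuModel = try loadModel(at: modelURL, computeUnits: .cpuAndGPU)
let cpuModel = try loadModel(at: modelURL, computeUnits: .cpuOnly)
```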