huggingface / swift-chat

Mac app to demonstrate swift-transformers
Apache License 2.0

Generation speed issue #26

Open · eagle705 opened this issue 3 weeks ago

eagle705 commented 3 weeks ago

I loaded the llama2 model successfully, like in the example, but text generation is really slow.


[1] I'm not sure whether it uses MPS to accelerate generation. How can I confirm that?
[2] Is there a smaller LLM than 7B?

Here is my env

eemilk commented 3 weeks ago

There are 1B and 3B OpenELM models converted to Core ML:
https://huggingface.co/corenet-community/coreml-OpenELM-1_1B-Instruct
https://huggingface.co/corenet-community/coreml-OpenELM-3B-Instruct

You can also try upgrading to macOS 15 Sequoia; it includes a lot of performance optimizations for on-device LLMs.
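Regarding [1]: since swift-chat loads Core ML checkpoints (like the converted models above), acceleration is governed by Core ML's compute-unit selection (GPU and/or Neural Engine) rather than MPS directly. One way to check is to load the same model with different MLComputeUnits settings and compare generation time, or profile a run with Xcode's Core ML Instruments template. Below is a minimal sketch using plain Core ML APIs; the loadModel helper and the model path are illustrative and not swift-chat's actual loading code.

```swift
import CoreML

// Illustrative helper: load a compiled Core ML model (.mlmodelc) with explicit
// compute units so you can compare CPU-only vs. CPU+GPU vs. all (incl. Neural Engine).
func loadModel(at url: URL, computeUnits: MLComputeUnits) throws -> MLModel {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = computeUnits   // .cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine, .all
    return try MLModel(contentsOf: url, configuration: configuration)
}

// Example usage: run the same prompt under different compute-unit settings and
// time it. If .cpuAndGPU / .all is not noticeably faster than .cpuOnly, the
// GPU/ANE is probably not being used for the heavy layers.
let modelURL = URL(fileURLWithPath: "/path/to/Llama-2-7b-chat.mlmodelc") // assumed path
let gpuModel = try loadModel(at: modelURL, computeUnits: .cpuAndGPU)
let cpuModel = try loadModel(at: modelURL, computeUnits: .cpuOnly)
```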