smpanaro / coreml-llm-cli

CLI to demonstrate running a large language model (LLM) on Apple Neural Engine.
68 stars · 4 forks

macOS 15 beta 3 performance #1

Closed by amostamu 4 months ago

amostamu commented 4 months ago

The M3 Pro gains a big performance boost on macOS 15. [Screenshots: 2024-07-11 at 15:45]

amostamu commented 4 months ago

ANE power draw: 7.5 W

smpanaro commented 4 months ago

@amostamu Wow! Thanks for sharing. That looks like it is on the main branch, correct?

Try switching to the sequoia branch and running this:

swift run -c release LLMCLI --repo-id smpanaro/Llama-2-7b-coreml --repo-directory sequoia --max-new-tokens 80

Hopefully it’s even faster. 🤞

amostamu commented 4 months ago

Yes, it is even faster on the sequoia branch. This is the speed of the second run:

Compile + Load: 0.51 sec
Prompt: 726.74 sec, 704.52 token/sec
Generate: 85.41 +/- 2.67 ms/token, 11.72 +/- 0.29 token/sec
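As a sanity check, the generate figures above are internally consistent: a per-token latency of 85.41 ms inverts to roughly 11.7 tokens/sec, matching the reported rate. A minimal sketch of that conversion (the helper name is illustrative, not part of the CLI):

```python
def tokens_per_sec(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens/sec."""
    return 1000.0 / ms_per_token

# 85.41 ms/token from the benchmark output above
print(round(tokens_per_sec(85.41), 2))
```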

smpanaro commented 4 months ago

Cool, thanks for trying. I expected/hoped it would be a bit higher but I'll take it.

smpanaro commented 4 months ago

Added this to the README. Thanks again for the info!