Support for Phi-3 models

guinmoon / LLMFarm

Run Llama and other large language models offline on iOS and macOS using the GGML library.

https://llmfarm.site

MIT License

1.05k stars, 62 forks

Support for Phi-3 models #58

Closed. retteghy closed this issue 1 month ago.

retteghy commented 2 months ago

See Hugging Face for the models.

guinmoon commented 2 months ago

Hi. It works normally with this template:


<|user|>
{{prompt}}<|end|>
<|assistant|>

And with the BOS option enabled.
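For anyone who wants to check what the model actually receives, here is a minimal Python sketch of how a template like the one above could be filled in. The function name `render_prompt` and the trailing newline after `<|assistant|>` are assumptions for illustration, not LLMFarm's actual code. The BOS token is not part of the text; it is left to the app's BOS option.

```python
# Template from the comment above. The trailing newline after <|assistant|>
# is an assumption; the thread does not say whether one is required.
PHI3_TEMPLATE = "<|user|>\n{{prompt}}<|end|>\n<|assistant|>\n"


def render_prompt(user_text: str, template: str = PHI3_TEMPLATE) -> str:
    """Substitute the user's message for the {{prompt}} placeholder.

    Hypothetical helper: it only illustrates the text substitution and
    does not add a BOS token, which the app's BOS option handles.
    """
    return template.replace("{{prompt}}", user_text)


if __name__ == "__main__":
    # repr() makes the newlines in the rendered prompt visible.
    print(repr(render_prompt("What is the capital of France?")))
```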

paulilioaica commented 2 months ago

Hi. How can I make it generate until EOS? If I select the option, the app crashes.

retteghy commented 2 months ago

> Hi. work normal with this template
> <|user|>
> {{prompt}}<|end|>
> <|assistant|>
> And BOS option enabled.

BOS is enabled and I have set that prompt, but I am getting an error as the reply to every message: Load Model Error: [Error] modelLoad Error Load Model Error: [Done]

jekriske-lilly commented 2 months ago

@guinmoon, when you say "works normal", are you referring to the development version or the version in the App Store?

The stable version from the App Store isn't honoring the end token, and the app crashes if you try enabling EOS.

guinmoon commented 2 months ago

The development version.

Cimplex commented 2 months ago

> > Hi. work normal with this template
> > <|user|>
> > {{prompt}}<|end|>
> > <|assistant|>
> > And BOS option enabled.
>
> BOS is enabled, I have set that prompt, but I am getting an error as reply for every message: Load Model Error: [Error] modelLoad Error Load Model Error: [Done]

In the TestFlight version I’m using ‘Phi-3-mini-4k-instruct-q4.gguf’.

When setting up, I used the “Phi 2” settings template and then entered the recommended prompt. On my iPhone 14 Pro I’m getting around 2-5 tokens per second.

Sometimes the <|end|> tag isn’t handled correctly: the model skips over it and starts a new answer.
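If you are post-processing the generated text yourself, one possible workaround is to cut the output at the first end marker. The Python sketch below assumes the marker shows up as literal text in the output; `truncate_at_stop` is a hypothetical helper, not something LLMFarm or llama.cpp provides, and it does not fix the underlying token handling.

```python
def truncate_at_stop(text: str, stop_markers=("<|end|>",)) -> str:
    """Return the text up to (not including) the earliest stop marker.

    Hypothetical post-processing helper. If no marker is present,
    the text is returned unchanged.
    """
    cut = len(text)
    for marker in stop_markers:
        index = text.find(marker)
        if index != -1:
            # Keep the earliest position among all markers found.
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    raw = "Paris.<|end|>\n<|user|>\nAnd Germany?"
    print(truncate_at_stop(raw))
```

This only trims text after the fact; the model may still spend time generating the extra answer unless generation itself stops at the end token.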

savkinavmono commented 2 months ago

Make sure Metal=on, BOS=on, EOS=off. Also try setting contextsize=1024. I got 8-9 tokens/sec.

Officially, Phi-3 is only supported starting with llama.cpp release b2717. The latest LLMFarm commit uses b2692. The TestFlight version uses b2135, which officially supports only Phi-2.
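The version comparison above can be sketched in a few lines of Python. This assumes build tags always have the form `b` followed by a number, as in the three tags mentioned here; `build_number` and `supports_phi3` are hypothetical names used only for illustration.

```python
import re

# Minimum llama.cpp build with official Phi-3 support, per the comment above.
PHI3_MIN_BUILD = "b2717"


def build_number(tag: str) -> int:
    """Parse a build tag such as 'b2717' into the integer 2717."""
    match = re.fullmatch(r"b(\d+)", tag)
    if match is None:
        raise ValueError(f"unexpected build tag format: {tag!r}")
    return int(match.group(1))


def supports_phi3(tag: str) -> bool:
    """True if the given build is at or after the Phi-3 minimum build."""
    return build_number(tag) >= build_number(PHI3_MIN_BUILD)


if __name__ == "__main__":
    for label, tag in [("latest commit", "b2692"), ("TestFlight", "b2135")]:
        print(label, tag, supports_phi3(tag))
```

By this check, neither the latest commit (b2692) nor the TestFlight build (b2135) reaches the official Phi-3 minimum.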