guinmoon / LLMFarm

Run llama and other large language models offline on iOS and macOS using the GGML library.
https://llmfarm.site
MIT License

stablelm-zephyr-3b-GGUF model fails to load #25

Closed by gblazex 6 months ago

gblazex commented 6 months ago

https://huggingface.co/TheBloke/stablelm-zephyr-3b-GGUF

It fails on both iPhone and Mac, so it must be a problem with the program.

There is no descriptive error message, just "model failed to load".

Mistral 7b works fine.

guinmoon commented 6 months ago

Hi. I downloaded the Q4_K_M version and I have everything working. Could you please send me a screenshot of the chat settings?

gblazex commented 6 months ago

Just the default llama settings.

[Screenshot 2023-12-16 at 12 01 48] [Screenshot 2023-12-16 at 12 02 19]

gblazex commented 6 months ago

Is there any logging I can send you?

guinmoon commented 6 months ago

Are you using application version 0.8.1?

gblazex commented 6 months ago

That was a great lead!

Okay, I got it to work with 0.8.1 (I had 0.8.1 on my phone from TestFlight, but the Mac version was from the App Store).

The question now is how to set up the prompt format. E.g. for this model, the model card gives:

<|user|>
List 3 synonyms for the word "tiny"<|endoftext|>
<|assistant|>
1. Dwarf
2. Little
3. Petite<|endoftext|>

https://huggingface.co/TheBloke/stablelm-zephyr-3b-GGUF#usage

The chat settings have Prompt, Reverse prompt, Special, BOS, and EOS options.

How should these be set up for this format?

gblazex commented 6 months ago

This is what I used, but I'm not sure whether or how your app would add <|endoftext|>:

<|user|>
{{prompt}}<|endoftext|>
<|assistant|>

There is an EOS (end-of-sequence?) switch, but it doesn't let me specify what that token looks like.
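
For illustration, here is a minimal Python sketch of how a template like the one above could be filled in. It assumes the app simply substitutes the user's message for the {{prompt}} placeholder; `TEMPLATE` and `render_prompt` are hypothetical names, not part of LLMFarm.

```python
# Hypothetical helper, not LLMFarm code: it only illustrates placeholder substitution.
# The template text follows the format shown on the model card.
TEMPLATE = "<|user|>\n{{prompt}}<|endoftext|>\n<|assistant|>\n"


def render_prompt(template: str, user_text: str) -> str:
    """Replace the {{prompt}} placeholder with the user's message."""
    return template.replace("{{prompt}}", user_text)


rendered = render_prompt(TEMPLATE, 'List 3 synonyms for the word "tiny"')
print(rendered)
```

With this approach the <|endoftext|> after the user message comes from the template text itself, so it would not depend on the EOS switch.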

guinmoon commented 6 months ago

<|user|> {{prompt}}<|endoftext|> <|assistant|>

Try using this template with:

BOS - false
EOS - false
Special - true
reverse - <|endoftext|>

I'll post a wiki about the options later.
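
Until the wiki is available, the role of the reverse prompt can be sketched roughly as follows. This assumes it behaves as a stop string, as the reverse prompt does in llama.cpp's interactive mode; `cut_at_reverse_prompt` is a hypothetical name and the sample output is made up for illustration, not a log from the app.

```python
# Hypothetical sketch, not LLMFarm code: treat the reverse prompt as a stop string.
def cut_at_reverse_prompt(generated: str, reverse_prompt: str) -> str:
    """Return the generated text up to the first occurrence of the reverse prompt."""
    index = generated.find(reverse_prompt)
    # If the reverse prompt never appears, keep the full text.
    return generated if index == -1 else generated[:index]


# Illustrative raw model output that runs past the end of the assistant turn.
raw = "1. Dwarf\n2. Little\n3. Petite<|endoftext|>\n<|user|>"
reply = cut_at_reverse_prompt(raw, "<|endoftext|>")
print(reply)
```

Under that assumption, setting reverse to <|endoftext|> ends the assistant's turn at the model's own end marker, which would explain why the EOS switch can stay off.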