huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0

.env.local config for llama-2-7b.Q4_K_S.gguf with llama.cpp server #747

Open · smamindl opened this issue 8 months ago

smamindl commented 8 months ago

I am using the following .env.local with llama-2-7b.Q4_K_S.gguf and the Llama 2 chat prompt template:

```env
MODELS=`[
  {
      "name": "llama-2-7b.Q4_K_S.gguf",
      "chatPromptTemplate": "<s>[INST] <<SYS>>\n{{preprompt}}\n<</SYS>>\n\n{{#each messages}}{{#ifUser}}{{content}} [/INST] {{/ifUser}}{{#ifAssistant}}{{content}} </s><s>[INST] {{/ifAssistant}}{{/each}}",
      "parameters": {
        "temperature": 0.1,
        "top_p": 0.95,
        "repetition_penalty": 1.2,
        "top_k": 50,
        "truncate": 1000,
        "max_new_tokens": 2048,
        "stop": ["</s>"]
      },
      "endpoints": [
        {
          "url": "http://127.0.0.1:8080",
          "type": "llamacpp"
        }
      ]
  }
]`
```
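
For reference, this template should expand to the standard Llama 2 chat format. A sketch of the rendered output, assuming a preprompt of "You are a helpful assistant." and two made-up turns (exact whitespace depends on how the template engine expands the blocks):

```
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Hello! [/INST] Hi! How can I help? </s><s>[INST] What is llama.cpp? [/INST]
```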

I am trying to get this to work with chat-ui, but it doesn't: the UI freezes. However, the server is receiving the request from the client.

nsarrazin commented 8 months ago

Quick question: how did you start your llama.cpp server? Did you specify -np 3 in the parameters?
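
For context, a typical launch of the llama.cpp example server looks roughly like this. This is a sketch, not taken from the thread: the binary location depends on how you built llama.cpp (e.g. ./server vs. build/bin/server), and the model path, -c, and --port values are assumptions matched to the config above:

```sh
# Serve the GGUF model over HTTP on the host/port chat-ui points at.
# -np 3 allocates 3 parallel slots; chat-ui can issue more than one
# request at a time (e.g. for generation plus conversation summaries),
# so with too few slots requests may queue and the UI can appear frozen.
./server -m llama-2-7b.Q4_K_S.gguf -c 4096 -np 3 --host 127.0.0.1 --port 8080
```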

smamindl commented 8 months ago

> Quick question: how did you start your llama.cpp server? Did you specify -np 3 in the parameters?

@nsarrazin yes, I have specified -np 2.

MDCurrent commented 7 months ago

This is likely resolved by my PR: https://github.com/huggingface/chat-ui/pull/867. Check out my branch and see if it helps :heart: