Hello.

I am currently hosting my own model via TGWUI, with an OpenAI-compatible endpoint ready and working:
chameleon@komodo ~/M/chat-ui (main)> curl http://127.0.0.1:8081/v1/completions \
      -H "Content-Type: application/json" \
      -d '{
            "prompt": "This is a cake recipe:\n\n1.",
            "max_tokens": 200,
            "temperature": 1,
            "top_p": 0.9,
            "seed": 10
          }'
{"id":"conv-1712066354875182592","object":"text_completion","created":1712066354,"model":"Mixtral-8x7B-instruct-exl2_3.5bpw","choices":[{"index":0,"finish_reason":"length","text":" Preheat the oven to 160 degrees Celsius.\n2. Beat two cups of sugar with a quarter cup of melted butter and a cup of oil until creamy.\n3. Add one at a time, 4 eggs.\n4. In a bowl sift 3 cups of all-purpose flour with 1 tbsp of baking powder and 1 tsp of salt.\n5. Add 2 cups of milk, a tbsp of vanilla and the sifted dry ingredients.\n6. Beat just until smooth, fold in a cup of nuts (chopped or whole, your choice).\n7. Pour the mixture in a well-greased 9x13x2 pan.\n8. Sprinkle 2 cups of oatmeal over the top.\n9. Bake for 50-60 minutes, let cool, then serve.\n\nThe result is a cake which","logprobs":{"top_logprobs":[{}]}}],"usage":{"prompt_tokens":11,"completion_tokens":202,"total_tokens":213}}⏎
chameleon@komodo ~/M/chat-ui (main)>
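The plain completions route clearly works. As far as I can tell, Chat-UI's openai endpoint type talks to the chat completions route by default, so the equivalent check for that route (model name as my server reports it; the payload is just my best guess at what Chat-UI would send) is:

curl http://127.0.0.1:8081/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "Mixtral-8x7B-instruct-exl2_3.5bpw",
            "messages": [{"role": "user", "content": "Give me a cake recipe."}],
            "max_tokens": 200,
            "temperature": 1
          }'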
However, when I try to add the same model to HF Chat-UI with the following config:
MODELS=`[{
  "name": "mistralai/Mixtral-8x7B-Instruct-v0.1",
  "displayName": "mistralai/Mixtral-8x7B-Instruct-v0.1",
  "description": "The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.",
  "websiteUrl": "https://mistral.ai/news/mixtral-of-experts/",
  "preprompt": "",
  "chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s>{{/ifAssistant}}{{/each}}",
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "repetition_penalty": 1.2,
    "top_k": 50,
    "truncate": 3072,
    "max_new_tokens": 2048,
    "stop": [
      "</s>"
    ]
  },
  "promptExamples": [
    {
      "title": "Assist in a task",
      "prompt": "How do I make a delicious lemon cheesecake?"
    }
  ],
  "endpoints": [
    {
      "type": "openai",
      "baseURL": "http://127.0.0.1:8081/v1"
    }
  ]
}]`
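If I'm reading the Chat-UI README right, the openai endpoint type also accepts a "completion" field that picks between the chat completions route (the default, I believe) and the plain completions route. Since my backend test above used /v1/completions, a variant of the endpoints block worth trying would be:

"endpoints": [
  {
    "type": "openai",
    "baseURL": "http://127.0.0.1:8081/v1",
    "completion": "completions"
  }
]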
It doesn't work, and the logs show the following error:
Apr 02 16:50:37 komodo systemd[1]: Started chat_ui_d.service - Web UI for mixtral.
Apr 02 16:50:38 komodo npm[1050731]: > chat-ui@0.7.0 dev
Apr 02 16:50:38 komodo npm[1050731]: > vite dev --host
Apr 02 16:50:39 komodo npm[1050747]: VITE v4.5.2 ready in 1192 ms
Apr 02 16:50:39 komodo npm[1050747]: ➜ Local: http://localhost:5173/
Apr 02 16:50:39 komodo npm[1050747]: ➜ Network: http://192.168.1.69:5173/
Apr 02 16:50:39 komodo npm[1050747]: ➜ Network: http://172.20.0.1:5173/
Apr 02 16:50:40 komodo npm[1050747]: [MIGRATIONS] All migrations already applied.
Apr 02 16:50:53 komodo npm[1050747]: Error: Premature close
Apr 02 16:50:53 komodo npm[1050747]: at IncomingMessage.<anonymous> (/home/chameleon/Models/chat-ui/node_modules/node-fetch/lib/index.js:1748:18)
Apr 02 16:50:53 komodo npm[1050747]: at Object.onceWrapper (node:events:632:28)
Apr 02 16:50:53 komodo npm[1050747]: at IncomingMessage.emit (node:events:518:28)
Apr 02 16:50:53 komodo npm[1050747]: at emitCloseNT (node:internal/streams/destroy:147:10)
Apr 02 16:50:53 komodo npm[1050747]: at process.processTicksAndRejections (node:internal/process/task_queues:81:21) {
Apr 02 16:50:53 komodo npm[1050747]: code: 'ERR_STREAM_PREMATURE_CLOSE'
Apr 02 16:50:53 komodo npm[1050747]: }
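Since the failure is ERR_STREAM_PREMATURE_CLOSE, the streaming path looks like the first suspect. A hand check against the backend (again, the exact payload Chat-UI sends is my assumption) would be:

curl -N http://127.0.0.1:8081/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "Mixtral-8x7B-instruct-exl2_3.5bpw",
            "messages": [{"role": "user", "content": "Hello"}],
            "max_tokens": 50,
            "stream": true
          }'

With -N, curl disables output buffering, so each "data:" chunk should print as it arrives; if the connection drops before the final "data: [DONE]" line, the stream is being cut on the TGWUI side rather than inside Chat-UI.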
Chat-UI dies instantly; it seems like it never even makes the request, since I don't see any increase in GPU usage on my host PC.
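One way to confirm whether any request reaches the backend at all, independent of GPU usage, would be to watch the port directly (assuming tcpdump is available on the host):

# print ASCII payloads of loopback traffic to the TGWUI port
sudo tcpdump -i lo -A 'tcp port 8081'

If nothing appears there when Chat-UI throws the error, the request really is dying inside Chat-UI before it ever leaves the process.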
What am I doing wrong?