huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0

ERR_STREAM_PREMATURE_CLOSE when using locally hosted model via Text Generation WebUI API #971

Closed: chameleon-lizard closed this issue 7 months ago

chameleon-lizard commented 7 months ago

Hello.

I am currently hosting my own model via Text Generation WebUI (TGWUI), with an OpenAI-compatible endpoint ready and working:

chameleon@komodo ~/M/chat-ui (main)> curl http://127.0.0.1:8081/v1/completions \  
                                           -H "Content-Type: application/json" \
                                           -d '{
                                         "prompt": "This is a cake recipe:\n\n1.",
                                         "max_tokens": 200,
                                         "temperature": 1,
                                         "top_p": 0.9,
                                         "seed": 10
                                       }'
{"id":"conv-1712066354875182592","object":"text_completion","created":1712066354,"model":"Mixtral-8x7B-instruct-exl2_3.5bpw","choices":[{"index":0,"finish_reason":"length","text":" Preheat the oven to 160 degrees Celsius.\n2. Beat two cups of sugar with a quarter cup of melted butter and a cup of oil until creamy.\n3. Add one at a time, 4 eggs.\n4. In a bowl sift 3 cups of all-purpose flour with 1 tbsp of baking powder and 1 tsp of salt.\n5. Add 2 cups of milk, a tbsp of vanilla and the sifted dry ingredients.\n6. Beat just until smooth, fold in a cup of nuts (chopped or whole, your choice).\n7. Pour the mixture in a well-greased 9x13x2 pan.\n8. Sprinkle 2 cups of oatmeal over the top.\n9. Bake for 50-60 minutes, let cool, then serve.\n\nThe result is a cake which","logprobs":{"top_logprobs":[{}]}}],"usage":{"prompt_tokens":11,"completion_tokens":202,"total_tokens":213}}
chameleon@komodo ~/M/chat-ui (main)>

However, when I try to add the same model to HF Chat-UI with the following config:

MODELS=`[{
        "name": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "displayName": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "description": "The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.",
        "websiteUrl": "https://mistral.ai/news/mixtral-of-experts/",
        "preprompt": "",
        "chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s>{{/ifAssistant}}{{/each}}",
        "parameters": {
            "temperature": 0.7,
            "top_p": 0.95,
            "repetition_penalty": 1.2,
            "top_k": 50,
            "truncate": 3072,
            "max_new_tokens": 2048,
            "stop": [
                "</s>"
            ]
        },
        "promptExamples": [
            {
                "title": "Assist in a task",
                "prompt": "How do I make a delicious lemon cheesecake?"
            }
        ],
        "endpoints": [
            {
                "type": "openai",
                "baseURL": "http://127.0.0.1:8081/v1"
            }
        ]
    }
]`
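Since chat-ui reads the `MODELS` value from `.env.local` between backticks and parses it, a malformed value (stray trailing comma, unbalanced brace) is a common source of startup failures. A quick stdlib sanity check, shown here with a reduced version of the config above (`parameters`, `chatPromptTemplate`, etc. omitted for brevity; passing strict `json.loads` is a conservative check, not a guarantee that chat-ui accepts it):

```python
import json

# Reduced version of the MODELS config from this issue; only the fields
# needed to locate the endpoint are kept.
models_value = """[{
    "name": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "endpoints": [
        {
            "type": "openai",
            "baseURL": "http://127.0.0.1:8081/v1"
        }
    ]
}]"""

models = json.loads(models_value)  # raises ValueError if the JSON is malformed
endpoint = models[0]["endpoints"][0]
assert endpoint["type"] == "openai"
print(endpoint["baseURL"])  # http://127.0.0.1:8081/v1
```

In this case the config parses cleanly, which points the investigation at the endpoint itself rather than the `.env.local` syntax.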

It doesn't work and outputs the following error in the logs:

Apr 02 16:50:37 komodo systemd[1]: Started chat_ui_d.service - Web UI for mixtral.
Apr 02 16:50:38 komodo npm[1050731]: > chat-ui@0.7.0 dev
Apr 02 16:50:38 komodo npm[1050731]: > vite dev --host
Apr 02 16:50:39 komodo npm[1050747]:   VITE v4.5.2  ready in 1192 ms
Apr 02 16:50:39 komodo npm[1050747]:   ➜  Local:   http://localhost:5173/
Apr 02 16:50:39 komodo npm[1050747]:   ➜  Network: http://192.168.1.69:5173/
Apr 02 16:50:39 komodo npm[1050747]:   ➜  Network: http://172.20.0.1:5173/
Apr 02 16:50:40 komodo npm[1050747]: [MIGRATIONS] All migrations already applied.
Apr 02 16:50:53 komodo npm[1050747]: Error: Premature close
Apr 02 16:50:53 komodo npm[1050747]:     at IncomingMessage.<anonymous> (/home/chameleon/Models/chat-ui/node_modules/node-fetch/lib/index.js:1748:18)
Apr 02 16:50:53 komodo npm[1050747]:     at Object.onceWrapper (node:events:632:28)
Apr 02 16:50:53 komodo npm[1050747]:     at IncomingMessage.emit (node:events:518:28)
Apr 02 16:50:53 komodo npm[1050747]:     at emitCloseNT (node:internal/streams/destroy:147:10)
Apr 02 16:50:53 komodo npm[1050747]:     at process.processTicksAndRejections (node:internal/process/task_queues:81:21) {
Apr 02 16:50:53 komodo npm[1050747]:   code: 'ERR_STREAM_PREMATURE_CLOSE'
Apr 02 16:50:53 komodo npm[1050747]: }

It dies instantly; it seems like it doesn't even make the request, since I don't see any increase in GPU usage on the host PC.
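For context on the error itself: chat-ui's `openai` endpoint uses the streaming chat-completions API, where the server sends Server-Sent Events and terminates a well-formed stream with a `data: [DONE]` sentinel. If the connection drops before that sentinel arrives (for example because the backend rejects or doesn't support the streaming request), the reader sees the stream end early, which node-fetch surfaces as `ERR_STREAM_PREMATURE_CLOSE`. A minimal illustrative sketch (the function and fixture below are hypothetical, not chat-ui's actual code):

```python
import json


def read_sse_stream(lines):
    """Yield decoded JSON chunks; raise if the stream ends before [DONE]."""
    done = False
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            done = True
            break
        yield json.loads(payload)
    if not done:
        # The connection closed without the terminating sentinel.
        raise RuntimeError("premature close: stream ended before [DONE]")


# A stream that is cut off mid-generation -- no `data: [DONE]` ever arrives:
truncated = [
    'data: {"choices": [{"delta": {"content": "Pre"}}]}',
    'data: {"choices": [{"delta": {"content": "heat"}}]}',
]
try:
    tokens = [c["choices"][0]["delta"]["content"] for c in read_sse_stream(truncated)]
except RuntimeError as e:
    print(e)  # premature close: stream ended before [DONE]
```

So the error is consistent with the backend closing the connection as soon as the streaming request comes in, which also explains why GPU usage never moves.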

What am I doing wrong?

chameleon-lizard commented 7 months ago

Updated ooga (oobabooga's Text Generation WebUI) -- everything works now. Closing the issue.