Closed ldotlopez closed 10 months ago
Hi @ldotlopez, as you mentioned that "a long conversation is generated", maybe it is hallucination. It can happen sometimes, for example if the input is too short.
And I am not sure about text-generation-ui. Do you mean we need more configuration templates or examples for this model?
Hi @Aisuko, would it be possible to provide more configuration templates for this model with LocalAI?
Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!
but.... I can also be funny or helpful :smilecat: and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.
Don't engage in conversation with me, I don't support (yet) replying!
To fix the issue with the Mistral model, you can try the following steps:
Sources:
Hi @lunamidori5, Do we have an example to support the model @saptarshidatta96 mentioned above?
Hello @Aisuko, it looks like that model just uses a template file that looks like this:
{{.Input}}
Seems that mistral-7b-instruct (finetuned for conversation) uses this template:
<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
Note that <s> and </s> are special tokens for beginning of string (BOS) and end of string (EOS) while [INST] and [/INST] are regular strings.
I'm not sure how to translate that to Go templates, nor how to express special tokens like <s>
@ldotlopez Use the one I posted, since the BOS and EOS tokens are now a part of the model's GGUF file.
If that doesn't work, try this:
[INST] {{.Input}} [/INST]
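For readers trying to see how the multi-turn instruct format quoted above expands, here is a minimal Python sketch. It is not LocalAI code and not a Go template; `build_mistral_prompt` and its parameters are hypothetical names used only for illustration. It assumes the layout is exactly the one quoted in this thread, and it lets you drop the BOS/EOS strings (pass empty strings) if the GGUF file already supplies them.

```python
from typing import List, Optional, Tuple


def build_mistral_prompt(
    turns: List[Tuple[str, Optional[str]]],
    bos: str = "<s>",
    eos: str = "</s>",
) -> str:
    """Assemble a prompt in the [INST] ... [/INST] layout quoted in this thread.

    `turns` is a list of (user_instruction, model_answer) pairs; use None as
    the answer of the last pair to leave the prompt open for the model.
    """
    parts = [bos]
    for instruction, answer in turns:
        parts.append(f"[INST] {instruction} [/INST]")
        if answer is not None:
            # A finished answer is followed by the end-of-string marker.
            parts.append(f" {answer}{eos}")
    return "".join(parts)


prompt = build_mistral_prompt(
    [("Instruction", "Model answer"), ("Follow-up instruction", None)]
)
print(prompt)
```

With a single turn and empty `bos`/`eos`, the result is the same shape as the `[INST] {{.Input}} [/INST]` template suggested above.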
@ldotlopez https://github.com/mudler/LocalAI/tree/master/examples/configurations is a full example on how to use mistral/openorca, let me know if that's not working for you!
(This was closed by error but reopened to let others reply. Thank you for your understanding)
Hi @mudler, thanks for the pointer… openorca-mistral-7b works almost fine, I get the stop_word in the response in some cases… :-/
It works fine when using the raw API, with cURL for example:
$ curl -s $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "mistral",
  "messages": [{"role": "user", "content": "Hi"}]
}' | json_pp
{
   "choices" : [
      {
         "finish_reason" : "stop",
         "index" : 0,
         "message" : {
            "content" : "Hello! How can I help you today?\n\n",
            "role" : "assistant"
         }
      }
   ],
   …
}
But I get some artifacts while using some frontends like ChatWizard or ChatBot-ui…
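To compare what a frontend sends with the cURL call that works, it can help to build the same request from a script. The following is a rough Python sketch using only the standard library. The function names and the base URL `http://localhost:8081` are assumptions for illustration (the port is taken from the `--address :8081` flag used elsewhere in this thread); only the `/v1/chat/completions` path and the JSON body come from the cURL example above.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the cURL example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Assumed base URL; adjust it to wherever your LocalAI instance listens.
req = build_chat_request("http://localhost:8081", "mistral", "Hi")
# reply = send_chat_request(req)
# print(reply["choices"][0]["message"]["content"])
```

Only the non-streaming request is built here; a frontend that enables streaming sends a different request, so the two are not directly comparable.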
What version of LocalAI are you using? I've just cut a v1.40.0, but mistral should be working with master (or v1.40.0, which should be published soon).
Just checked now: master / 1.40.0
$ /opt/LocalAI/src/local-ai --models-path /opt/LocalAI/models/ --address :8081 --debug
7:57AM DBG no galleries to load
7:57AM INF Starting LocalAI using 4 threads, with models path: /opt/LocalAI/models/
7:57AM INF LocalAI version: v1.40.0 (6ef7ea2635ae5371be4e6eef516c2ad4afd9f4a3)
This is what I get on LocalAI after sending 'hi' from chatbot UI (my system prompt is slightly modified from the defaults).
The reply to this ends with the <|im_end|> mark.
8:06AM DBG Prompt (after templating): <|im_start|>system
You are a large language model. Follow user instructions carefully
<|im_end|>
<|im_start|>user
hi
<|im_end|>
<|im_start|>assistant
[172.21.0.2]:39638 200 - POST /v1/chat/completions
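The templated prompt in the debug log above follows a ChatML-style layout. The sketch below reproduces that layout in Python so it is easier to see where `<|im_end|>` is expected to appear. It only mirrors the log output shown here; it is not LocalAI's template engine, and `render_chatml` is a hypothetical helper name.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]]) -> str:
    """Render messages in the layout seen in the debug log above.

    Each message becomes an <|im_start|>role ... <|im_end|> block, and the
    prompt ends with an open assistant block for the model to complete.
    """
    blocks = [
        f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>\n" for m in messages
    ]
    blocks.append("<|im_start|>assistant\n")
    return "".join(blocks)


rendered = render_chatml(
    [
        {
            "role": "system",
            "content": "You are a large language model. Follow user instructions carefully",
        },
        {"role": "user", "content": "hi"},
    ]
)
print(rendered)
```

Because the model is trained to close its own turn with `<|im_end|>`, that marker has to be configured as a stop word or removed somewhere before the text reaches the user.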
What bothers me is that the API is working fine but some frontends are not. Maybe it's the frontends' fault? Is that possible?
PS. I'm using the example configuration files but with the mistral-7b-openorca.Q4_K_M.gguf model and gpu_layers: 32. I tested without gpu_layers with the same results.
I just tested the precompiled binary (avx2 variant) and the docker image master-cuda11-ffmpeg, and both work fine (version 6ef7ea2635ae5371be4e6eef516c2ad4afd9f4a3).
There must be something wrong in my build. I will check it again, but we can consider this issue low priority.
If the frontend is using the new OpenAI API, it will fail, as far as my testing has shown. The thing is, the new API is not live yet; it's only on rc0 (see #1223). I'm working to see if I can find an easy fix.
This is an issue with streaming vs. non-streaming: streaming outputs the <|im_end|>. Not sure whether this is a LocalAI issue or a llama.cpp issue, though.
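Until the root cause is found, a frontend could filter the marker out on its side. Here is a minimal Python sketch of that idea, assuming the streamed text arrives as a sequence of string chunks and that `<|im_end|>` may be split across chunks. `filter_stop` is a hypothetical helper, not part of LocalAI, llama.cpp, or any frontend named in this thread.

```python
from typing import Iterable, Iterator


def filter_stop(chunks: Iterable[str], stop: str = "<|im_end|>") -> Iterator[str]:
    """Yield streamed text up to (and excluding) the first stop marker.

    Text that could be the beginning of the marker is held back until the
    next chunk shows whether it really is the marker.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        idx = buf.find(stop)
        if idx != -1:
            # Marker found: emit what precedes it and stop streaming.
            if idx:
                yield buf[:idx]
            return
        # Hold back the longest suffix of buf that is a prefix of the marker.
        keep = 0
        for k in range(min(len(stop) - 1, len(buf)), 0, -1):
            if stop.startswith(buf[-k:]):
                keep = k
                break
        emit = buf[: len(buf) - keep]
        if emit:
            yield emit
        buf = buf[len(buf) - keep:]
    # Stream ended without a marker: flush whatever was held back.
    if buf:
        yield buf


cleaned = "".join(filter_stop(["Hello! How can I ", "help you today?<|im", "_end|>"]))
print(cleaned)
```

This is a client-side workaround only; it hides the symptom and does not tell us whether the marker leaks from LocalAI or from llama.cpp.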
A few days ago Mistral.ai released a new base model, Mistral: https://docs.mistral.ai/quickstart/
The new model can be used in LocalAI using the GGUF version via the llama backend: https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/
I was able to install it with this command:
Then I reused the templates from vicuna. It almost works, but not quite: completion doesn't work, and chat only partly works, since a long conversation is generated instead of just one answer.
I don't have sufficient knowledge to fix this, but this model works in text-generation-ui, so maybe some templates from there can work here.