mudler / LocalAI

Templates for mistral model #1138

Closed ldotlopez closed 10 months ago

ldotlopez commented 11 months ago

A few days ago Mistral.ai released a new base model, Mistral: https://docs.mistral.ai/quickstart/

The new model can be used in LocalAI with the GGUF version via the llama backend: https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/

I was able to install it with this command:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
    "name": "mistral-7b",
    "files" : [
        {
            "sha256" : "",
            "uri" : "https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_0.gguf"
        }
    ],
    "overrides": {
        "backend": "llama",
        "gpu_layers": 35
    }
}'
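
If I understand the model gallery correctly, that ends up as a model definition in the models directory; roughly something like this (the file name and exact fields below are my guess, not necessarily what the gallery literally writes):

# models/mistral-7b.yaml (sketch of the generated definition)
name: mistral-7b
backend: llama
gpu_layers: 35
parameters:
  model: mistral-7b-v0.1.Q4_0.gguf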

Then I reused the templates from vicuna. It sort of works, but not quite: completion doesn't work at all, and chat only kind of works… a long conversation is generated instead of just one answer.

I don't have enough knowledge to write the templates myself, but this model works in text-generation-webui, so maybe some templates from there can work here.

Aisuko commented 11 months ago

Hi @ldotlopez. Since you mentioned that "a long conversation is generated", maybe it is hallucination. That can sometimes happen, for example if the input is too short.

And I am not sure about text-generation-webui. Do you mean we need more template or configuration examples for this model?

saptarshidatta96 commented 11 months ago

Hi @Aisuko, would it be possible to provide more template and configuration examples for this model with LocalAI?

localai-bot commented 11 months ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smile_cat: and I can generally provide good tips or point to places to look in the documentation or the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support replying (yet)!

:warning::warning::warning::warning::warning:

To fix the issue with the Mistral model, you can try the following steps:

  1. Check if the model is compatible with the llama backend by looking at the model documentation or contacting the model maintainer.
  2. Update the prompt templates to use the correct syntax and format for the Mistral model. You can find examples of prompt templates in the Mistral documentation or on the LocalAI prompt template gallery.
  3. Adjust the override settings in the model definition to match the specific configuration requirements of the Mistral model, such as the number of GPU layers and the backend used.
  4. Ensure that the required dependencies are installed and configured correctly for the Mistral model to work properly. This may include installing additional packages or configuring environment variables.
  5. Test the model again to ensure that it is working correctly and providing accurate responses. If you continue to experience issues with the Mistral model, you can try reaching out to the model maintainer or the LocalAI community for further assistance.

Aisuko commented 11 months ago

Hi @lunamidori5, do we have an example to support the model @saptarshidatta96 mentioned above?

lunamidori5 commented 11 months ago

Hello @Aisuko, it looks like that model just uses a template file that looks like this:

{{.Input}}

ldotlopez commented 11 months ago

It seems that mistral-7b-instruct (fine-tuned for conversation) uses this template:

<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]

Note that <s> and </s> are special tokens for beginning of string (BOS) and end of string (EOS) while [INST] and [/INST] are regular strings.

I'm not sure how to translate that into Go templates, nor how to express special tokens like <s>.

lunamidori5 commented 11 months ago

@ldotlopez use the one I posted, since the BOS and EOS tokens are now part of the model's GGUF file.

lunamidori5 commented 11 months ago

If not, try this:

[INST] {{.Input}} [/INST]
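
If you go that route, save that snippet as a .tmpl file next to the model and point the model YAML at it by name; a rough sketch, with the file name made up for illustration:

# excerpt of the model YAML, assuming the template above was saved as mistral-chat.tmpl
template:
  chat: mistral-chat
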
mudler commented 10 months ago

@ldotlopez https://github.com/mudler/LocalAI/tree/master/examples/configurations is a full example on how to use mistral/openorca, let me know if that's not working for you!
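
Very roughly, the configs in that directory are a YAML file plus the .tmpl prompt templates it points at, along these lines (just a sketch from memory; the linked files are the authoritative version and the exact names and values there may differ):

# sketch of a ChatML-style model config; see the linked directory for the real one
name: mistral
backend: llama
context_size: 4096
parameters:
  model: mistral-7b-openorca.Q4_K_M.gguf
template:
  chat: chatml        # name of a ChatML-style .tmpl file next to this YAML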

lunamidori5 commented 10 months ago

(This was closed by mistake but reopened to let others reply. Thank you for your understanding.)

ldotlopez commented 10 months ago

> @ldotlopez https://github.com/mudler/LocalAI/tree/master/examples/configurations is a full example on how to use mistral/openorca, let me know if that's not working for you!

Hi @mudler, thanks for the pointer… openorca-mistral-7b works almost fine, but I get the stop word in the response in some cases… :-/

It works fine when using the raw API, with cURL for example:

$ curl -s $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "mistral",
     "messages": [{"role": "user", "content": "Hi"}]
   }' | json_pp
{
   "choices" : [
      {
         "finish_reason" : "stop",
         "index" : 0,
         "message" : {
            "content" : "Hello! How can I help you today?\n\n",
            "role" : "assistant"
         }
      }
   ],
   …
}

But I get some artifacts when using some frontends like ChatWizard or ChatBot-ui:

(screenshots attached)

mudler commented 10 months ago

> Hi @mudler, thanks for the pointer… openorca-mistral-7b works almost fine, but I get the stop word in the response in some cases… :-/
>
> But I get some artifacts when using some frontends like ChatWizard or ChatBot-ui.

What version of LocalAI are you using? I've just cut v1.40.0, but Mistral should be working with master (or v1.40.0, which should be published soon).

ldotlopez commented 10 months ago

> What version of LocalAI are you using? I've just cut v1.40.0, but Mistral should be working with master (or v1.40.0, which should be published soon).

Just checked now: master / v1.40.0

$ /opt/LocalAI/src/local-ai --models-path /opt/LocalAI/models/  --address :8081 --debug
7:57AM DBG no galleries to load
7:57AM INF Starting LocalAI using 4 threads, with models path: /opt/LocalAI/models/
7:57AM INF LocalAI version: v1.40.0 (6ef7ea2635ae5371be4e6eef516c2ad4afd9f4a3)

This is what I get from LocalAI after sending 'hi' from chatbot-ui (my system prompt is slightly modified from the defaults). The reply to this ends with the <|im_end|> marker.

8:06AM DBG Prompt (after templating): <|im_start|>system
You are a large language model. Follow user instructions carefully
<|im_end|>

<|im_start|>user
hi
<|im_end|>

<|im_start|>assistant

[172.21.0.2]:39638 200 - POST /v1/chat/completions

What bothers me is that the API works fine but some frontends don't. Maybe it's the frontends' fault? Is that possible?

PS: I'm using the example configuration files, but with the mistral-7b-openorca.Q4_K_M.gguf model and gpu_layers: 32. I tested without gpu_layers with the same results.
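
One more thing I plan to double-check is whether the YAML lists <|im_end|> as a stop word; as far as I can tell the model config supports a stopwords list, something like:

# excerpt of the model YAML
stopwords:
- "<|im_end|>"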

ldotlopez commented 10 months ago

I just tested the precompiled binary (avx2 variant) and the docker image master-cuda11-ffmpeg, and both work fine (version 6ef7ea2635ae5371be4e6eef516c2ad4afd9f4a3).

There must be something wrong with my build. I will check it again, but we can consider this issue low priority.

lunamidori5 commented 10 months ago

If the frontend is using the new OpenAI API, it will fail, as far as my testing has shown. The thing is, the new API is not live yet... it's only at rc0... see #1223. I'm working to see if I can find an easy fix.

mattapperson commented 9 months ago

This is an issue with streaming vs. non-streaming: streaming outputs the <|im_end|>. Not sure if this is a LocalAI issue or a llama.cpp issue, though.