mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Chatting with StarChat does not produce an output #605

Closed: mads-oestergaard closed this issue 1 year ago

mads-oestergaard commented 1 year ago

LocalAI version:

Running container image quay.io/go-skynet/local-ai:latest with the docker-compose.yaml from master (commit 295f3030a9f76f7f36705abb760a2cd351d2b4d8)

Environment, CPU architecture, OS, and Version:

Linux hostname 5.13.0-41-generic #46~20.04.1-Ubuntu SMP Wed Apr 20 13:16:21 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

With StarChat Beta, an instruction-finetuned StarCoder model, I cannot get responses through the API.

The output does not appear to be parsed correctly; there is no content in the reply:

user@hostname:~/LocalAI$ curl localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "gpt-3.5-turbo", 
    "messages": [ {"role":"user", "content":"how do i sort a list in python?"} ]
    }' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   264  100   164  100   100     13      8  0:00:12  0:00:11  0:00:01    46
{
  "object": "chat.completion",
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "message": {
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

To Reproduce

Download one of the ggml StarChat Beta models:

$ wget https://huggingface.co/TheBloke/starchat-beta-GGML/resolve/main/starchat-beta.ggmlv3.q4_1.bin -O models/starchat-beta-q4_1.ggmlv3

$ touch models/gpt-3.5-turbo.yaml

Insert this config in gpt-3.5-turbo.yaml:

name: gpt-3.5-turbo
# Default model parameters
parameters:
  # Relative to the models path
  model: starchat-beta-q4_1.ggmlv3
backend: starcoder
context_size: 8192
threads: 4
debug: true
template:
  chat: chat
stopwords:
- <|end|>

Create a chat.tmpl file and add this to it:

<|system|> Below is a conversation between a human user and a helpful AI coding assistant. <|end|> 
<|user|> {{.Input}} <|end|>
<|assistant|>
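
For context, the chat template is a Go text/template, so {{.Input}} is substituted with the user's message before the prompt is handed to the backend. A minimal standalone sketch of that substitution (standard library only, not LocalAI's exact rendering code):

package main

import (
	"os"
	"text/template"
)

func main() {
	// Same body as the chat.tmpl above.
	const chatTmpl = `<|system|> Below is a conversation between a human user and a helpful AI coding assistant. <|end|>
<|user|> {{.Input}} <|end|>
<|assistant|>`

	tmpl := template.Must(template.New("chat").Parse(chatTmpl))

	// Renders the same text the server logs as
	// "Template found, input modified to: ...".
	_ = tmpl.Execute(os.Stdout, struct{ Input string }{"how do i sort a list in python?"})
}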

Call the /v1/chat/completions endpoint with the curl command shown above.

Expected behavior

I would expect the response to contain something like the following, generated from the demo on https://huggingface.co/HuggingFaceH4/starchat-beta:

There are multiple ways to sort a list in Python. One of the most common ways is to use the sort() method. Here is an example:

my_list = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
my_list.sort()
print(my_list)

This will sort the list in place and print the sorted list.

Logs

api_1  | Starting LocalAI using 4 threads, with models path: /models
api_1  | 7:23AM DBG Model: gpt-3.5-turbo (config: {OpenAIRequest:{Model:starchat-beta-q4_1.ggmlv3 File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0 TopK:0 Temperature:0 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0} Name:gpt-3.5-turbo StopWords:[<|end|>] Cutstrings:[] TrimSpace:[] ContextSize:8192 F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:starcoder TemplateConfig:{Completion: Chat:chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]})
api_1  | 7:23AM DBG GPT4ALL_IMPLEMENTATIONS_PATH: /tmp/localai/backend_data/backend-assets/gpt4all
api_1  | 
api_1  |  ┌───────────────────────────────────────────────────┐ 
api_1  |  │                   Fiber v2.46.0                   │ 
api_1  |  │               http://127.0.0.1:8080               │ 
api_1  |  │       (bound on host 0.0.0.0 and port 8080)       │ 
api_1  |  │                                                   │ 
api_1  |  │ Handlers ............ 24  Processes ........... 1 │ 
api_1  |  │ Prefork ....... Disabled  PID ............... 209 │ 
api_1  |  └───────────────────────────────────────────────────┘ 
api_1  | 
api_1  | 7:23AM DBG Request received: {"model":"gpt-3.5-turbo","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"how do i sort a list in python?"}],"stream":false,"echo":false,"top_p":0,"top_k":0,"temperature":0,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0}
api_1  | 7:23AM DBG Parameter Config: &{OpenAIRequest:{Model:starchat-beta-q4_1.ggmlv3 File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0 TopK:0 Temperature:0 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0} Name:gpt-3.5-turbo StopWords:[<|end|>] Cutstrings:[] TrimSpace:[] ContextSize:8192 F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:starcoder TemplateConfig:{Completion: Chat:chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
api_1  | 7:23AM DBG Template found, input modified to: <|system|> Below is a conversation between a human user and a helpful AI coding assistant. <|end|> 
api_1  | <|user|> how do i sort a list in python? <|end|>
api_1  | <|assistant|>
api_1  | 7:23AM DBG Loading model starcoder from starchat-beta-q4_1.ggmlv3
api_1  | 7:23AM DBG Loading model in memory from file: /models/starchat-beta-q4_1.ggmlv3
api_1  | [172.30.0.1]:38790  200  -  POST     /v1/chat/completions
api_1  | 7:23AM DBG Response: {"object":"chat.completion","model":"gpt-3.5-turbo","choices":[{"message":{"role":"assistant"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

Additional context

luoweb commented 1 year ago

Is this issue solved? I repeated the operation, and it always shows the following:

12:23PM DBG Request received: {"model":"gpt-3.5-turbo","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"how do i sort a list in python?"}],"stream":false,"echo":false,"top_p":0,"top_k":0,"temperature":0,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
12:23PM DBG Parameter Config: &{OpenAIRequest:{Model:starchat-beta.ggmlv3.q4_0.bin File: Language: ResponseFormat: Size: Prompt: Instruction: Input: Stop: Messages:[] Stream:false Echo:false TopP:0 TopK:0 Temperature:0 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name:gpt-3.5-turbo StopWords:[<|end|>] Cutstrings:[] TrimSpace:[] ContextSize:8192 F16:false NUMA:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:starcoder TemplateConfig:{Completion: Chat:chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
12:23PM DBG Template found, input modified to: <|system|> Below is a conversation between a human user and a helpful AI coding assistant. <|end|>
<|user|> how do i sort a list in python? <|end|>
<|assistant|>
12:23PM DBG Loading model starcoder from starchat-beta.ggmlv3.q4_0.bin
12:23PM DBG Loading model in memory from file: /Users/block/code/data/models/starchat-beta.ggmlv3.q4_0.bin
starcoder_model_load: loading model from '/Users/block/code/data/models/starchat-beta.ggmlv3.q4_0.bin'
starcoder_model_load: n_vocab = 49156
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 2002
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 25608.50 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size = 10248.26 MB
12:23PM DBG Response: {"object":"chat.completion","model":"gpt-3.5-turbo","choices":[{"message":{"role":"assistant","content":"\u003c|system|\u003e Below is a conversation between a human user and a helpful AI coding assistant. \u003c|end|\u003e"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
[127.0.0.1]:62169 200 - POST /v1/chat/completions

mads-oestergaard commented 1 year ago

Yes, I see the same thing. If you modify the template to "<|system|>\n<|end|>\n<|user|>{{.Input}} <|end|>\n<|assistant|>" (similar to the example in the code block on https://huggingface.co/HuggingFaceH4/starchat-beta), there is still no output:

api_1  | 7:57AM DBG Parameter Config: &{OpenAIRequest:{Model:starchat-beta-q4_1.ggmlv3 File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0 TopK:0 Temperature:0 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name:starchat StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:8192 F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend:starcoder TemplateConfig:{Completion: Chat:chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
api_1  | 7:57AM DBG Template found, input modified to: <|system|>
api_1  | <|end|> 
api_1  | <|user|> how do i sort a list in python? <|end|>
api_1  | <|assistant|>
api_1  | 7:57AM DBG Loading model starcoder from starchat-beta-q4_1.ggmlv3
api_1  | 7:57AM DBG Loading model in memory from file: /models/starchat-beta-q4_1.ggmlv3
api_1  | 7:57AM DBG Response: {"object":"chat.completion","model":"starchat","choices":[{"message":{"role":"assistant"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
api_1  | [10.4.80.103]:47196  200  -  POST     /v1/chat/completions

I suspect it has something to do with the stop-token.
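
If the stop word is applied naively to the generated text and the model starts by echoing the templated prompt, the first <|end|> would truncate everything, leaving an empty reply. A hypothetical Go sketch of that failure mode (an illustration of the suspicion, not LocalAI's actual trimming code):

package main

import (
	"fmt"
	"strings"
)

// cutAtStopWord trims generated text at the first stop word, the way a
// backend might apply the "stopwords" config. Hypothetical illustration only.
func cutAtStopWord(generated, stop string) string {
	if i := strings.Index(generated, stop); i >= 0 {
		return generated[:i]
	}
	return generated
}

func main() {
	// Hypothetical output from a model that echoes the prompt before answering.
	generated := "<|system|> ... <|end|> <|user|> how do i sort a list in python? <|end|> <|assistant|> Use my_list.sort() ..."
	// Everything after the first <|end|> is discarded, so the actual
	// answer never makes it into the API response.
	fmt.Printf("%q\n", cutAtStopWord(generated, "<|end|>"))
}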

luoweb commented 1 year ago

> Yes, I see the same thing. […] I suspect it has something to do with the stop-token.

I succeeded with the following prompt template:

{{.Input}} <|end|>
<|assistant|>
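
A quick way to check the fix is to replay the request programmatically; a minimal Go sketch, assuming the server is still listening on localhost:8080 and the model is named gpt-3.5-turbo as in the config above:

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Same request as the curl call in the issue description.
	body := []byte(`{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "how do i sort a list in python?"}]}`)

	resp, err := http.Post("http://localhost:8080/v1/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	// With the working template, the "content" field of the first
	// choice should now be non-empty.
	fmt.Println(string(out))
}
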
localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but... I can also be funny or helpful :smile_cat: and I can provide generally good tips on where to look in the documentation or in the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

Based on the information provided, it seems like there is a problem with the GPT-3 model not being able to sort a list in Python. Here are some possible solutions or debugging steps that can be taken:

  1. Check if the GPT-3 model has been trained on the specific task of sorting lists in Python. If not, try training it on this task using a dataset of sorted and unsorted lists in Python.
  2. Check if there is enough memory available for the model to run efficiently. If not, try increasing the amount of memory allocated to the model or reducing the size of the input data.
  3. Check if the GPT-3 model is able to generate responses correctly by testing it on a simpler task, such as generating a response to a prompt that does not involve sorting lists in Python.
  4. Check if there are any errors or warnings in the logs related to the GPT-3 model or the LocalAI open source project. If so, investigate and resolve these issues before attempting to use the model again.
  5. If none of the above solutions work, try restarting the LocalAI open source project and the GPT-3 model to see if this resolves the issue.

Overall, it seems like there may be a problem with the GPT-3 model not being able to generate responses related to sorting lists in Python. Further debugging and training may be required to resolve this issue. I hope this helps! Let me know if you have any further questions or need more assistance.


Aisuko commented 1 year ago

It looks like the issue can be fixed by using the prompt template @luoweb mentioned above. I am going to close this issue now. Please feel free to reopen it if the issue still occurs.