dillfrescott opened this issue 1 year ago
Keep in mind I've tried llama-cpp-python[server] and it works fine with chatbot-ui.
I was able to get one or two responses out of it, but they weren't related to my input at all...
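For reference, the request chatbot-ui sends can be reproduced directly with curl (a minimal sketch reconstructed from the /v1/chat/completions request body in the debug log below):

```
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "stream": true,
    "temperature": 0.5,
    "max_tokens": 1000,
    "messages": [
      {"role": "system", "content": "You are Vicuna, a large language model trained by LMSys. Follow the user'\''s instructions carefully. Respond using markdown."},
      {"role": "user", "content": "hey"}
    ]
  }'
```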
root@cross-server:~# ./local-ai-avx2-Linux-x86_64 --f16 --threads 6 --context-size 2048 --debug
Starting LocalAI using 6 threads, with models path: /root/models
unexpected end of JSON input
┌───────────────────────────────────────────────────┐
│ Fiber v2.47.0 │
│ http://127.0.0.1:8080 │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 32 Processes ........... 1 │
│ Prefork ....... Disabled PID ............ 103304 │
└───────────────────────────────────────────────────┘
[127.0.0.1]:34910 200 - GET /v1/models
[127.0.0.1]:59028 200 - GET /v1/models
[127.0.0.1]:59034 200 - GET /v1/models
4:15PM DBG Request received: {"model":"gpt-3.5-turbo","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"system","content":"You are Vicuna, a large language model trained by LMSys. Follow the user's instructions carefully. Respond using markdown."},{"role":"user","content":"hey"}],"stream":true,"echo":false,"top_p":0,"top_k":0,"temperature":0.5,"max_tokens":1000,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
4:15PM DBG Parameter Config: &{OpenAIRequest:{Model:gpt-3.5-turbo File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.5 Maxtokens:1000 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2048 F16:true NUMA:false Threads:6 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
4:15PM DBG Stream request received
[127.0.0.1]:60428 200 - POST /v1/chat/completions
4:15PM DBG Loading model 'gpt-3.5-turbo' greedly
4:15PM DBG [llama] Attempting to load
4:15PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"role":"assistant"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:15PM DBG Loading model llama from gpt-3.5-turbo
4:15PM DBG Loading model in memory from file: /root/models/gpt-3.5-turbo
llama.cpp: loading model from /root/models/gpt-3.5-turbo
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4017.34 MB
llama_model_load_internal: mem required = 5809.34 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size = 1024.00 MB
4:16PM DBG [llama] Loads OK
4:16PM DBG Sending goroutine: ,
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":","}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: can
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" can"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: you
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" you"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: give
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" give"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: me
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" me"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: an
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" an"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: over
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" over"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: view
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":"view"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: of
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" of"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: the
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" the"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: current
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" current"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: state
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" state"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: of
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" of"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: the
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" the"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: world
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" world"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: based
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" based"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: on
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" on"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: your
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" your"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: training
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" training"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: data
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" data"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine: ?
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":"?"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:16PM DBG Sending goroutine:
llama_print_timings: load time = 50512.54 ms
llama_print_timings: sample time = 13.53 ms / 22 runs ( 0.62 ms per token, 1625.66 tokens per second)
llama_print_timings: prompt eval time = 5309.82 ms / 33 tokens ( 160.90 ms per token, 6.21 tokens per second)
llama_print_timings: eval time = 5150.71 ms / 21 runs ( 245.27 ms per token, 4.08 tokens per second)
llama_print_timings: total time = 10696.24 ms
4:16PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
[127.0.0.1]:39894 200 - GET /v1/models
4:17PM DBG Request received: {"model":"gpt-3.5-turbo","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"system","content":"You are Vicuna, a large language model trained by LMSys. Follow the user's instructions carefully. Respond using markdown."},{"role":"user","content":"hey"}],"stream":true,"echo":false,"top_p":0,"top_k":0,"temperature":0.5,"max_tokens":1000,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
4:17PM DBG Parameter Config: &{OpenAIRequest:{Model:gpt-3.5-turbo File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.5 Maxtokens:1000 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2048 F16:true NUMA:false Threads:6 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
4:17PM DBG Stream request received
[127.0.0.1]:59094 200 - POST /v1/chat/completions
4:17PM DBG Loading model 'gpt-3.5-turbo' greedly
4:17PM DBG Model 'gpt-3.5-turbo' already loaded
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"role":"assistant"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: ,
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":","}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: can
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" can"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: you
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" you"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: give
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" give"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: me
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" me"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: an
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" an"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: over
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" over"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: view
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":"view"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: of
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" of"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: the
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" the"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: current
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" current"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: state
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" state"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: of
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" of"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: the
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" the"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: world
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" world"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: based
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" based"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: on
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" on"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: your
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" your"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: training
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" training"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: data
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":" data"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine: ?
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"content":"?"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine:
llama_print_timings: load time = 50512.54 ms
llama_print_timings: sample time = 13.43 ms / 22 runs ( 0.61 ms per token, 1638.25 tokens per second)
llama_print_timings: prompt eval time = 5198.24 ms / 33 tokens ( 157.52 ms per token, 6.35 tokens per second)
llama_print_timings: eval time = 4881.24 ms / 21 runs ( 232.44 ms per token, 4.30 tokens per second)
llama_print_timings: total time = 10135.74 ms
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Request received: {"model":"gpt-3.5-turbo","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"system","content":"You are Vicuna, a large language model trained by LMSys. Follow the user's instructions carefully. Respond using markdown."},{"role":"user","content":"what is 9 plus 12?"}],"stream":true,"echo":false,"top_p":0,"top_k":0,"temperature":0.5,"max_tokens":1000,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
4:17PM DBG Parameter Config: &{OpenAIRequest:{Model:gpt-3.5-turbo File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.5 Maxtokens:1000 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2048 F16:true NUMA:false Threads:6 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
4:17PM DBG Stream request received
[127.0.0.1]:52898 200 - POST /v1/chat/completions
4:17PM DBG Loading model 'gpt-3.5-turbo' greedly
4:17PM DBG Model 'gpt-3.5-turbo' already loaded
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"role":"assistant"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Sending goroutine:
llama_print_timings: load time = 50512.54 ms
llama_print_timings: sample time = 0.61 ms / 1 runs ( 0.61 ms per token, 1626.02 tokens per second)
llama_print_timings: prompt eval time = 6376.47 ms / 40 tokens ( 159.41 ms per token, 6.27 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 6377.61 ms
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Request received: {"model":"gpt-3.5-turbo","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"system","content":"You are Vicuna, a large language model trained by LMSys. Follow the user's instructions carefully. Respond using markdown."},{"role":"user","content":"what is 9 plus 12?"}],"stream":true,"echo":false,"top_p":0,"top_k":0,"temperature":0.5,"max_tokens":1000,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
4:17PM DBG Parameter Config: &{OpenAIRequest:{Model:gpt-3.5-turbo File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.5 Maxtokens:1000 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2048 F16:true NUMA:false Threads:6 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
4:17PM DBG Stream request received
[127.0.0.1]:34764 200 - POST /v1/chat/completions
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{"role":"assistant"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
4:17PM DBG Loading model 'gpt-3.5-turbo' greedly
4:17PM DBG Model 'gpt-3.5-turbo' already loaded
4:17PM DBG Sending goroutine:
llama_print_timings: load time = 50512.54 ms
llama_print_timings: sample time = 0.61 ms / 1 runs ( 0.61 ms per token, 1628.66 tokens per second)
llama_print_timings: prompt eval time = 6357.98 ms / 40 tokens ( 158.95 ms per token, 6.29 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 6359.02 ms
4:17PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"gpt-3.5-turbo","choices":[{"delta":{}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!
But... I can also be funny or helpful :smile_cat:, and I can generally offer good tips, or point at places in the documentation or the code worth looking into, based on what you wrote in the issue.
Don't engage in conversation with me - I don't support replying (yet)!
You could try building LocalAI yourself, passing CMAKE_ARGS to disable instruction sets that your CPU may not support. Additionally, check that the required environment variables are set correctly when running the binary, such as the disk or other mount points used for storage. You can also check for compatibility issues with your operating system or with the CUDA version installed on your machine. If all else fails, try a different model, or ask the LocalAI community for help troubleshooting the issue further.
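If you want to try the rebuild route the bot suggests, something along these lines is a starting point (a sketch only: it assumes CMAKE_ARGS is forwarded to the bundled llama.cpp build, and the LLAMA_AVX2/LLAMA_FMA flag names are llama.cpp's of that era, which may differ between versions):

```
# Hypothetical sketch: rebuild with AVX2/FMA disabled in the bundled llama.cpp
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
CMAKE_ARGS="-DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make build
```

Note that the binary above is already the avx2 build and does produce tokens, so instruction sets may not be the culprit here.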
I've tried both the Docker and binary versions. The model (Vicuna 7B) loads into RAM and uses the CPU, but it only outputs a single "!" or no response at all.
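One thing that stands out in the debug output above: the Parameter Config line shows an empty TemplateConfig ({Completion: Chat: Edit:}), so the chat messages may be reaching the Vicuna weights without any prompt template, which could explain the unrelated or empty replies. Below is a minimal sketch of a model config plus template, assuming LocalAI's documented YAML config and {{.Input}} .tmpl conventions; the file names and template wording are hypothetical:

```
# Hypothetical sketch: pair the raw model file with a Vicuna-style chat template
cat > /root/models/gpt-3.5-turbo.yaml <<'EOF'
name: gpt-3.5-turbo
parameters:
  model: gpt-3.5-turbo   # the ggml model file already in the models dir
template:
  chat: vicuna-chat      # refers to vicuna-chat.tmpl below
EOF

cat > /root/models/vicuna-chat.tmpl <<'EOF'
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful answers to the user's input.
USER: {{.Input}}
ASSISTANT:
EOF
```

After adding these, restart the server so the config is picked up, then retry the same request.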