n4ze3m / dialoqbase

Create chatbots with ease
https://dialoqbase.n4ze3m.com/
MIT License

Responses truncated from local model #147

Closed · phpia closed this 6 months ago

phpia commented 7 months ago

Hi! I connected a model from localai.io to dialoqbase, but the model's responses come back truncated in the chat.

Thanks!

[App] provider local
[App] modelName mistral
[App] using local
[App] provider local
[App] modelName mistral
[App] using local
[App] cloycnl250001jlmacshzz1gr
[App] Failed to calculate number of tokens, falling back to approximate count Error: Unknown model
[App]   at getEncodingNameForModel (/home/dpardo/cityrobot/server/node_modules/js-tiktoken/dist/lite.cjs:217:13)
[App]   at encodingForModel (/home/dpardo/cityrobot/server/node_modules/langchain/dist/util/tiktoken.cjs:24:59)
[App]   at ChatOpenAI.getNumTokens (/home/dpardo/cityrobot/server/node_modules/langchain/dist/base_language/index.cjs:116:75)
[App]   at /home/dpardo/cityrobot/server/node_modules/langchain/dist/chat_models/openai.cjs:582:42
[App]   at Array.map ()
[App]   at ChatOpenAI.getNumTokensFromMessages (/home/dpardo/cityrobot/server/node_modules/langchain/dist/chat_models/openai.cjs:581:60)
[App]   at ChatOpenAI.getEstimatedTokenCountFromPrompt (/home/dpardo/cityrobot/server/node_modules/langchain/dist/chat_models/openai.cjs:529:34)
[App]   at ChatOpenAI._generate (/home/dpardo/cityrobot/server/node_modules/langchain/dist/chat_models/openai.cjs:477:49)
[App]   at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
[App]   at async Promise.allSettled (index 0)

n4ze3m commented 7 months ago

Hey, I think you need to increase the model's context_size in the LocalAI env. The default is around 512, and it needs to be increased to get a complete response :/

https://localai.io/advanced/index.html
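
For reference, a minimal sketch of where that setting lives, assuming the per-model YAML config style described in the LocalAI docs linked above (the file name and GGUF file below are illustrative, not taken from this thread):

    # models/mistral.yaml -- hypothetical file name
    name: mistral
    # context_size defaults to roughly 512 tokens; raising it allows longer
    # prompts and completions at the cost of more memory
    context_size: 4096
    parameters:
      model: mistral-7b-v0.1.gguf  # whichever model file you serve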

n4ze3m commented 7 months ago

If you try without any data source, it will respond without truncating, since the default prompt size is small. However, when using a data source you need to increase it to 4K or something similar (but this will use more memory).
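
To make that concrete with illustrative numbers (an assumption, not measured from this setup): a RAG prompt that stuffs in four retrieved chunks of ~700 tokens each is already ~2,800 tokens before the system prompt, question, and answer are counted, so a 512-token context is exhausted mid-prompt, while a 4K context still leaves roughly a thousand tokens for the completion.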

phpia commented 7 months ago

I have already tested the context size. And without any data source I get a complete answer via curl:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "mistral",
     "messages": [{"role": "user", "content": "tell me three animals with two horns"}],
     "temperature": 0.9,"stream":false 
   }'
{"created":1699986736,"object":"chat.completion","id":"f1c17b19-1f7f-43d9-bc0c-0e372f02bb31","model":"mistral","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Three animals with two horns are:\n\n1. Bighorn Sheep: These majestic animals are native to North America and are known for their impressive curved horns.\n\n2. Addax: This unique animal is native to the Sahara Desert and has two straight horns that are used for digging for water and scraping off vegetation.\n\n3. Scimitar-Horned Oryx: This African antelope has two long, slender horns that are curved like a scimitar sword.\n\n"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

But it's truncated in dialoqbase. Maybe a bug in LocalAI? (screenshot attached)

n4ze3m commented 7 months ago

Okay, I will look into the issue.

phpia commented 7 months ago

Hey, the bug does not happen if streaming is not set, so it's probably related to the LangChain library, not a dialoqbase problem.
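
For context, the flag in question looks roughly like this in the LangChain JS of that era. This is a sketch, assuming the old `langchain/chat_models/openai` entry point visible in the stack trace above; the exact option for pointing the client at a LocalAI endpoint (`configuration.baseURL` here) is an assumption and varied across versions:

    import { ChatOpenAI } from "langchain/chat_models/openai";

    const model = new ChatOpenAI({
      modelName: "mistral",
      streaming: false, // with streaming: true, responses came back truncated
      openAIApiKey: "sk-anything", // LocalAI does not validate the key
      configuration: { baseURL: "http://localhost:8080/v1" }, // assumed option name
    });

    // same prompt as the curl test above
    const res = await model.invoke("tell me three animals with two horns");
    console.log(res.content);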

An odd behavior of dialoqbase is that it sets streaming back to "on" every time you edit the bot settings, even though it was deactivated.

May I ask you a question?

Thanks for your work

n4ze3m commented 7 months ago

> Hey, the bug does not happen if streaming is not set, so it's probably related to the LangChain library, not a dialoqbase problem.

I need to update my code to the latest LangChain version; I will do that.

> Does RAG make sense for local models?

Everything works the same, though. I might bring back the old chain, but its performance is really poor.

n4ze3m commented 7 months ago

Hey, I have released an update that may fix this streaming issue with the LocalAI model.

Could you also provide me with the Hugging Face URL of the local Mistral model you tried? :)

n4ze3m commented 6 months ago

I'm closing this. If you still have the error, please reopen it. Thanks.