Closed BarfingLemurs closed 5 months ago
I think there may be an issue with the local model service you're using. I'm confident that ChatGPTBox has passed history messages through the API. Perhaps you can check if there are any log files.
I have been trying some other local APIs to see if there is a problem with the specific backend.
With koboldcpp's OpenAI-compatible API, I can see that the previous messages aren't being sent:
Embedded Kobold Lite loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001
Input: {"messages": [{"role": "user", "content": "hi"}], "model": "", "stream": true, "max_tokens": 40000, "temperature": 1}
Processing Prompt (1 / 1 tokens)
Generating (40 / 2047 tokens)
(EOS token triggered!)
ContextLimit: 41/2048, Processing:0.83s (832.0ms/T), Generation:30.47s (761.8ms/T), Total:31.30s (782.5ms/T = 1.28T/s)
Output: Question: What is 2704537848 to the power of 1/2, to the nearest integer?
Answer: 51931
Input: {"messages": [{"role": "user", "content": "repeat that."}], "model": "", "stream": true, "max_tokens": 40000, "temperature": 1}
Processing Prompt (0 / 0 tokens)
Generating (1 / 2047 tokens)
(EOS token triggered!)
ContextLimit: 2/2048, Processing:0.00s (0.0ms/T), Generation:0.00s (0.0ms/T), Total:0.00s (0.0ms/T = infT/s)
Output:
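A quick way to confirm this from the log is to parse each "Input:" line and count the entries in "messages". The sketch below assumes the log line format shown above; count_messages and expected_messages are hypothetical names used only for illustration, and the assistant content is a placeholder rather than real output.

```python
import json


def count_messages(log_line: str) -> int:
    """Return the number of chat messages in a koboldcpp 'Input: {...}' log line."""
    prefix = "Input: "
    if not log_line.startswith(prefix):
        raise ValueError("not an Input log line")
    payload = json.loads(log_line[len(prefix):])
    return len(payload.get("messages", []))


# The second request copied from the log above.
second_request = (
    'Input: {"messages": [{"role": "user", "content": "repeat that."}], '
    '"model": "", "stream": true, "max_tokens": 40000, "temperature": 1}'
)
print(count_messages(second_request))  # 1

# What the second request would be expected to carry if history were sent
# (assumption: standard OpenAI-style roles; the assistant text is a placeholder).
expected_messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "<first reply>"},
    {"role": "user", "content": "repeat that."},
]
```

A count of 1 on the second turn means only the newest user message reached the backend.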
Here's a normal log of what should happen:
Input: {"n": 1, "max_context_length": 1600, "max_length": 120, "rep_pen": 1.1, "temperature": 0.7, "top_p": 0.92, "top_k": 100, "top_a": 0, "typical": 1, "tfs": 1, "rep_pen_range": 320, "rep_pen_slope": 0.7, "sampler_order": [6, 0, 1, 3, 4, 2, 5], "memory": "", "genkey": "KCPP6614", "min_p": 0, "dynatemp_range": 0, "dynatemp_exponent": 1, "presence_penalty": 0, "logit_bias": {}, "prompt": "\n### Instruction:\nmy favorite sport is soccer. repeat that.\n### Response:\n", "quiet": true, "stop_sequence": ["### Instruction:", "### Response:"], "use_default_badwordsids": false}
Processing Prompt (22 / 22 tokens)
Generating (7 / 120 tokens)
(EOS token triggered!)
ContextLimit: 29/1600, Processing:16.62s (755.5ms/T), Generation:4.75s (679.3ms/T), Total:21.38s (3053.6ms/T = 0.33T/s)
Output: Your favorite sport is soccer.
Input: {"n": 1, "max_context_length": 1600, "max_length": 120, "rep_pen": 1.1, "temperature": 0.7, "top_p": 0.92, "top_k": 100, "top_a": 0, "typical": 1, "tfs": 1, "rep_pen_range": 320, "rep_pen_slope": 0.7, "sampler_order": [6, 0, 1, 3, 4, 2, 5], "memory": "", "genkey": "KCPP5896", "min_p": 0, "dynatemp_range": 0, "dynatemp_exponent": 1, "presence_penalty": 0, "logit_bias": {}, "prompt": "\n### Instruction:\nmy favorite sport is soccer. repeat that.\n### Response:\nYour favorite sport is soccer.\n### Instruction:\nrepeat that.\n### Response:\n", "quiet": true, "stop_sequence": ["### Instruction:", "### Response:"], "use_default_badwordsids": false}
Processing Prompt (14 / 14 tokens)
Generating (7 / 120 tokens)
(EOS token triggered!)
ContextLimit: 49/1600, Processing:10.54s (752.9ms/T), Generation:4.80s (686.3ms/T), Total:15.35s (2192.1ms/T = 0.46T/s)
Output: Your favorite sport is soccer.
Input: {"n": 1, "max_context_length": 1600, "max_length": 120, "rep_pen": 1.1, "temperature": 0.7, "top_p": 0.92, "top_k": 100, "top_a": 0, "typical": 1, "tfs": 1, "rep_pen_range": 320, "rep_pen_slope": 0.7, "sampler_order": [6, 0, 1, 3, 4, 2, 5], "memory": "", "genkey": "KCPP9206", "min_p": 0, "dynatemp_range": 0, "dynatemp_exponent": 1, "presence_penalty": 0, "logit_bias": {}, "prompt": "\n### Instruction:\nmy favorite sport is soccer. repeat that.\n### Response:\nYour favorite sport is soccer.\n### Instruction:\nrepeat that.\n### Response:\nYour favorite sport is soccer.\n### Instruction:\ngood. say it once more.\n### Response:\n", "quiet": true, "stop_sequence": ["### Instruction:", "### Response:"], "use_default_badwordsids": false}
Processing Prompt (18 / 18 tokens)
Generating (7 / 120 tokens)
(EOS token triggered!)
ContextLimit: 73/1600, Processing:13.57s (754.2ms/T), Generation:4.79s (684.4ms/T), Total:18.37s (2623.7ms/T = 0.38T/s)
Output: Your favorite sport is soccer.
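The prompts in this working log appear to follow an instruction/response template in which every earlier turn is repeated before the new instruction. A minimal sketch of that accumulation, assuming the template seen in the "prompt" fields above (build_prompt is a hypothetical helper, not part of koboldcpp or ChatGPTBox):

```python
def build_prompt(turns, new_instruction):
    """Concatenate earlier (instruction, response) pairs, then the new instruction.

    The template mirrors the "prompt" strings in the log above; it is an
    assumption about the client, not a documented format.
    """
    parts = []
    for instruction, response in turns:
        parts.append(f"\n### Instruction:\n{instruction}\n### Response:\n{response}")
    # The final block ends with an empty response slot for the model to fill.
    parts.append(f"\n### Instruction:\n{new_instruction}\n### Response:\n")
    return "".join(parts)


history = [("my favorite sport is soccer. repeat that.", "Your favorite sport is soccer.")]
prompt = build_prompt(history, "repeat that.")
print(repr(prompt))
```

With one earlier turn, this reproduces the prompt of the second request in the log, which is why the model can answer "repeat that." correctly there.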
Here are my logs with the llama.cpp server binary:
Are you able to reproduce it with any of the llama.cpp / ollama backends? Am I using the wrong API URL?
Did you change your settings? Your Max Conversation Length may be set to zero.
My settings seem OK. Here is video footage of the issue: https://github.com/josStorer/chatGPTBox/assets/128182951/82639627-48e1-4e48-a064-a98eba96a3a0
Is it a Chrome issue or an operating system issue? I was actually able to use the extension on an Android phone with the Firefox browser. The automatic queries it makes with searches work great.
Refresh the conversation page. Do the history messages still exist? Press F12, open the Network section, select the completion request, click Payload, and send me a screenshot.
Refresh the conversation page. Do the history messages still exist?
No, newly created sessions do not persist after refreshing.
then select the completion request and click on payload
I'm not sure how to do this; let me know what to do next.
No, newly created sessions do not persist after refreshing.
This is not normal. If an answer is completed normally, the conversation page should save it correctly, and then when you continue the conversation, it will be sent as a history message.
If it disappears after refreshing, it means that this answer has not been considered complete. ChatGPTBox does not store or send failed or interrupted answers as history messages, which is the same situation you encountered.
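The behavior described here can be pictured with a small sketch. This is not ChatGPTBox's actual code; Record, history_messages, and the done flag are hypothetical names, and the only assumptions are the ones stated in this thread: unfinished answers are skipped, and a Max Conversation Length of zero sends no history.

```python
from dataclasses import dataclass


@dataclass
class Record:
    question: str
    answer: str
    done: bool  # True only if the answer finished normally


def history_messages(records, max_conversation_length):
    """Build OpenAI-style history from completed records only."""
    completed = [r for r in records if r.done]
    if max_conversation_length <= 0:
        kept = []  # explicit branch: completed[-0:] would return the whole list
    else:
        kept = completed[-max_conversation_length:]
    messages = []
    for r in kept:
        messages.append({"role": "user", "content": r.question})
        messages.append({"role": "assistant", "content": r.answer})
    return messages


interrupted = [Record("hi", "partial text", done=False)]
finished = [Record("hi", "hello", done=True)]
print(len(history_messages(interrupted, 9)))  # 0
print(len(history_messages(finished, 9)))     # 2
print(len(history_messages(finished, 0)))     # 0
```

Under these assumptions, a backend whose streamed answer never registers as complete produces the same symptom as a zero history limit: each request carries only the newest message.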
For me, with ollama, answers complete and are stored correctly.
Thank you, I hadn't tested ollama, but this works properly.
Filling in the model name (e.g. "gemma:2b") is required for ollama to work, along with running export OLLAMA_ORIGINS=* on Linux.
I will check those other apis again.
Some notable things with tabbyapi (and other ones):
https://github.com/josStorer/chatGPTBox/assets/128182951/153974dc-5bb8-4aa2-93ef-536242c2508e
The "</" token is actually "</>"; it is rendered as an HTML element, so it is not displayed.
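To illustrate why such a token vanishes: text inserted into a page as raw HTML is parsed as markup, whereas escaped text is shown literally. A minimal sketch using Python's standard html module (how ChatGPTBox actually renders output is not shown in this thread, so this only demonstrates the general idea):

```python
import html

token = "</>"
escaped = html.escape(token)  # replaces <, > and & with HTML entities
print(escaped)  # &lt;/&gt;
```

Rendering the escaped form displays the three characters as text instead of treating them as a tag.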
Thank you, the conversation now is stored properly and works with these local APIs mentioned, such as tabbyAPI!
Describe the bug
None of the chat windows support continued conversations with local models. I'm not sure whether this is a bug or simply has not been implemented. Example:
When using local model APIs like https://github.com/theroyallab/tabbyAPI, I was unable to continue a conversation; the model receives each input as if it were the first message.
To Reproduce
Steps to reproduce the behavior: using Firefox, enter the local URL: