Open Janaka-Steph opened 11 months ago
To replicate the regression bug (and maybe it is time to have an end-to-end test that runs automatically):
http://localhost:8447
curl --location 'http://localhost:8447/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "mistral-7b-instruct-v0.1.Q5_0.gguf",
    "messages": [
      {
        "role": "user",
        "content": "explain Bitcoin like I am 5"
      }
    ],
    "stream": true,
    "temperature": 0.2,
    "max_tokens": 256,
    "top_p": 0.95,
    "frequency_penalty": 0,
    "n": 1,
    "presence_penalty": 0
  }'
curl --location 'http://localhost:8447/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "mistral-7b-instruct-v0.1.Q5_0.gguf",
    "messages": [
      {
        "role": "user",
        "content": "do it with emoji"
      }
    ],
    "stream": true,
    "temperature": 0.2,
    "max_tokens": 256,
    "top_p": 0.95,
    "frequency_penalty": 0,
    "n": 1,
    "presence_penalty": 0
  }'
Response
event: completion
data: {"id": "chatcmpl-d8676dd6-9320-4eb1-ae97-0ef8ad6f7754", "model": "mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700658362, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-d8676dd6-9320-4eb1-ae97-0ef8ad6f7754", "model": "mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700658362, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
event: done
data: [DONE]
On the second call I got this response:
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Sure"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": ","}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " I"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " can"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " help"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " you"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " with"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " that"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "!"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " What"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " do"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " you"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " need"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " assistance"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " with"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "?"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " "}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "\ud83d\ude0a"}, "finish_reason": null}]}
event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
event: done
data: [DONE]
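Regarding the end-to-end test idea: the difference between the two streams above (one goes straight from the role chunk to "stop", the other carries "content" deltas) can be checked mechanically. Here is a minimal sketch in Python, not the project's actual test. The function names collect_stream_content and stream_chat are hypothetical, and it assumes the service keeps emitting "data:" lines in the format shown in this thread.

```python
import json
import urllib.request


def collect_stream_content(sse_lines):
    """Join every delta.content piece found in an SSE chat-completion stream.

    Accepts any iterable of lines (str or bytes), so it works both on a
    live HTTP response and on a log pasted from an issue.
    """
    parts = []
    for raw in sse_lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        # Only "data:" lines carry a payload; "event:" lines are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The role-only and final chunks have no "content" key.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


def stream_chat(prompt, base_url="http://localhost:8447"):
    """Hypothetical helper: send the same request as the curl commands above.

    The URL and body mirror this thread; adjust them for your setup.
    """
    body = {
        "model": "mistral-7b-instruct-v0.1.Q5_0.gguf",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "temperature": 0.2,
        "max_tokens": 256,
        "top_p": 0.95,
        "frequency_penalty": 0,
        "n": 1,
        "presence_penalty": 0,
    }
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The response object yields the stream line by line as bytes.
    with urllib.request.urlopen(request) as response:
        return collect_stream_content(response)
```

An automated check could then assert that stream_chat("explain Bitcoin like I am 5").strip() is non-empty. Run against the first response pasted above, collect_stream_content returns an empty string, which is the symptom reported here.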
Interesting: I assume you are running it with in-process Python, right? So the packaging (i.e. PyInstaller?) may be the reason for the divergence?
Tried again with cht-llama-cpp-mistral-1-aarch64-apple-darwin, but got a similar response 🤔 Can you try on a clean download, maybe?
See https://github.com/premAI-io/prem-app/issues/514