Mistral 7B Instruct - "cannot parse response" after one or two responses

Janaka-Steph commented 11 months ago

See https://github.com/premAI-io/prem-app/issues/514

tiero commented 11 months ago

To replicate the regression bug (and maybe it is time to have an end-to-end test that runs automatically):

Run Mistral 7B Instruct locally
Assuming it is running on http://localhost:8447

First HTTP request (usually successful)

curl --location 'http://localhost:8447/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "mistral-7b-instruct-v0.1.Q5_0.gguf",
    "messages": [
        {
            "role": "user",
            "content": "explain Bitcoin like I am 5"
        }
    ],
    "stream": true,
    "temperature": 0.2,
    "max_tokens": 256,
    "top_p": 0.95,
    "frequency_penalty": 0,
    "n": 1,
    "presence_penalty": 0
}'

Second call, with any prompt: it returns "stop" early, before producing any content

curl --location 'http://localhost:8447/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "mistral-7b-instruct-v0.1.Q5_0.gguf",
    "messages": [
        {
            "role": "user",
            "content": "do it with emoji"
        }
    ],
    "stream": true,
    "temperature": 0.2,
    "max_tokens": 256,
    "top_p": 0.95,
    "frequency_penalty": 0,
    "n": 1,
    "presence_penalty": 0
}'

Response

event: completion
data: {"id": "chatcmpl-d8676dd6-9320-4eb1-ae97-0ef8ad6f7754", "model": "mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700658362, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-d8676dd6-9320-4eb1-ae97-0ef8ad6f7754", "model": "mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700658362, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}

event: done
data: [DONE]
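
The reproduction steps above could be turned into an automated check along these lines. This is only a sketch, assuming the server is reachable at the URL used in the curl calls; the helper names (build_payload, collect_text, ask, run_regression_check) are made up for illustration and are not part of prem-services.

```python
import json
import urllib.request

# URL and model name are taken from the curl calls in this thread.
BASE_URL = "http://localhost:8447/v1/chat/completions"
MODEL = "mistral-7b-instruct-v0.1.Q5_0.gguf"


def build_payload(prompt):
    """Build the same JSON body as the curl requests (hypothetical helper)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "temperature": 0.2,
        "max_tokens": 256,
        "top_p": 0.95,
        "frequency_penalty": 0,
        "n": 1,
        "presence_penalty": 0,
    }


def collect_text(sse_lines):
    """Concatenate every delta 'content' found in the 'data:' lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip 'event:' lines and blank separators
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


def ask(prompt, url=BASE_URL):
    """POST one streaming request and return the reassembled text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return collect_text(raw.decode("utf-8") for raw in response)


def run_regression_check():
    """Fail if either call finishes without producing any content."""
    first = ask("explain Bitcoin like I am 5")
    second = ask("do it with emoji")
    assert first, "first response was empty"
    assert second, "second response stopped before producing any content"
```

With the model running locally, calling run_regression_check() should raise an AssertionError when the second response looks like the one above (a role delta followed directly by finish_reason "stop").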

biswaroop1547 commented 11 months ago

On the second call I got this response:

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Sure"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": ","}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " I"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " can"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " help"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " you"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " with"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " that"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "!"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " What"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " do"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " you"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " need"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " assistance"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " with"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "?"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " "}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "\ud83d\ude0a"}, "finish_reason": null}]}

event: completion
data: {"id": "chatcmpl-6436f7e3-6023-460c-9c3e-c1bfb70efd86", "model": "../mistral-7b-instruct-v0.1.Q5_0.gguf", "created": 1700659196, "object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}

event: done
data: [DONE]
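
To compare the two behaviours at a glance, a stream like the one above can be collapsed into its full text and final finish_reason. A minimal sketch: summarize_stream is a hypothetical helper, and SAMPLE is an abbreviated stand-in for the response above (the id, model, created and object fields and most chunks are left out).

```python
import json


def summarize_stream(raw):
    """Return (text, finish_reason) for one raw SSE response body."""
    text, finish_reason = [], None
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # ignore 'event:' lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choice = json.loads(payload)["choices"][0]
        text.append(choice["delta"].get("content", ""))
        # Keep the last non-null finish_reason seen in the stream.
        finish_reason = choice["finish_reason"] or finish_reason
    return "".join(text), finish_reason


# Raw string so the \ud83d\ude0a escape reaches json.loads unchanged.
SAMPLE = r'''event: completion
data: {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}

event: completion
data: {"choices": [{"index": 0, "delta": {"content": "Sure"}, "finish_reason": null}]}

event: completion
data: {"choices": [{"index": 0, "delta": {"content": " "}, "finish_reason": null}]}

event: completion
data: {"choices": [{"index": 0, "delta": {"content": "\ud83d\ude0a"}, "finish_reason": null}]}

event: completion
data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}

event: done
data: [DONE]
'''

text, reason = summarize_stream(SAMPLE)
```

An empty text together with reason "stop" would correspond to the failing case reported earlier in the thread; here the text is non-empty and the surrogate pair decodes to a single emoji.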

tiero commented 11 months ago

Interesting: I assume you are running it with in-process Python, right? So the packaging (i.e. PyInstaller?) may be the reason for the divergence?
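
One way to tell the two setups apart when comparing logs is to record whether the process is a PyInstaller bundle. A small sketch using the run-time check described in the PyInstaller documentation (sys.frozen plus sys._MEIPASS); the function names are hypothetical, and nothing here confirms that packaging is the actual cause.

```python
import sys


def is_frozen_bundle():
    """True when running inside a PyInstaller bundle, False in a plain interpreter."""
    return bool(getattr(sys, "frozen", False)) and hasattr(sys, "_MEIPASS")


def runtime_label():
    """Short tag that could be attached to logs or bug reports."""
    return "pyinstaller-bundle" if is_frozen_bundle() else "plain-python"
```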

biswaroop1547 commented 11 months ago

Tried again with cht-llama-cpp-mistral-1-aarch64-apple-darwin, but got a similar response 🤔 Can you try on a clean download maybe?

tiero commented 11 months ago

https://www.loom.com/share/1833a7900d0440aab858670b5f6a65da?from_recorder=1&focus_title=1

premAI-io / prem-services

Mistral 7B Instruct - "cannot parse response" after one or two responses #142