konveyor / kai

Konveyor AI - static code analysis driven migration to new targets via Generative AI

'ibm-mistralai/mixtral-8x7b-instruct-v01-q' commonly returns 500 and 503 errors #102

Open · jwmatthews opened this issue 7 months ago

jwmatthews commented 7 months ago

I tried running with mixtral yesterday and hit the exceptions below. Perhaps this is a temporary condition on the IBM serving side. Logging this so we have a record and can begin to track whether it becomes a recurring issue.

Full console output from server: https://gist.github.com/jwmatthews/b148eac202e9ab7bc497228bc4779620

<snip>

genai.exceptions.ApiResponseException: Server Error
{
  "error": "Internal Server Error",
  "extensions": {
    "code": "SERVICE_ERROR",
    "state": null
  },
  "message": "1 CANCELLED: Call cancelled",
  "status_code": 500
}

<snip> 

Error handling request
Traceback (most recent call last):
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/aiohttp/web_protocol.py", line 452, in _handle_request
    resp = await request_handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/aiohttp/web_app.py", line 543, in _handle
    resp = await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmatthews/git/jwmatthews/kai/./kai/server.py", line 431, in get_incident_solutions_for_file
    llm_output = get_incident_solution(incident, False)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmatthews/git/jwmatthews/kai/./kai/server.py", line 289, in get_incident_solution
    capture.llm_result = model_provider.invoke(prompt)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmatthews/git/jwmatthews/kai/kai/model_provider.py", line 163, in invoke
    return self.llm.invoke(prompt)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 166, in invoke
    self.generate_prompt(
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 544, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 408, in generate
    raise e
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 398, in generate
    self._generate_with_cache(
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 577, in _generate_with_cache
    return self._generate(
           ^^^^^^^^^^^^^^^
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/genai/extensions/langchain/chat_llm.py", line 237, in _generate
    result = handle_stream() if self.streaming else handle_non_stream()
             ^^^^^^^^^^^^^^^
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/genai/extensions/langchain/chat_llm.py", line 202, in handle_stream
    for result in self._stream(
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/genai/extensions/langchain/chat_llm.py", line 168, in _stream
    for response in self.client.text.chat.create_stream(
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/genai/text/chat/chat_generation_service.py", line 179, in create_stream
    yield from generation_stream_handler(
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/genai/text/generation/_generation_utils.py", line 13, in generation_stream_handler
    for response in generator:
  File "/Users/jmatthews/git/jwmatthews/kai/env/lib/python3.12/site-packages/genai/_utils/http_client/httpx_client.py", line 42, in post_stream
    raise ApiResponseException(response=BaseErrorResponse(**sse.json()))
genai.exceptions.ApiResponseException: Server Error
{
  "error": "Service Unavailable",
  "extensions": {
    "code": "SERVICE_UNAVAILABLE",
    "state": null,
    "reason": "NO_CONNECTION"
  },
  "message": "The model is temporarily unavailable. This is most likely a temporary condition.",
  "status_code": 503
}
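
Both failures surface at the same call site (model_provider.invoke in kai/model_provider.py), and the 503 at least claims to be temporary, so one mitigation would be to retry that call with a short backoff. A rough sketch only (this is not what kai does today; invoke_with_retry is a hypothetical helper, and it assumes the transient 500/503 responses always surface as the ApiResponseException seen in the traceback):

import logging
import time

from genai.exceptions import ApiResponseException  # exception type raised in the traceback above

log = logging.getLogger(__name__)


def invoke_with_retry(model_provider, prompt, attempts=3, backoff_seconds=2.0):
    """Retry model_provider.invoke(prompt) on transient genai server errors.

    Hypothetical helper: assumes model_provider exposes the same invoke()
    used in kai/model_provider.py and that 500/503 responses are raised
    as ApiResponseException.
    """
    for attempt in range(1, attempts + 1):
        try:
            return model_provider.invoke(prompt)
        except ApiResponseException as exc:
            log.warning("genai invoke failed (attempt %d/%d): %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # give up and let the caller handle the error
            time.sleep(backoff_seconds * attempt)  # simple linear backoff between retries
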
jwmatthews commented 7 months ago

As of 3/24 and 3/25 I am unable to run with mixtral.

This is the config I am unable to run with ($ cat kai/config.toml):

[postgresql]
host = "127.0.0.1"
database = "kai"
user = "kai"
password = "<snip>"

[models]
provider = "IBMOpenSource"
args = { model_id = "ibm-mistralai/mixtral-8x7b-instruct-v01-q" }
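
As a sanity check, the file parses cleanly and resolves the expected model_id with Python's stdlib tomllib (minimal sketch; assumes it is run from the repo root so the relative path works):

import tomllib  # stdlib in Python 3.11+; the traceback above shows Python 3.12

with open("kai/config.toml", "rb") as f:
    config = tomllib.load(f)

print(config["models"]["provider"])          # IBMOpenSource
print(config["models"]["args"]["model_id"])  # ibm-mistralai/mixtral-8x7b-instruct-v01-q
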
jwmatthews commented 7 months ago

We need to verify whether we are invoking mixtral correctly.
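
One way to check is a minimal reproduction against the genai SDK directly, bypassing kai entirely. Rough sketch below: client.text.chat.create_stream is the call from the traceback above, but Client, Credentials.from_env(), and genai.schema.HumanMessage are assumptions about the installed ibm-generative-ai version and may need adjusting:

from genai import Client, Credentials
from genai.schema import HumanMessage  # assumed location of the chat message type

# Assumes GENAI_KEY / GENAI_API are set in the environment.
client = Client(credentials=Credentials.from_env())

for response in client.text.chat.create_stream(
    model_id="ibm-mistralai/mixtral-8x7b-instruct-v01-q",
    messages=[HumanMessage(content="Say hello.")],
):
    print(response)  # a 500/503 raised here would point at the service rather than kai
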

jmontleon commented 7 months ago

I have seen errors like this before. In the past, the models have become available again after some time.

JonahSussman commented 7 months ago

Writing here so we don't forget:

jwmatthews commented 7 months ago

We continue to see several issues working with mixtral: