rgbkrk / chatlab

⚡️🧪 Fast LLM Tool Calling Experimentation, big and smol
https://chatlab.dev

Stream = False issue #147

Open rvsh2 opened 4 months ago

rvsh2 commented 4 months ago

Hello, I ran this code as an example:

chat.register(get_car_price)  # register this function
chat.register(get_top_stories)  # register this function
chat.register(what_time)
chat.register(get_current_weather, weather_parameters)

async def main():
    await chat.submit("What is the weather in San Francisco?")

# Call the async function
asyncio.run(main())
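
(For completeness, the surrounding setup looks roughly like this; the stub what_time tool and the Chat construction below are simplified placeholders for my real code, and I'm assuming the OpenAI client is pointed at the local functionary/vLLM server via environment variables.)

import asyncio
from datetime import datetime

from chatlab import Chat

def what_time():
    """Get the current local time."""
    return datetime.now().strftime("%H:%M")

# Assumed: OPENAI_BASE_URL / OPENAI_API_KEY in the environment point the
# OpenAI client at the local functionary/vLLM endpoint.
chat = Chat(model="functionary-small-v2.4")
chat.register(what_time)

async def main():
    await chat.submit("What is the weather in San Francisco?")

asyncio.run(main())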

The result is streamed fine:

display_id='d6d40efa-b175-4b57-a24b-9a5efd736a7b' content='' finished=True has_displayed=False
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='' finished=False has_displayed=False
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco,' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sun' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and wind' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature of' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature of ' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature of 7' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature of 72' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature of 72 degrees' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature of 72 degrees F' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature of 72 degrees Fahren' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature of 72 degrees Fahrenheit' finished=False has_displayed=True
display_id='16450bdf-0ec4-42c2-b93f-ccf4e930c607' content='The weather in San Francisco, CA is currently sunny and windy with a temperature of 72 degrees Fahrenheit.' finished=False has_displayed=True

BUT if I run it with this change: await chat.submit("What is the weather in San Francisco?", stream=False)

I got errors:

Traceback (most recent call last):
  File "D:\!Programs\llm-with-functionary\main.py", line 102, in <module>
    asyncio.run(main())
  File "C:\Users\krist\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\krist\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\krist\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "D:\!Programs\llm-with-functionary\main.py", line 98, in main
    await chat.submit("What is the weather in San Francisco?",stream=False)
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\chatlab\chat.py", line 356, in submit
    await self.submit(stream=stream, **kwargs)
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\chatlab\chat.py", line 313, in submit
    full_response = await client.chat.completions.create(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\openai\resources\chat\completions.py", line 1159, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\openai\_base_client.py", line 1790, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\openai\_base_client.py", line 1493, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\openai\_base_client.py", line 1569, in _request
    return await self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\openai\_base_client.py", line 1615, in _retry_request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\openai\_base_client.py", line 1569, in _request
    return await self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\openai\_base_client.py", line 1615, in _retry_request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\!Programs\llm-with-functionary\venv\Lib\site-packages\openai\_base_client.py", line 1584, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Internal Server Error

Is this an issue, or am I doing something wrong?

rvsh2 commented 4 months ago

I've modified the code in chat.py to show the messages generated in those two cases:

            if stream:
                print(chat_create_kwargs["messages"])
                streaming_response = await client.chat.completions.create(
                    **chat_create_kwargs,
                    stream=True,
                )
                self.append(*messages)

                finish_reason, function_call_request, tool_arguments = await self.__process_stream(streaming_response)
            else:
                print(chat_create_kwargs["messages"])
                full_response = await client.chat.completions.create(
                    **chat_create_kwargs,
                    stream=False,
                )

I got these results:

stream = False

[{'role': 'user', 'content': 'What time is it in your timezone?'}]
display_id='b1e4e516-a85f-4093-bd0c-62dbb6aa268c' content='' finished=True has_displayed=False
None
[{'role': 'user', 'content': 'What time is it in your timezone?'}, {'role': 'assistant', 'tool_calls': [{'id': 'call_cAPdStYy6dMXYTkw617eAdCw', 'function': {'name': 'what_time', 'arguments': '{}'}, 'type': 'function'}]}, {'role': 'tool', 'name': 'what_time', 'content': '22:30', 'tool_call_id': 'call_cAPdStYy6dMXYTkw617eAdCw'}]

stream = True

[{'role': 'user', 'content': 'What time is it in your timezone?'}]
None
[{'role': 'user', 'content': 'What time is it in your timezone?'}, {'content': None, 'role': 'assistant', 'function_call': None, 'tool_calls': [{'id': 'call_M5NWqRtK2ZDAlbDZqm8yewgh', 'function': {'arguments': '{}', 'name': 'what_time'}, 'type': 'function', 'index': None}], 'tool_call_id': None, 'name': None}, {'role': 'assistant', 'tool_calls': [{'id': 'call_M5NWqRtK2ZDAlbDZqm8yewgh', 'function': {'name': 'what_time', 'arguments': '{}'}, 'type': 'function'}]}, {'role': 'tool', 'name': 'what_time', 'content': '22:33', 'tool_call_id': 'call_M5NWqRtK2ZDAlbDZqm8yewgh'}]

I'm using functionary-small-v2.4 as the model with vLLM.

Can anyone help?

rvsh2 commented 4 months ago

vLLM gives this output:

functionary                    | Future exception was never retrieved
functionary                    | future: <Future finished exception=TypeError("'NoneType' object is not subscriptable")>
functionary                    | Traceback (most recent call last):
functionary                    |   File "/workspace/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 42, in _raise_exception_on_finish
functionary                    |     task.result()
functionary                    |   File "/workspace/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 441, in run_engine_loop
functionary                    |     has_requests_in_progress = await self.engine_step()
functionary                    |   File "/workspace/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 419, in engine_step
functionary                    |     request_outputs = await self.engine.step_async()
functionary                    |   File "/workspace/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 265, in step_async
functionary                    |     ) = prompt_template.grammar_sample(
functionary                    |   File "/workspace/functionary/functionary/prompt_template/base_template.py", line 297, in grammar_sample
functionary                    |     options = [tool_or_func["name"] for tool_or_func in tools_or_functions]
functionary                    |   File "/workspace/functionary/functionary/prompt_template/base_template.py", line 297, in <listcomp>
functionary                    |     options = [tool_or_func["name"] for tool_or_func in tools_or_functions]
functionary                    | TypeError: 'NoneType' object is not subscriptable

rvsh2 commented 4 months ago

If I disable grammar sampling, I get this in vLLM:

functionary                    | ERROR:    Exception in ASGI application
functionary                    | Traceback (most recent call last):
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
functionary                    |     result = await app(  # type: ignore[func-returns-value]
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
functionary                    |     return await self.app(scope, receive, send)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1106, in __call__
functionary                    |     await super().__call__(scope, receive, send)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
functionary                    |     await self.middleware_stack(scope, receive, send)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 184, in __call__
functionary                    |     raise exc
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 162, in __call__
functionary                    |     await self.app(scope, receive, _send)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83, in __call__
functionary                    |     await self.app(scope, receive, send)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
functionary                    |     raise exc
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
functionary                    |     await self.app(scope, receive, sender)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
functionary                    |     raise e
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
functionary                    |     await self.app(scope, receive, send)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
functionary                    |     await route.handle(scope, receive, send)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 276, in handle
functionary                    |     await self.app(scope, receive, send)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 66, in app
functionary                    |     response = await func(request)
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 274, in app
functionary                    |     raw_response = await run_endpoint_function(
functionary                    |   File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
functionary                    |     return await dependant.call(**values)
functionary                    |   File "/workspace/functionary/server_vllm.py", line 257, in create_chat_completion
functionary                    |     prompt_token_ids = prepare_messages_for_inference(
functionary                    |   File "/workspace/functionary/functionary/inference.py", line 59, in prepare_messages_for_inference
functionary                    |     dic_messages = prompt_template.pre_process_messages_before_inference(dic_messages)
functionary                    |   File "/workspace/functionary/functionary/prompt_template/prompt_template_v2.py", line 202, in pre_process_messages_before_inference
functionary                    |     new_messages = [id_2_tool_messages[cid] for cid in tool_call_ids]
functionary                    |   File "/workspace/functionary/functionary/prompt_template/prompt_template_v2.py", line 202, in <listcomp>
functionary                    |     new_messages = [id_2_tool_messages[cid] for cid in tool_call_ids]
functionary                    | KeyError: 'call_2yW4Acq9GFz6Y1t9EwL56nGi'

rvsh2 commented 4 months ago

Hi,

I managed to solve the issue, but I don't know why this makes it work as it should. In chat.py I replaced these lines in the submit function:

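        # Only treat the response as a tool call when arguments were actually
        # collected; with functionary, finish_reason can be "tool_calls" even
        # when tool_arguments is empty (see below).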
        if finish_reason == "tool_calls" and tool_arguments:
            assistant_tool_calls(tool_arguments)

I found that the model always returns finish_reason = "tool_calls" when there was a tool call, even in a response with content. But in that case the response always had tool_arguments = []. That's why I added the and tool_arguments check.

Without this change, the inference never stops.

The second change: I removed the append call, because it was appending the same tool data to the messages even though the message list already contained it. I compared the messages with stream = False and stream = True and found that this extra entry was what crashed server_vllm.py.

As you can see in the example below, the tool call id is present twice when using chat with the stream=False option.

I don't know if this behaviour is specific to the functionary-v2.4 model, because I didn't test any other model.

STREAM = FALSE

messages:  
[
  {'role': 'user', 'content': 'What time is it in your timezone?'},
  {'content': None, 
   'role': 'assistant', 'function_call': None, 
    'tool_calls': 
      [
        {'id': 'call_FqTEnvrccdkwasPaieYBRoMz', 'function': {'arguments': '{}', 'name': 'what_time'}, 'type': 'function', 'index': None}
      ], 'tool_call_id': None, 'name': None
  }, 
  {'role': 'assistant', 
    'tool_calls': 
      [
        {'id': 'call_FqTEnvrccdkwasPaieYBRoMz', 'function': {'name': 'what_time', 'arguments': '{}'}, 'type': 'function'}
      ]
  }
]

STREAM = TRUE

messages:  
[
  {'role': 'user', 'content': 'What time is it in your timezone?'}, 
  {'role': 'assistant', 
    'tool_calls': 
      [
        {'id': 'call_J6oDmvMgM4fYuuID5uqbHcmx', 'function': {'name': 'what_time', 'arguments': '{}'}, 'type': 'function'}
      ]
  }
]
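
Just to illustrate what I mean by the duplicate: a check along these lines (a rough sketch over plain dict messages like the dumps above, not the actual chatlab code) would filter out the repeated assistant tool-call entry before the request is sent. In my case the real fix was simply removing the extra append.

def drop_duplicate_tool_calls(messages):
    """Drop assistant messages whose tool_call ids already appeared earlier in the list."""
    seen_ids = set()
    cleaned = []
    for message in messages:
        call_ids = {call["id"] for call in (message.get("tool_calls") or [])}
        if call_ids and call_ids <= seen_ids:
            # The same tool call is already present earlier -- skip the duplicate entry.
            continue
        seen_ids |= call_ids
        cleaned.append(message)
    return cleaned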

Hopefully this solves the issue. Can you comment?

rgbkrk commented 4 months ago

Interesting, thank you. I'll have to dig in further.