jackmpcollins / magentic

Seamlessly integrate LLMs as Python functions
https://magentic.dev/
MIT License

Async completion doesn't work for non-OpenAI LiteLLM model when using function calling #153

Closed mnicstruwig closed 5 months ago

mnicstruwig commented 6 months ago

I can confirm that async completion works via LiteLLM:

import asyncio

from litellm import acompletion

async def test_get_response():
    user_message = "Hello, who are you?"
    messages = [{"content": user_message, "role": "user"}]
    return await acompletion(
        model="anthropic/claude-3-haiku-20240307", messages=messages
    )

# Top-level await works in a notebook; in a script, wrap this in asyncio.run().
results = await asyncio.gather(test_get_response(), test_get_response())
results

Output:

[ModelResponse(id='chatcmpl-8a69b528-bced-4bb0-a3c7-c01a4bf47caf', choices=[Choices(finish_reason='stop', index=0, message=Message(content="Hello! I am an AI assistant created by Anthropic. My name is Claude and I'm here to help you with a variety of tasks. How can I assist you today?", role='assistant'))], created=1711114675, model='claude-3-haiku-20240307', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=13, completion_tokens=40, total_tokens=53)),
 ModelResponse(id='chatcmpl-d6afe553-5990-4e3e-a053-d6bc599ce800', choices=[Choices(finish_reason='stop', index=0, message=Message(content="Hello! I am Claude, an AI assistant created by Anthropic. I'm here to help with a wide variety of tasks, from research and analysis to creative projects and casual conversation. Feel free to ask me anything and I'll do my best to assist!", role='assistant'))], created=1711114675, model='claude-3-haiku-20240307', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=13, completion_tokens=56, total_tokens=69))]

However, when I try using magentic, things fail:

from magentic import prompt
from magentic.chat_model.litellm_chat_model import LitellmChatModel

@prompt(
    "Tell me three facts about {location}",
    model=LitellmChatModel(model="anthropic/claude-3-haiku-20240307"),
)
async def tell_me_more(location: str) -> list[str]: ...

await tell_me_more(location="London")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/gq/jl16rc4171g3102f40n98w0m0000gn/T/ipykernel_52507/228821438.py in ?()
      3     model=LitellmChatModel(model="anthropic/claude-3-haiku-20240307")
      4 )
      5 async def tell_me_more(location: str) -> list[str]: ...
      6 
----> 7 await tell_me_more(location="London")

~/miniforge3/envs/obb-ai/lib/python3.11/site-packages/magentic/prompt_function.py in ?(self, *args, **kwargs)
     93     async def __call__(self, *args: P.args, **kwargs: P.kwargs) -> R:
     94         """Asynchronously query the LLM with the formatted prompt template."""
---> 95         message = await self.model.acomplete(
     96             messages=[UserMessage(content=self.format(*args, **kwargs))],
     97             functions=self._functions,
     98             output_types=self._return_types,

~/miniforge3/envs/obb-ai/lib/python3.11/site-packages/magentic/chat_model/litellm_chat_model.py in ?(self, messages, functions, output_types, stop)
    343                 else None
    344             ),
    345         )
    346 
--> 347         first_chunk = await anext(response)
    348         # Azure OpenAI sends a chunk with empty choices first
    349         if len(first_chunk.choices) == 0:
    350             first_chunk = await anext(response)

~/miniforge3/envs/obb-ai/lib/python3.11/site-packages/litellm/utils.py in ?(self)
   9807             # Handle any exceptions that might occur during streaming
   9808             asyncio.create_task(
   9809                 self.logging_obj.async_failure_handler(e, traceback_exception)
   9810             )
-> 9811             raise e

~/miniforge3/envs/obb-ai/lib/python3.11/site-packages/litellm/utils.py in ?(self)
   9807             # Handle any exceptions that might occur during streaming
   9808             asyncio.create_task(
   9809                 self.logging_obj.async_failure_handler(e, traceback_exception)
   9810             )
-> 9811             raise e

TypeError: 'async for' requires an object with __aiter__ method, got generator

It works when I use an OpenAI model via LiteLLM, though:

@prompt(
    "Tell me three facts about {location}",
    model=LitellmChatModel(model="openai/gpt-3.5-turbo")
)
async def tell_me_more(location: str) -> list[str]: ...

await tell_me_more(location="London")

Output:

['London is the capital city of England.',
 'The River Thames flows through London.',
 'London is one of the most multicultural cities in the world.']
mnicstruwig commented 6 months ago

This only occurs when using function calling (or annotated outputs, which use function calling under the hood).
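For example (a minimal sketch of the distinction; the function names here are just for illustration):

from magentic import prompt
from magentic.chat_model.litellm_chat_model import LitellmChatModel

model = LitellmChatModel(model="anthropic/claude-3-haiku-20240307")

# Plain-text output: no function calling involved, so this works fine.
@prompt("Tell me three facts about {location}", model=model)
async def tell_me_more_text(location: str) -> str: ...

# Structured output: magentic implements this via function calling,
# so this is the variant that raises the TypeError above.
@prompt("Tell me three facts about {location}", model=model)
async def tell_me_more_structured(location: str) -> list[str]: ...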

jackmpcollins commented 6 months ago

This is a litellm bug. I've opened https://github.com/BerriAI/litellm/issues/2644
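Stripped of magentic, the failure reduces to roughly this (a minimal repro sketch; the tool schema below is illustrative, not magentic's actual generated schema):

import asyncio
from litellm import acompletion

# Illustrative tool definition in the OpenAI "tools" format (hypothetical name):
tools = [
    {
        "type": "function",
        "function": {
            "name": "return_list_of_str",
            "description": "Return a list of strings.",
            "parameters": {
                "type": "object",
                "properties": {
                    "value": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["value"],
            },
        },
    }
]

async def repro():
    response = await acompletion(
        model="anthropic/claude-3-haiku-20240307",
        messages=[{"role": "user", "content": "Tell me three facts about London"}],
        tools=tools,
        stream=True,
    )
    # At the time of this issue, litellm returned a sync generator here for
    # Anthropic models, so async iteration raised:
    # TypeError: 'async for' requires an object with __aiter__ method, got generator
    async for chunk in response:
        print(chunk)

asyncio.run(repro())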

Magentic should have more tests for litellm with tools/functions, now that litellm supports them for these models. I've made an issue for that too: https://github.com/jackmpcollins/magentic/issues/154
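Something along these lines (a sketch only, assuming pytest and pytest-asyncio; not the actual test suite):

import pytest
from magentic import prompt
from magentic.chat_model.litellm_chat_model import LitellmChatModel

@pytest.mark.parametrize(
    "model_name",
    ["openai/gpt-3.5-turbo", "anthropic/claude-3-haiku-20240307"],
)
@pytest.mark.asyncio
async def test_litellm_async_function_calling(model_name):
    # Structured output forces the function-calling path for each backend.
    @prompt(
        "Tell me three facts about {location}",
        model=LitellmChatModel(model=model_name),
    )
    async def tell_me_more(location: str) -> list[str]: ...

    facts = await tell_me_more(location="London")
    assert isinstance(facts, list)
    assert all(isinstance(fact, str) for fact in facts)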

mnicstruwig commented 6 months ago

Thanks! I figured as much after trying to replicate it -- your issue nails the root cause perfectly.

Re: more tests, that makes sense. In under a month we've gone from zero to four GPT-4-level models (Gemini, Claude 3, Mistral Large, Pi 2.5 (RIP)). Each has different pros and cons, so I anticipate people will start mixing and matching them in their workflows (as I'm doing at the moment).

jackmpcollins commented 5 months ago

This was fixed in litellm v1.33.7, and the corresponding fixes to magentic were released in https://github.com/jackmpcollins/magentic/releases/tag/v0.18.1
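Upgrading both packages picks up the fix, e.g.:

pip install --upgrade "litellm>=1.33.7" "magentic>=0.18.1"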

There is still the separate issue of litellm parsing Claude tool_call responses, which affects tool calling with Claude: https://github.com/BerriAI/litellm/pull/2640 / https://github.com/jackmpcollins/magentic/issues/151