[Bug]: AutoGen can't work with vLLM v0.5.1

tonyaw commented 4 months ago

Describe the bug

Starting with v0.5.0, vLLM supports a new feature, "OpenAI tools support named functions": https://github.com/vllm-project/vllm/releases/tag/v0.5.0

Since then, every message returned by vLLM includes an empty "tool_calls" list when the user prompt doesn't call for a tool:

#####openai client.completions.create RESPONSE START#####
ChatCompletion(id='cmpl-e858096512a1428890c6fb28f20386e9', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='e2e4', role='assistant', function_call=None, **tool_calls=[]**), stop_reason=128009)], created=1720773990, model='XXX', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=5, prompt_tokens=186, total_tokens=191))
#####RESPONSE END#####

After the AutoGen agent receives this message, it always adds an empty tool_calls list to its next message:

#####openai client.chat.completions.create PROMPT START#####
[
{
    "content": "You are an AI-powered chess board agent.\nYou translate the user's natural language input into legal UCI moves.\nThe regex format of UCI is \"[a-h][1-8][a-h][1-8][qrnb]?\".\nONLY UCI move is allowed to use!\nFollowing are some examples:\n1. \"Ng8e7\" shall be translated to \"g8e7\".\n2. \"Ng8-f6\" shall be translated to \"g8f6\".\nYou should only reply with a UCI move string extracted from the user's input.",
    "role": "system"
},
{
    "content": "e2e4",
    "tool_calls": [],
    "role": "user"
}
]
#####PROMPT END#####

This causes vllm to return 400, and breaks the conversation:

BadRequestError: Error code: 400 - {'object': 'error', 'message': '[{\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'system\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'system\'"}}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'assistant\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'assistant\'"}}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'tool\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'tool\'"}}, {\'type\': \'missing\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_call_id\'), \'msg\': \'Field required\', \'input\': {\'content\': \'e2e4\', \'tool_calls\': [], \'role\': \'user\'}}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'missing\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'name\'), \'msg\': \'Field required\', \'input\': {\'content\': \'e2e4\', \'tool_calls\': [], \'role\': \'user\'}}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'function\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'function\'"}}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), 
\'msg\': \'Extra inputs are not permitted\', \'input\': []}]', 'type': 'BadRequestError', 'param': None, 'code': 400}

Steps to reproduce

See description.

Model Used

Llama3 70B. This appears to be a compatibility issue between vLLM and AutoGen, and is not related to the model itself.

Expected Behavior

AutoGen should work with vLLM v0.5.0 and later versions without problems.

Screenshots and logs

No response

Additional Information

No response

marklysze commented 4 months ago

Hey @tonyaw, this is a bit tricky in my opinion. I feel that vLLM should return None for tool_calls when there are no tool calls to be made, rather than an empty list, []. The finish_reason being stop indicates that it is not suggesting tool calls in this response.

What do you think?
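
To make the None-versus-[] distinction concrete, here is a minimal sketch of normalizing a response message on the client side. It assumes the message has already been converted to a plain dict; `normalize_tool_calls` is a hypothetical helper for illustration, not part of AutoGen, vLLM, or the openai package.

```python
from typing import Any, Dict


def normalize_tool_calls(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the message where an empty tool_calls list becomes None.

    Hypothetical helper: it treats [] as "no tool calls", which matches the
    finish_reason of 'stop' seen in the response above.
    """
    normalized = dict(message)  # shallow copy, the input dict is not modified
    if normalized.get("tool_calls") == []:
        normalized["tool_calls"] = None
    return normalized


# Fields mirror the ChatCompletionMessage shown in the issue description.
response_message = {
    "content": "e2e4",
    "role": "assistant",
    "function_call": None,
    "tool_calls": [],
}
print(normalize_tool_calls(response_message)["tool_calls"])
```

A non-empty tool_calls list passes through unchanged, so real tool calls would not be affected by this kind of normalization.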

tonyaw commented 4 months ago

@marklysze, I'm OK with either None or "[]" as long as the agent framework (AutoGen) and the LLM inference framework (vLLM) agree. :-) Since both follow the OpenAI API schema, may I ask whether the schema specifies which one is required?

tonyaw commented 4 months ago

I also opened the same ticket with vLLM. Let's align with the vLLM team on an agreement. :-)

sanjay920 commented 4 months ago

We integrated tools into vLLM with function-calling models. This might be relevant: https://docs.rubra.ai/inference/vllm

tonyaw commented 4 months ago

@sanjay920, thanks for the info!

Is it possible to contribute your code change to the vLLM git repo? :-)
Have you tried your vLLM build with AutoGen?

hopefulPanda88 commented 4 months ago

Is there any possible way to bypass this? This really gives me a headache...

marklysze commented 4 months ago

I can suggest a couple of approaches:

1. A vLLM client class that can handle this empty tool_calls
2. A change to the existing AutoGen codebase that ignores empty tool_calls

If someone wants to work on a PR, that would help.
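
As a starting point for the second approach, here is a rough sketch of dropping empty tool_calls entries from outgoing messages before they are sent to vLLM. It assumes the messages are plain dicts shaped like the prompt dump in the issue description; `strip_empty_tool_calls` is a hypothetical name, not an existing AutoGen function, and where it would be hooked into the codebase is left open.

```python
from typing import Any, Dict, List


def strip_empty_tool_calls(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the messages with empty or None tool_calls keys removed.

    Hypothetical helper: messages that carry real tool calls are left as-is.
    """
    cleaned = []
    for message in messages:
        copy = dict(message)  # shallow copy so the stored chat history is untouched
        if not copy.get("tool_calls"):
            copy.pop("tool_calls", None)
        cleaned.append(copy)
    return cleaned


# Shortened version of the failing prompt from the issue description.
messages = [
    {"content": "You are an AI-powered chess board agent.", "role": "system"},
    {"content": "e2e4", "tool_calls": [], "role": "user"},
]
print(strip_empty_tool_calls(messages))
```

With the empty list removed, the second message contains only content and role, which avoids the "Extra inputs are not permitted" complaint about tool_calls in the 400 response above.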
