instructor-ai / instructor

structured outputs for llms
https://python.useinstructor.com/
MIT License

Tool calls are deleted when the OpenAI request is created (1.6.4) #1199

Open R-I-R opened 1 week ago

R-I-R commented 1 week ago

In the recent 1.6.4 update, the tool_calls field of assistant messages is being deleted before the request is sent to OpenAI:

import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI(), mode=instructor.Mode.TOOLS)

info = client.chat.completions.create(
    model='gpt-4o-mini',
    response_model=None,
    messages=[
        {'role': 'user', 'content': 'what is the answer 240'},
        {'role': 'assistant',
        'content': '',
        'tool_calls': [{'id': 'call_nJMfBJgc0YTMgMLhSGVOm9Ky',
            'type': 'function',
            'function': {'name': 'answer_tool', 'arguments': '{"id":240}'}}]},
        {'role': 'tool',
        'content': '123',
        'tool_call_id': 'call_nJMfBJgc0YTMgMLhSGVOm9Ky'}
    ],
    temperature=0.0,
)

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'user', 'content': 'what is the answer 240'}, {'role': 'assistant', 'content': ''}, {'role': 'tool', 'content': 'the answer is 123'}], 'model': 'gpt-4o-mini', 'temperature': 0.0}}

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 400 Bad Request"

But in version 1.6.3 the same code works fine:

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'user', 'content': 'what is the answer 240'}, {'role': 'assistant', 'content': '', 'tool_calls': [{'id': 'call_nJMfBJgc0YTMgMLhSGVOm9Ky', 'type': 'function', 'function': {'name': 'answer_tool', 'arguments': '{"id":240}'}}]}, {'role': 'tool', 'content': '123', 'tool_call_id': 'call_nJMfBJgc0YTMgMLhSGVOm9Ky'}], 'model': 'gpt-4o-mini', 'temperature': 0.0}}

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

juanwisz commented 4 days ago

I've taken a look at the code and I think I've identified what's happening in 1.6.4. The problem stems from changes to the message handling functions, particularly how messages are processed in instructor/utils.py. The message processing pipeline is not preserving all fields when handling assistant messages, so tool_calls and other fields are dropped during message reconstruction, as we can see in your debug output where the tool_calls field disappears from the assistant message.
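To illustrate the failure mode (this is a standalone toy, not instructor's actual code): if the pipeline rebuilds each message from a fixed set of known keys, any extra fields such as tool_calls are silently dropped. The rebuild function and the message below are hypothetical.

```python
# Illustration only: rebuilding a message from a fixed key set
# silently drops extra fields such as tool_calls.
def rebuild(message: dict) -> dict:
    return {"role": message["role"], "content": message.get("content", "")}

msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "answer_tool", "arguments": '{"id":240}'}}
    ],
}

rebuilt = rebuild(msg)
assert "tool_calls" not in rebuilt  # the field is gone after reconstruction
```

This matches the symptom in the debug output: the assistant message reaches OpenAI without its tool_calls, and the orphaned tool message then triggers the 400 Bad Request.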

A quick fix we could implement:

The issue occurs in instructor/process_response.py, where messages are filtered. When system messages are filtered out, additional fields like tool_calls are inadvertently dropped:

# Current code that's causing the issue
new_kwargs["messages"] = [m for m in new_kwargs.get("messages", []) if m["role"] != "system"]

This list comprehension only preserves the base message structure, dropping fields like tool_calls. This is why you see the difference in the debug output:

1.6.3 (working):

{'role': 'assistant', 'content': '', 'tool_calls': [{'id': 'call_nJMfBJgc0YTMgMLhSGVOm9Ky', 'type': 'function', 'function': {'name': 'answer_tool', 'arguments': '{"id":240}'}}]}

1.6.4 (broken):

{'role': 'assistant', 'content': ''}

The fix is straightforward: preserve all fields when filtering messages.

# Fixed version
new_kwargs["messages"] = [m.copy() for m in new_kwargs.get("messages", []) if m["role"] != "system"]

By using .copy(), we ensure all fields of the message dictionary are preserved, including tool_calls.
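As a hedged sketch (a standalone toy with a hypothetical helper name, not instructor's actual code), shallow-copying each message carries every top-level field, including tool_calls, through the filter:

```python
# Hypothetical sketch: filter out system messages while keeping every
# field of the remaining messages via a shallow copy.
def filter_system_messages(messages: list[dict]) -> list[dict]:
    return [m.copy() for m in messages if m["role"] != "system"]

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "assistant", "content": "",
     "tool_calls": [
         {"id": "call_1", "type": "function",
          "function": {"name": "answer_tool", "arguments": '{"id":240}'}}
     ]},
]

filtered = filter_system_messages(messages)
assert len(filtered) == 1                 # system message removed
assert "tool_calls" in filtered[0]        # extra fields survive the filter
```

Note that .copy() is a shallow copy, so nested structures like the tool_calls list are still shared with the originals; that is enough here because the filter only needs to keep the top-level keys intact.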

Let me know if I got this right, if you'd like me to explain any part in more detail, or if you think this could be a solution.