langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com

BadRequestError with vllm locally hosted Llama3 70B Model #23814

Open Haxeebraja opened 5 months ago

Haxeebraja commented 5 months ago

Checked other resources

Example Code

I am following the example below with a locally hosted Llama 3 70B Instruct model via ChatOpenAI:
https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/
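
For reference, a minimal sketch of that how-to pointed at the local endpoint (the get_weather tool and its nyc/sf parameters are taken from the error payload below; the dummy return values are assumptions matching the linked guide):

from typing import Literal

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def get_weather(city: Literal["nyc", "sf"]) -> str:
    """Use this to get weather information."""
    # Dummy data, as in the linked how-to
    if city == "nyc":
        return "It might be cloudy in nyc"
    return "It's always sunny in sf"

llm = ChatOpenAI(
    model_name="Meta-Llama-3-70B-Instruct",
    base_url="http://172.17.0.8:xxxx/v1/",
    api_key="EMPTY",
    temperature=0,
)

graph = create_react_agent(llm, tools=[get_weather])
graph.invoke({"messages": [("user", "what is the weather in sf?")]})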

Similar issue with the following example:

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import MessageGraph
from langgraph.prebuilt import ToolNode, tools_condition

@tool
def divide(a: float, b: float) -> float:
    """Return a / b."""
    return a / b

llm = ChatOpenAI(
    model_name="Meta-Llama-3-70B-Instruct",
    base_url="http://172.17.0.8:xxxx/v1/",
    api_key="EMPTY",
    temperature=0,
).bind(response_format={"type": "json_object"})

tools = [divide]

graph_builder = MessageGraph()
graph_builder.add_node("tools", ToolNode(tools))
graph_builder.add_node("chatbot", llm.bind_tools(tools))
graph_builder.add_edge("tools", "chatbot")
graph_builder.add_conditional_edges("chatbot", tools_condition)
graph_builder.set_entry_point("chatbot")
graph = graph_builder.compile()

graph.invoke([("user", "What's 329993 divided by 13662?")])

Error Message and Stack Trace (if applicable)

BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'extra_forbidden', 'loc': ('body', 'tools'), 'msg': 'Extra inputs are not permitted', 'input': [{'type': 'function', 'function': {'name': 'get_weather', 'description': 'Use this to get weather information.', 'parameters': {'type': 'object', 'properties': {'city': {'enum': ['nyc', 'sf'], 'type': 'string'}}, 'required': ['city']}}}], 'url': 'https://errors.pydantic.dev/2.7/v/extra_forbidden'}]", 'type': 'BadRequestError', 'param': None, 'code': 400}

Description

I have tried instantiating ChatOpenAI both without and with the response_format binding:

llm = ChatOpenAI(model_name="Meta-Llama-3-70B-Instruct", base_url="http://172.17.0.8:xxxx/v1/", api_key="EMPTY", temperature=0)

llm = ChatOpenAI(model_name="Meta-Llama-3-70B-Instruct", base_url="http://172.17.0.8:xxxx/v1/", api_key="EMPTY", temperature=0).bind(response_format={"type": "json_object"})

System Info

Meta's Llama 3 70B Instruct, locally hosted on vLLM. ChatOpenAI works fine for other applications, for example RAG and LCEL.

vbarda commented 5 months ago

I think the issue here is with Llama tool calling and not with LangGraph.

Haxeebraja commented 5 months ago

@vbarda The following example demonstrates tool use with Llama 3, but it uses Ollama: https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb

Is the issue then in ChatOpenAI/vLLM? Do we have an example of vLLM with ChatOpenAI?
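
For context, what I would expect such an example to look like, roughly (the vllm serve flags in the comment are an assumption about the vLLM version; older builds that lack them reject the tools field outright, which matches the extra_forbidden error above):

# Assumes the vLLM OpenAI-compatible server is started with tool calling
# enabled, e.g. (flag names depend on the vLLM version):
#   vllm serve meta-llama/Meta-Llama-3-70B-Instruct \
#       --enable-auto-tool-choice --tool-call-parser llama3_json
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def divide(a: float, b: float) -> float:
    """Return a / b."""
    return a / b

llm = ChatOpenAI(
    model_name="Meta-Llama-3-70B-Instruct",
    base_url="http://172.17.0.8:xxxx/v1/",
    api_key="EMPTY",
    temperature=0,
)

# bind_tools attaches the OpenAI-style "tools" field to every request,
# so the server has to accept that field for this to work.
llm_with_tools = llm.bind_tools([divide])
llm_with_tools.invoke("What's 329993 divided by 13662?")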

KevinZeng08 commented 3 months ago

I use Llama3.1-8B-Instruct and get a 400 error:

BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'extra_forbidden', 'loc': ('body', 'parallel_tool_calls'), 'msg': 'Extra inputs are not permitted', 'input': False}]", 'type': 'BadRequestError', 'param': None, 'code': 400}

I removed the parallel_tool_calls parameter and the request succeeds. The conflict arises from differing function-calling support: OpenAI accepts this parameter while Llama 3.1 served this way does not. https://platform.openai.com/docs/guides/function-calling

Can someone handle this issue?
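
One way to keep langchain-openai from sending that field is the disabled_params option on ChatOpenAI (a sketch, assuming a langchain-openai version that exposes it; the endpoint and model names are placeholders from this thread):

from langchain_openai import ChatOpenAI

# disabled_params drops the listed request parameters before the payload is
# sent, so parallel_tool_calls never reaches the vLLM server.
llm = ChatOpenAI(
    model_name="Meta-Llama-3.1-8B-Instruct",
    base_url="http://172.17.0.8:xxxx/v1/",
    api_key="EMPTY",
    temperature=0,
    disabled_params={"parallel_tool_calls": None},
)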

vizsatiz commented 1 month ago

Facing the same issue; it looks like vLLM doesn't accept function_call and function as parameters, but it does use tools.

ArekAff commented 1 month ago

You should modify your llm definition to:

llm = ChatOpenAI(model_name="Meta-Llama-3-70B-Instruct", base_url="http://172.17.0.8:xxxx/v1/", api_key="EMPTY", temperature=0).bind(response_format={"type": "json_object"}, disabled_params={"parallel_tool_calls": None})
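
For a quick smoke test outside the graph, reusing the divide tool from the original snippet (a sketch; whether disabled_params is honored when passed through bind rather than the ChatOpenAI constructor may depend on the langchain-openai version):

# divide is the @tool-decorated function from the original report
llm_with_tools = llm.bind_tools([divide])
print(llm_with_tools.invoke("What's 329993 divided by 13662?"))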