langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com

AzureMLOnlineEndpoint (Serverless deployment) request body format is totally wrong. #26680

Open Ko-Ko-Kirk opened 1 week ago

Ko-Ko-Kirk commented 1 week ago

Example Code

from langchain_community.llms.azureml_endpoint import (
    AzureMLOnlineEndpoint,
    CustomOpenAIContentFormatter,
)

llm = AzureMLOnlineEndpoint(
    endpoint_url="https://Meta-Llama-3-1-8B-Instruct-xx.westus3.models.ai.azure.com/v1/chat/completions/",
    endpoint_api_type='serverless',
    endpoint_api_key="xx",
    content_formatter=CustomOpenAIContentFormatter()
)

response = llm.invoke("Hello")

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/Users/koko/Desktop/programming/xx/aml/aml/day10.py", line 33, in <module>
    response = llm.invoke("Hello")
               ^^^^^^^^^^^^^^^^^^^
  File "/Users/koko/Desktop/programming/xx/aml/.venv/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 391, in invoke
    self.generate_prompt(
  File "/Users/koko/Desktop/programming/xx/aml/.venv/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 756, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/koko/Desktop/programming/xx/aml/.venv/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 950, in generate
    output = self._generate_helper(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/koko/Desktop/programming/xx/aml/.venv/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 793, in _generate_helper
    raise e
  File "/Users/koko/Desktop/programming/xx/aml/.venv/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 780, in _generate_helper
    self._generate(
  File "/Users/koko/Desktop/programming/xx/aml/.venv/lib/python3.11/site-packages/langchain_community/llms/azureml_endpoint.py", line 544, in _generate
    response_payload = self.http_client.call(
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/koko/Desktop/programming/xx/aml/.venv/lib/python3.11/site-packages/langchain_community/llms/azureml_endpoint.py", line 57, in call
    response = urllib.request.urlopen(
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

Description

I am trying to use AzureMLOnlineEndpoint with a serverless deployment (Llama 3.1 8B Instruct). I expect it to run successfully, but I get HTTP 400 Bad Request. When I send the same request with cURL, I get the answer successfully. Here is my cURL command:

curl -X POST https://meta-llama-3-1-8b-instruct-xx.westus3.models.ai.azure.com/v1/chat/completions \
-H "Authorization: Bearer xx" \
-H "Content-Type: application/json" \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "Hello"
        }
    ]
}'

I traced the code in azureml_endpoint.py and found that the request body is wrong. For the serverless API type, it sends a request body like {"prompt": "Hello"}, which does not match the format above. You can check the code in azureml_endpoint.py around lines 294-295.
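For comparison, here is a minimal sketch of a request formatter that produces the chat-style body the serverless endpoint expects (the function name and the model_kwargs parameter are illustrative, not the library's actual code):

import json

def format_serverless_request(prompt: str, model_kwargs: dict) -> bytes:
    # Serverless endpoints expose an OpenAI-style /v1/chat/completions
    # route, so the body must carry a "messages" list, not a bare
    # {"prompt": ...} field.
    body = {"messages": [{"role": "user", "content": prompt}], **model_kwargs}
    return json.dumps(body).encode("utf-8")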

By the way, format_response_payload is wrong as well. The response via cURL is:

{
    "choices":[
        {
            "finish_reason":"stop",
            "index":0,
            "message":{
                "content":"Hello! How can I assist you today?",
                "role":"assistant",
                "tool_calls":[

                ]
            }
        }
    ],
    "created":1726774446,
    "id":"cmpl-e3c1ed3f284f4e988ee024b9ab73bf5d",
    "model":"Meta-Llama-3.1-8B-Instruct",
    "object":"chat.completion",
    "usage":{
        "completion_tokens":10,
        "prompt_tokens":11,
        "total_tokens":21
    }
}

There is no "text" column in the response json, but in azureml_endpoint.py around line 326-327, it parses "text" column.

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 23.6.0: Mon Jul 29 21:13:00 PDT 2024; root:xnu-10063.141.2~1/RELEASE_X86_64
Python Version: 3.11.5 (v3.11.5:cce6ba91b3, Aug 24 2023, 10:50:31) [Clang 13.0.0 (clang-1300.0.29.30)]

Package Information

langchain_core: 0.3.1
langchain: 0.3.0
langchain_community: 0.3.0
langsmith: 0.1.123
langchain_text_splitters: 0.3.0

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.5
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 6.0.2
requests: 2.32.3
SQLAlchemy: 2.0.35
tenacity: 8.5.0
typing-extensions: 4.12.2

Ko-Ko-Kirk commented 1 week ago

I fixed it and created a PR here: https://github.com/langchain-ai/langchain/pull/26683

kentmor commented 3 days ago

Any updates on this? I'm having the same issue.

Ko-Ko-Kirk commented 3 days ago

> Any updates on this? I'm having the same issue.

You can use my PR. For example, in pyproject.toml, point langchain-community at my branch:

langchain-community = { git = "https://github.com/Ko-Ko-Kirk/langchain.git", subdirectory = "libs/community", branch = "fix/serverless-api-400-bug" }