opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.

[BUG] Invalid payload exception when using conversational agent with own OpenAI-compatible LLM server #2909

Open · reuschling opened 2 months ago

reuschling commented 2 months ago

I have a running LLM model that connects to a self-hosted, OpenAI-compatible server with a Llama 3.1 model behind it. I can make requests to the model from OpenSearch, and everything also works when I use this model in a RAG setting, i.e. a search pipeline with the 'retrieval_augmented_generation' response processor. Running requests like this works:

{{ _.openSearchUrl }}/_plugins/_ml/models/{{ _.LlmModelId }}/_predict

{
  "parameters": {
    "messages": [ 
      {
        "role": "user",
        "content": "Context:  I am Clara. I live in Texas. Question: Where do I live."
      }
    ]
  }
}
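
For reference, an OpenAI-compatible /v1/chat/completions endpoint answers with a body of roughly this shape (abridged and illustrative, not my server's literal output); the response_filter expressions used further down index into this structure:

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "You live in Texas."
            },
            "finish_reason": "stop"
        }
    ]
}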

The connector definition looks like this. Note the model field '"model": "meta-llama-3.1-70b-instruct-fp8"', which is mandatory for my server.

{
    "name": "OpenAI Chat Connector",
    "version": "2",
    "description": "Connector for the SDS OpenAi compatible LLM service",
    "protocol": "http",
    "parameters": {
        "endpoint": "serv-3306.kl.dfki.de:8000"
    },
    "actions": [
        {
            "action_type": "PREDICT",
            "method": "POST",
            "url": "http://${parameters.endpoint}/v1/chat/completions",
            "request_body": "{ \"model\": \"meta-llama-3.1-70b-instruct-fp8\", \"messages\": ${parameters.messages} }"
        }
    ]
}
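
At predict time, ${parameters.messages} is substituted with the messages array from the request, so the body actually posted to my server looks like this (illustrative):

{
    "model": "meta-llama-3.1-70b-instruct-fp8",
    "messages": [
        {
            "role": "user",
            "content": "Context:  I am Clara. I live in Texas. Question: Where do I live."
        }
    ]
}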

Now I want to define an agent with this model, i.e. a conversational agent that uses this LLM model. This is my first test agent definition. Its tool is just a simple boolean query against an index of mine; the same query works, of course, when run on its own. (Please ignore the placeholders for Insomnia environment variables.)

{{ _.openSearchUrl }}/_plugins/_ml/agents/_register

{
    "name": "Test Agent",
    "type": "conversational",
    "description": "Simple agent to test the agent framework",
    "llm": {
        "model_id": "{{ _.LlmModelId }}",
        "parameters": {
            "max_iteration": 5,
            "stop_when_no_tool_found": true,
            "response_filter": "$.completion",
            "disable_trace": false
        }
    },
    "memory": {
        "type": "conversation_index"
    },
    "app_type": "chat_with_rag",
    "tools": [
        {
            "type": "SearchIndexTool",
            "description": "A tool to search opensearch index with natural language question. If you don't know answer for some question, you should always try to search data with this tool. Action Input: <natural language question>",
            "parameters": {
                "input": "{\"index\": \"scll_agendaitem\", \"query\": ${parameters.query} }",
                "query": {
                    "query": {
                        "bool": {
                            "should": [
                                {
                                    "match": {
                                        "tns_body_chunked{{ _.docBodyChunkLength }}": {
                                            "query": "{{ _.query }}"
                                        }
                                    }
                                },
                                {
                                    "match": {
                                        "tns_body_chunked{{ _.docBodyChunkLength }}.ngram": {
                                            "query": "{{ _.query }}"
                                        }
                                    }
                                }
                            ]
                        }
                    },
                    "size": 5,
                    "_source": "tns_body_chunked{{ _.docBodyChunkLength }}"
                }
            }
        }
    ]
}
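
If registration succeeds, the _register call returns the id of the new agent, roughly like this (the value is illustrative):

{
    "agent_id": "<generated agent id>"
}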

Now I run the agent:

{{ _.openSearchUrl }}/_plugins/_ml/agents/{{ _.agentId }}/_execute

{
    "parameters": {
        "question": "What is a banana",
        "verbose": true
    }
}

I get this error:

{
    "error": {
        "reason": "Invalid Request",
        "details": "Invalid payload: { \"model\": \"meta-llama-3.1-70b-instruct-fp8\", \"messages\": ${parameters.messages} }",
        "type": "IllegalArgumentException"
    },
    "status": 400
}

This is the corresponding exception from the logs:

 [2024-09-06T16:13:15,604][ERROR][o.o.m.e.a.a.MLChatAgentRunner] [pc-4156] Failed to run chat agent
java.lang.IllegalArgumentException: Invalid payload: { "model": "meta-llama-3.1-70b-instruct-fp8", "messages": ${parameters.messages} }
    at org.opensearch.ml.common.connector.HttpConnector.createPayload(HttpConnector.java:320) ~[opensearch-ml-common-2.15.0.0.jar:?]
    at org.opensearch.ml.engine.algorithms.remote.RemoteConnectorExecutor.preparePayloadAndInvoke(RemoteConnectorExecutor.java:191) ~[opensearch-ml-algorithms-2.15.0.0.jar:?]
    at org.opensearch.ml.engine.algorithms.remote.RemoteConnectorExecutor.executeAction(RemoteConnectorExecutor.java:88) [opensearch-ml-algorithms-2.15.0.0.jar:?]
    at org.opensearch.ml.engine.algorithms.remote.RemoteModel.asyncPredict(RemoteModel.java:73) [opensearch-ml-algorithms-2.15.0.0.jar:?]
    at org.opensearch.ml.task.MLPredictTaskRunner.runPredict(MLPredictTaskRunner.java:344) [opensearch-ml-2.15.0.0.jar:2.15.0.0]
    at org.opensearch.ml.task.MLPredictTaskRunner.predict(MLPredictTaskRunner.java:316) [opensearch-ml-2.15.0.0.jar:2.15.0.0]
    at org.opensearch.ml.task.MLPredictTaskRunner.lambda$executeTask$8(MLPredictTaskRunner.java:260) [opensearch-ml-2.15.0.0.jar:2.15.0.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:882) [opensearch-2.15.0.jar:2.15.0]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:833) [?:?]

I have no idea what to do. My impression is that there is some check inside the agent framework that breaks the connector. This would make no sense; the connector definition should be transparent to the agent framework.

reuschling commented 2 months ago

I solved it: the connector was not defined correctly for an OpenAI-compatible endpoint used by an agent. The agent framework fills parameters.prompt, not parameters.messages, so the ${parameters.messages} placeholder in my connector was never substituted and the payload check failed.

The documentation is a bit thin on defining a connector for an agent against an OpenAI endpoint. I found two solutions:

1. Legacy /v1/completions endpoint with parameters.prompt:

{
    "name": "OpenAI connector for the agent framework",
    "version": 2,
    "description": "Uses parameters.prompt which is filled from the agent framework",
    "protocol": "http",
    "parameters": {
        "endpoint": "xyz:8000",
        "model": "meta-llama-3.1-70b-instruct-fp8",
        "response_filter": "$.choices[0].text"

    },
    "actions": [
        {
            "action_type": "PREDICT",
            "method": "POST",
            "url": "http://${parameters.endpoint}/v1/completions",
            "request_body": "{ \"model\": \"${parameters.model}\", \"prompt\": \"${parameters.prompt}\" }"
        }
    ]
}
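
With this connector, the agent framework substitutes the prompt it builds (question, tool descriptions, scratchpad) into ${parameters.prompt}, so the effective request body is roughly the following (illustrative; the real generated prompt is much longer):

{
    "model": "meta-llama-3.1-70b-instruct-fp8",
    "prompt": "<ReAct-style prompt generated by the agent framework>"
}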

2. /v1/chat/completions, passing the prompt as the first and only entry of the JSON messages list:

{
    "name": "OpenAI connector for the agent framework",
    "version": 2,
    "description": "Uses parameters.prompt which is filled from the agent framework, give it as single message list entry",
    "protocol": "http",
    "parameters": {
        "endpoint": "serv-3306.kl.dfki.de:8000",
        "model": "meta-llama-3.1-70b-instruct-fp8",
        "response_filter": "$.choices[0].message.content"
    },
    "actions": [
        {
            "action_type": "PREDICT",
            "method": "POST",
            "url": "http://${parameters.endpoint}/v1/chat/completions",
            "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": [{\"role\":\"user\",\"content\":\"${parameters.prompt}\"}] }"
        }
    ]
}
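
For completeness, either connector body above is registered with the standard connector create API, and the returned connector_id is then attached to a remote model whose model id goes into the agent's llm.model_id. A minimal sketch (same Insomnia placeholders; the model name is illustrative):

{{ _.openSearchUrl }}/_plugins/_ml/connectors/_create

then

{{ _.openSearchUrl }}/_plugins/_ml/models/_register

{
    "name": "llama-3.1-remote",
    "function_name": "remote",
    "connector_id": "<connector id returned by _create>"
}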

Specifying RAG with a retrieval_augmented_generation search response processor generates a correct JSON message list for OpenAI endpoints inside parameters.messages. For the agent framework, I only found parameters.prompt. Is there maybe also another parameter with the messages JSON list?

What is the correct way to connect to an OpenAI endpoint model with the agent framework?

mingshl commented 2 months ago

We should create tutorials for agent configuration, along with details on connector settings. Assigning to @jngz-es.