AI Agent (Tools Agent) node doesn’t utilise metadata in responses

a-d-r-i-a-n-d commented 4 weeks ago

Bug Description

When using the AI Agent (Tools Agent) node in conjunction with the Vector Store Tool node, Postgres PGVector Store node, and OpenAI embeddings (using text-embedding-3-small model), the AI Agent is unable to reference metadata fields included in the stored documents.

In my workflow, I have embedded a simple text (“The name of the application is example-app and the main functionality is to provide examples of code”) and added metadata with a field named application_url (value: https://example.com). When querying the AI Agent with: “What is the main functionality of example-app? Also, what is the application URL?”, the agent accurately retrieves the main functionality from the embedding but fails to include or reference the application_url from the metadata.

It’s unclear whether this behavior is intentional, but it would significantly enhance the AI Agent’s capabilities if it could leverage both the embedded content and associated metadata in responses.

To Reproduce

Embed text and metadata into Postgres PGVector Store:
- Add a Vector Store Tool node configured with the Postgres PGVector Store node (operation mode set to "Insert documents").
- Add an OpenAI Embeddings node using the text-embedding-3-small model.
- Embed the following text:
  "The name of the application is example-app and the main functionality is to provide examples of code."
- In the metadata section of the Vector Store Tool node, add a field named application_url with the value https://example.com.
- Run the workflow to store the embedded text and metadata into the Postgres PGVector Store.

Create an AI Agent workflow with a retriever:
- Add an AI Agent (Tools Agent) node.
- Add a Vector Store Tool node, configured with the Postgres PGVector Store node (operation mode set to "Retrieve documents").
- Connect an OpenAI Embeddings node using the text-embedding-3-small model.
- Ask the following query to the AI Agent:
  "What is the main functionality for example-app? Also, what is the application URL?"
- The AI Agent will return the main functionality correctly based on the embedding, but it will fail to provide the application_url from the metadata, even though the metadata was stored and should be available.
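
To make the reproduction concrete, here is a minimal sketch of the stored record. It assumes the document follows the usual LangChain-style split between `pageContent` (the embedded text) and `metadata`. The `StoredDocument` interface and the `textOnlyContext` helper are hypothetical names used only for illustration, not n8n's actual internals. The sketch shows that if only the text of a retrieved document reaches the model, the URL is never present in what the model sees.

```typescript
// Hypothetical shape, modelled on the pageContent/metadata split of LangChain-style documents.
interface StoredDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// The record from the reproduction steps: the URL exists only in metadata.
const storedDocument: StoredDocument = {
  pageContent:
    "The name of the application is example-app and the main functionality is to provide examples of code.",
  metadata: { application_url: "https://example.com" },
};

// Assumption for illustration: a context built from the text of each document only.
function textOnlyContext(docs: StoredDocument[]): string {
  return docs.map((doc) => doc.pageContent).join("\n\n");
}

const reproContext = textOnlyContext([storedDocument]);

// True when the URL is visible in the text-only context.
const urlReachable = reproContext.includes("https://example.com");
```

Under this assumption, `urlReachable` is `false`, which matches the observed behavior: the functionality question can be answered from the text, while the URL question cannot.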

Expected behavior

The AI Agent should be able to utilize metadata associated with the documents stored in the Vector Store when formulating its responses.
When querying the AI Agent about information stored in metadata (e.g., application_url), the agent should include the metadata field value in its response.
The agent should combine both embedded content and metadata in a seamless manner, improving the accuracy and completeness of its responses.

Operating System

docker

n8n Version

1.59.4

Node.js Version

v20.17.0

Database

PostgreSQL

Execution mode

main (default)

Joffcom commented 4 weeks ago

Hey @a-d-r-i-a-n-d,

We have created an internal ticket to look into this, which we will be tracking as "GHC-276".

a-d-r-i-a-n-d commented 4 weeks ago

As a side note, while using the GitHub Document Loader node to embed different content from GitHub repositories, if I ask the agent to provide the repository URL, it fails to answer. An example of the metadata looks like this:

{
    "loc": {
        "lines": {
            "to": 122,
            "from": 1
        }
    },
    "branch": "public",
    "source": "defender-endpoint/api/create-alert-by-reference.md",
    "repository": "https://github.com/MicrosoftDocs/defender-docs"
}

Although the metadata contains the repository URL, it seems that this information is not accessible to the Tools Agent.
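
For illustration, nested metadata like the example above could be flattened into plain `key: value` lines before being handed to a model. This is only a sketch: `flattenMetadata` and the `MetadataValue` type are hypothetical helpers, not part of n8n or the GitHub Document Loader node.

```typescript
// Hypothetical type covering the values seen in the example metadata.
type MetadataValue =
  | string
  | number
  | boolean
  | null
  | { [key: string]: MetadataValue };

// Flatten nested metadata into "dotted.path: value" lines, preserving key order.
function flattenMetadata(
  metadata: { [key: string]: MetadataValue },
  prefix = "",
): string[] {
  const lines: string[] = [];
  for (const [key, value] of Object.entries(metadata)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object") {
      // Recurse into nested objects such as "loc" and "lines".
      lines.push(...flattenMetadata(value, path));
    } else {
      lines.push(`${path}: ${String(value)}`);
    }
  }
  return lines;
}

// The metadata from the GitHub Document Loader example.
const githubMetadataLines = flattenMetadata({
  loc: { lines: { to: 122, from: 1 } },
  branch: "public",
  source: "defender-endpoint/api/create-alert-by-reference.md",
  repository: "https://github.com/MicrosoftDocs/defender-docs",
});
```

Each entry in `githubMetadataLines` is a single line of text, so the repository URL becomes ordinary content that can be appended to a chunk.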

OlegIvaniv commented 4 weeks ago

Hey @a-d-r-i-a-n-d,

This isn't a bug. The Vector Store Tool does not pass the retrieved document chunks to the agent directly; instead, it uses the connected LLM to answer the agent's prompt. This lets you use different models for the two jobs to optimize token usage, and it avoids cluttering the agent's chat history.

We could add a new mode to the Vector Store Tool that returns the retrieved chunks directly, but that would be more of an enhancement. Until we have this, the workaround is to use a sub-workflow tool that retrieves the chunks and passes them, along with their metadata, to the agent. Something like this:

[Screenshot: CleanShot 2024-10-03 at 09 21 22@2x]

Here's the JSON for this workflow: Vector_Store_Sub_Workflow_Tool.json
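
As a rough sketch of the idea (not the contents of the attached workflow), the step inside such a sub-workflow that prepares the tool's response could combine each chunk's text with its metadata. The `RetrievedChunk` interface and `formatChunksForAgent` function are hypothetical names, and the chunk shape is assumed to follow the `pageContent`/`metadata` convention.

```typescript
// Hypothetical shape of a chunk returned by the vector store retrieval step.
interface RetrievedChunk {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Build one text response that carries both the chunk text and its metadata.
function formatChunksForAgent(chunks: RetrievedChunk[]): string {
  return chunks
    .map((chunk, index) =>
      [
        `Chunk ${index + 1}:`,
        chunk.pageContent,
        `Metadata: ${JSON.stringify(chunk.metadata)}`,
      ].join("\n"),
    )
    .join("\n\n");
}

// Example using the document from the original report.
const toolResponse = formatChunksForAgent([
  {
    pageContent:
      "The name of the application is example-app and the main functionality is to provide examples of code.",
    metadata: { application_url: "https://example.com" },
  },
]);
```

Because the metadata is serialized into the returned text, the agent receives `application_url` alongside the chunk and can quote it in its answer.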

I'll close this issue for now and let you know once we implement the new mode to return the chunks directly to the agent.
