run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: doc_ids filter on Weaviate Vector Store throws exception #8537

Closed KristianMischke closed 1 year ago

KristianMischke commented 1 year ago

Bug Description

Specifying doc_ids in a VectorStoreQuery against a Weaviate database (v1.18.3) throws an error. It looks like valueString has been deprecated since v1.12.0, and the server now rejects it on properties of type text. See: https://weaviate.io/developers/weaviate/more-resources/faq#q-what-is-the-difference-between-text-and-string-and-valuetext-and-valuestring

Version

0.8.50

Steps to Reproduce

Minimal example:

Ensure weaviate is running at localhost:8080 (or adjust for your system to test it out)

import weaviate
from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.vector_stores import VectorStoreQuery
from llama_index.schema import TextNode
import pandas as pd

client = weaviate.Client(url="http://localhost:8080")
client.schema.delete_all()
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="TestClass", text_key="sentence")

df = pd.DataFrame.from_records([
    {
        "sentence": "this is a test sentence",
        "doc_id": "123",
        "node_id": "9123",
        "embedding": [1, 0, 0],
    },
    {
        "sentence": "Technology helps solve your problems",
        "doc_id": "123",
        "node_id": "9124",
        "embedding": [0, 20, 0],
    },
    {
        "sentence": "Education Technology (Ed-Tech)",
        "doc_id": "124",
        "node_id": "8123",
        "embedding": [0, 20, 5],
    }
])

vector_store.add([
        TextNode(
            text=row["sentence"],
            metadata=row[["doc_id", "node_id"]].to_dict(),
            embedding=row["embedding"]
        ) for _, row in df.iterrows()
    ])

query_vector = [0.1, 18, 2]

query = VectorStoreQuery(
    query_embedding=query_vector,
    similarity_top_k=100,
    doc_ids=['123']  # <-- this is the culprit of the error
)
result = vector_store.query(query=query)

Relevant Logs/Tracebacks

Traceback (most recent call last):
  File "/<redacted_path>/llama_index_bug.py", line 48, in <module>
    result = vector_store.query(query=query)
  File "/<redacted_path>/.venv/lib/python3.10/site-packages/llama_index/vector_stores/weaviate.py", line 275, in query
    parsed_result = parse_get_response(query_result)
  File "/<redacted_path>/.venv/lib/python3.10/site-packages/llama_index/vector_stores/weaviate_utils.py", line 65, in parse_get_response
    raise ValueError("Invalid query, got errors: {}".format(response["errors"]))
ValueError: Invalid query, got errors: [{'locations': [{'column': 6, 'line': 1}], 'message': 'invalid \'where\' filter: child operand at position 0: data type filter cannot use "valueString" on type "text", use "valueText" instead', 'path': ['Get', 'TestClass']}]

dosubot[bot] commented 1 year ago

🤖

Hello @KristianMischke,

I'm here to assist you with your issue while you're waiting for a human maintainer. I'm a bot designed to help solve bugs, answer questions, and guide you in contributing to the project.

From the error message you provided, it seems that the issue is related to the use of valueString in the doc_ids filter of the VectorStoreQuery against a Weaviate database. As you correctly pointed out, valueString was deprecated in Weaviate v1.12.0 and replaced with valueText.

This issue seems to be similar to a previously solved issue in the LlamaIndex repository: [Bug]: doc_ids parameter isn't used in the weaviate query function. This issue was resolved by a pull request: fix: weaviate doc_ids filtering.

The error occurs when the query method of the WeaviateVectorStore class constructs a GraphQL query with a filter using valueString to specify the document IDs. This is no longer supported by the Weaviate database.

Here is the relevant code snippet that needs to be updated:

if query.doc_ids:
    filter_with_doc_ids = {
        "operator": "Or",
        "operands": [
            {"path": ["doc_id"], "operator": "Equal", "valueString": doc_id}
            for doc_id in query.doc_ids
        ],
    }
    query_builder = query_builder.with_where(filter_with_doc_ids)

To fix this issue, you should replace valueString with valueText in the filter construction:

if query.doc_ids:
    filter_with_doc_ids = {
        "operator": "Or",
        "operands": [
            {"path": ["doc_id"], "operator": "Equal", "valueText": doc_id}
            for doc_id in query.doc_ids
        ],
    }
    query_builder = query_builder.with_where(filter_with_doc_ids)

This change should resolve the error related to the deprecation of valueString.
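
As a minimal sketch, the same filter construction can be pulled into a standalone function so that it can be checked without a running Weaviate instance. The name build_doc_ids_filter is hypothetical and not part of llama_index; the dictionary shape simply mirrors the snippet above.

```python
from typing import Any, Dict, List


def build_doc_ids_filter(doc_ids: List[str]) -> Dict[str, Any]:
    """Build a Weaviate 'where' filter that matches any of the given doc_ids.

    Hypothetical helper for illustration only; not part of llama_index.
    """
    return {
        "operator": "Or",
        "operands": [
            # Use valueText rather than the deprecated valueString.
            {"path": ["doc_id"], "operator": "Equal", "valueText": doc_id}
            for doc_id in doc_ids
        ],
    }


if __name__ == "__main__":
    # Print the filter that would be passed to query_builder.with_where(...)
    print(build_doc_ids_filter(["123", "124"]))
```

Keeping the filter in a pure function like this makes it straightforward to assert in a unit test that no operand still carries a valueString key.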

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Best, Dosu

Disiok commented 1 year ago

Thanks for raising this! Will look into a fix.

erika-cardenas commented 1 year ago

Hi @KristianMischke, thanks for raising this issue!

Updating your Weaviate client version should fix this issue. You can do this with:

pip install --upgrade weaviate-client

I also made a PR to change the string value to text dataType instead of having Weaviate handle the translation (linked above).
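
As a rough illustration of what a text-typed property looks like, here is a sketch that rewrites the deprecated "string" dataType to "text" in a list of property definitions. It assumes the dict-based schema format (name plus a dataType list); the helper name use_text_data_type and the sample properties are hypothetical, and this is not the code from the PR.

```python
from typing import Any, Dict, List


def use_text_data_type(properties: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the property definitions with 'string' replaced by 'text'.

    Hypothetical helper for illustration only; the input is left unmodified.
    """
    updated = []
    for prop in properties:
        new_prop = dict(prop)  # shallow copy so the caller's dict is untouched
        new_prop["dataType"] = [
            "text" if data_type == "string" else data_type
            for data_type in prop.get("dataType", [])
        ]
        updated.append(new_prop)
    return updated


if __name__ == "__main__":
    # Assumed example properties; the names follow the metadata keys in the repro.
    props = [
        {"name": "doc_id", "dataType": ["string"]},
        {"name": "sentence", "dataType": ["text"]},
    ]
    print(use_text_data_type(props))
```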