run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Query can't find specific items #16770

Open martinb-ai opened 1 day ago

martinb-ai commented 1 day ago

Question

Hi there!

Background:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Some question about the data should go here")
print(response)

and even adding in:

index.as_query_engine(similarity_top_k=5, response_mode="tree_summarize")

I still can't get the index to return an item whose name matches the query.

For instance: ai_use_cases.csv contains the following row:

.
.
180,Smart Digital Signage with AI,"""Use AI-powered signage to display personalized advertisements based on real-time customer data and behavior.""",Retail,"""Revenue Enhancement""",Computer Vision,Medium,4-6 months,"""Ad Engagement, Sales Uplift, Customer Satisfaction"""
.
.

If I ask it the following: "How can artificial intelligence enhance the effectiveness and personalization of smart digital signage solutions?"

It fails to find the row: it either responds with "No exact match found" or returns completely irrelevant results, even though that specific row should match.

Question: Is there a different, more elaborate way to run the query, something more advanced or something I might have missed? What other strategies should I consider to improve query results?

Thank you!

dosubot[bot] commented 1 day ago

I found a similar unsolved discussion that might be relevant to your issue: Question: How can I improve responses? [1].

To improve the search query and find specific items with the same name in a VectorStoreIndex, you can use metadata filters to narrow down the search results. Here's an example of how you can apply filters to enhance query results:

from llama_index.core.vector_stores.types import (
    ExactMatchFilter,
    MetadataFilters,
    VectorStoreQuery,
)

# Example of using a filter to find items with specific metadata
# ("rank" and "c" are placeholders; use a key that exists in your nodes' metadata)
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="rank", value="c")]
)

# Create a query with the filter
query = VectorStoreQuery(
    query_embedding=[1.0, 1.0],  # Placeholder; must match your embedding model's dimension
    filters=filters,
    similarity_top_k=1,  # Number of top similar items to return
)

# Execute the query on the vector store
# (assumes `vector_store` already exists, e.g. index.vector_store)
result = vector_store.query(query)
print(result.ids)
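
To make the idea behind the filter concrete, here is a minimal plain-Python sketch of "filter on metadata first, then rank by similarity". It is not how LlamaIndex implements it internally; the `NODES` records, the `cosine` helper, and `filtered_top_k` are all hypothetical names used only for illustration.

```python
import math

# Hypothetical toy records: (id, metadata, embedding). Not real LlamaIndex objects.
NODES = [
    ("a", {"rank": "c"}, [1.0, 0.9]),
    ("b", {"rank": "a"}, [1.0, 1.0]),
    ("c", {"rank": "c"}, [0.1, 1.0]),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def filtered_top_k(query_embedding, key, value, k):
    # Step 1: keep only nodes whose metadata value matches exactly.
    candidates = [n for n in NODES if n[1].get(key) == value]
    # Step 2: rank the survivors by similarity to the query embedding.
    candidates.sort(key=lambda n: cosine(query_embedding, n[2]), reverse=True)
    return [n[0] for n in candidates[:k]]


result_ids = filtered_top_k([1.0, 1.0], key="rank", value="c", k=1)
print(result_ids)
```

Note that node "b" is the closest vector to the query but is excluded because its metadata does not match. A filter like this only helps if the field you filter on was stored as metadata when the index was built.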

Additionally, consider these advanced strategies to enhance query results:

  1. Different Distance Strategies: If your vector store backend exposes them, experiment with distance strategies such as DOT_PRODUCT, COSINE, and EUCLIDEAN_DISTANCE to see which one gives the best results for your use case.

  2. Indices with Specific Parameters: Where the backend supports it, create indices with parameters tailored to your distance strategy. For example, HNSW or IVF indices with custom parameters such as accuracy and parallelism can improve search performance.

  3. Advanced Searches: Apply filters and sort results by similarity score. Query modes such as SPARSE, HYBRID, and SEMANTIC_HYBRID can refine search results further, although support for each mode depends on the vector store [2][3].
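
As a rough illustration of why the choice of distance strategy in item 1 matters, the sketch below ranks two toy vectors against one query under each of the three measures. The vectors and names are made up for the example and are not tied to any particular vector store.

```python
import math


def dot_product(u, v):
    return sum(x * y for x, y in zip(u, v))


def cosine_similarity(u, v):
    return dot_product(u, v) / (
        math.sqrt(dot_product(u, u)) * math.sqrt(dot_product(v, v))
    )


def euclidean_distance(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical 2-d embeddings chosen so the measures disagree.
query = [1.0, 0.0]
docs = {
    "short_aligned": [0.5, 0.0],  # same direction as the query, small magnitude
    "long_off_axis": [3.0, 3.0],  # 45 degrees off, large magnitude
}

# Higher is better for dot product and cosine; lower is better for Euclidean.
best_by_dot = max(docs, key=lambda name: dot_product(query, docs[name]))
best_by_cosine = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
best_by_euclidean = min(docs, key=lambda name: euclidean_distance(query, docs[name]))

print(best_by_dot, best_by_cosine, best_by_euclidean)
```

Dot product rewards vector magnitude, so it picks the long off-axis vector, while cosine and Euclidean distance pick the aligned one. With normalized embeddings the three measures produce the same ranking, so switching strategies may change nothing in that case.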

These strategies can help refine your search queries and improve the accuracy and relevance of the results returned by the VectorStoreIndex.
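
For the hybrid idea in particular, a conceptual sketch may help. It blends a keyword-overlap score with a vector similarity score, which tends to help when the query contains the item's literal name. This is only an assumption-laden toy: the second row title and both vector scores are invented, and `keyword_score` and `hybrid_score` are hypothetical helpers, not LlamaIndex APIs.

```python
def keyword_score(query, text):
    """Fraction of query words that also appear in the text (case-insensitive)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_score(vector_score, kw_score, alpha=0.5):
    """Weighted blend: alpha=1.0 is pure vector search, alpha=0.0 pure keyword."""
    return alpha * vector_score + (1 - alpha) * kw_score


query = "smart digital signage"

# Title -> pretend vector similarity. The first title comes from the CSV row in
# the question; the second title and both scores are made up for illustration.
vector_scores = {
    "Smart Digital Signage with AI": 0.60,
    "AI Powered Chatbot for Retail": 0.70,
}

best_vector_only = max(vector_scores, key=vector_scores.get)
best_hybrid = max(
    vector_scores,
    key=lambda title: hybrid_score(vector_scores[title], keyword_score(query, title)),
)

print(best_vector_only)
print(best_hybrid)
```

In this toy setup the vector-only ranking prefers the wrong row, while the blended score brings the exact-name match to the top.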