run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
36.46k stars 5.21k forks

[Bug]: Cannot use vector index together while I use PropertyGraphIndex and pass the `response_synthesizer` or other params #16583

Closed k8scat closed 2 weeks ago

k8scat commented 2 weeks ago

Bug Description

I want to use PropertyGraphIndex, which uses a vector index internally, but when I pass `response_synthesizer` or `node_postprocessors`, it fails.

I traced the bug to `PropertyGraphIndex.as_retriever`:

[screenshot]

When embeddings are enabled, a `VectorContextRetriever` is appended to `sub_retrievers` and the extra kwargs are stored in `VectorContextRetriever._retriever_kwargs`; finally, `VectorStoreQuery` is initialized from those `_retriever_kwargs`:

[screenshot]

The problem is that some of these params are not accepted by the `VectorStoreQuery` constructor:

[screenshot]
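The failure mode can be reproduced outside llama_index with a minimal sketch (the class below is a simplified stand-in, not the real `VectorStoreQuery`): a dataclass constructor rejects any keyword it does not declare, so forwarding `as_retriever`'s `**kwargs` verbatim down to the query object raises `TypeError`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeVectorStoreQuery:
    # Simplified stand-in for VectorStoreQuery: only these keywords are accepted.
    query_str: Optional[str] = None
    similarity_top_k: int = 1


def build_query(**retriever_kwargs):
    # Mimics the sub-retriever forwarding its stored kwargs verbatim.
    return FakeVectorStoreQuery(**retriever_kwargs)


# An accepted keyword works fine.
q = build_query(similarity_top_k=5)

# An unknown keyword fails the same way as in the traceback below.
try:
    build_query(similarity_top_k=5, response_synthesizer=object())
except TypeError as e:
    print(e)  # e.g. "...got an unexpected keyword argument 'response_synthesizer'"
```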

Version

0.11.18

Steps to Reproduce

import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.schema import Document
from llama_index.core.settings import Settings
from llama_index.core import get_response_synthesizer
from llama_index.core import set_global_handler
from llama_index.core import PropertyGraphIndex
from llama_index.core.response_synthesizers.type import ResponseMode
from llama_index.postprocessor.cohere_rerank import CohereRerank

from llm import llm
from embeddings import embed_model

set_global_handler("simple")
Settings.llm = llm
Settings.embed_model = embed_model

# Load documents and build index
documents = SimpleDirectoryReader(
    "../../examples/data/paul_graham"
).load_data()

index = PropertyGraphIndex.from_documents(
    documents,
    # embed_model=embed_model,
    # kg_extractors=[
    #     SchemaLLMPathExtractor(llm=llm)
    # ],
    # property_graph_store=graph_store,
    show_progress=True,
    # embed_kg_nodes=False,
)
# print(f"index.vector_store: {index.vector_store}")
# index.vector_store = None
# print(f"index.property_graph_store.supports_vector_queries: {index.property_graph_store.supports_vector_queries}")

# api_key = os.environ["COHERE_API_KEY"]
# print(f"cohere_api_key: {api_key}")
# cohere_rerank = CohereRerank(api_key=api_key, top_n=3, model="rerank-multilingual-v3.0")

# https://docs.llamaindex.ai/en/stable/module_guides/querying/response_synthesizers/
response_synthesizer = get_response_synthesizer(response_mode=ResponseMode.REFINE)

# index.as_retriever()
query_engine = index.as_query_engine(
    similarity_top_k=5,
    # node_postprocessors=[cohere_rerank],
    response_synthesizer=response_synthesizer,
)

resp = query_engine.query("some question here")

print(resp)

Relevant Logs/Tracebacks

python index_kg_bug_report.py 
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 504.97it/s]
Extracting paths from text:   0%|          | 0/1 [00:00<?, ?it/s]
** Prompt: **
Some text is provided below. Given the text, extract up to 10 knowledge triplets in the form of (subject, predicate, object). Avoid stopwords.
---------------------
Example:Text: Alice is Bob's mother.Triplets:
(Alice, is mother of, Bob)
Text: Philz is a coffee shop founded in Berkeley in 1982.
Triplets:
(Philz, is, coffee shop)
(Philz, founded in, Berkeley)
(Philz, founded in, 1982)
---------------------
Text: some contents here
Triplets:

**************************************************
** Completion: **
It seems like the text to be analyzed is missing. Please provide the text so
**************************************************

Extracting paths from text: 100%|██████████| 1/1 [00:00<00:00,  1.27it/s]
Extracting implicit paths: 100%|██████████| 1/1 [00:00<00:00, 9686.61it/s]
Generating embeddings: 100%|██████████| 1/1 [00:01<00:00,  1.34s/it]
Generating embeddings: 0it [00:00, ?it/s]
Retrying llama_index.llms.openai.base.OpenAI._acomplete in 0.021115862678744546 seconds as it raised APIConnectionError: Connection error..
Traceback (most recent call last):
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/async_utils.py", line 30, in asyncio_run
    loop = asyncio.get_event_loop()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/events.py", line 656, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'MainThread'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/llamaindex/index_kg_bug_report.py", line 58, in <module>
    resp = query_engine.query("some question here")
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/base/base_query_engine.py", line 52, in query
    query_result = self._query(str_or_query_bundle)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 177, in _query
    nodes = self.retrieve(query_bundle)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 132, in retrieve
    nodes = self._retriever.retrieve(query_bundle)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py", line 245, in retrieve
    nodes = self._retrieve(query_bundle)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/indices/property_graph/retriever.py", line 52, in _retrieve
    return asyncio_run(self._aretrieve(query_bundle))
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/async_utils.py", line 38, in asyncio_run
    return asyncio.run(coro)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/indices/property_graph/retriever.py", line 64, in _aretrieve
    async_results = await run_jobs(
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 353, in async_wrapper
    result = await func(*args, **kwargs)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/async_utils.py", line 148, in run_jobs
    results = await asyncio.gather(*pool_jobs)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 353, in async_wrapper
    result = await func(*args, **kwargs)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/async_utils.py", line 139, in worker
    return await job
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 353, in async_wrapper
    result = await func(*args, **kwargs)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py", line 276, in aretrieve
    nodes = await self._aretrieve(query_bundle=query_bundle)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/indices/property_graph/sub_retrievers/base.py", line 148, in _aretrieve
    nodes = await self.aretrieve_from_graph(query_bundle)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 353, in async_wrapper
    result = await func(*args, **kwargs)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/indices/property_graph/sub_retrievers/vector.py", line 168, in aretrieve_from_graph
    vector_store_query = await self._aget_vector_store_query(query_bundle)
  File "/Users/diling01/go/src/github.com/dl/dl-graphrag/.venv_llamaindex/lib/python3.10/site-packages/llama_index/core/indices/property_graph/sub_retrievers/vector.py", line 97, in _aget_vector_store_query
    return VectorStoreQuery(
TypeError: VectorStoreQuery.__init__() got an unexpected keyword argument 'response_synthesizer'
dosubot[bot] commented 2 weeks ago

The issue you're encountering is due to the response_synthesizer parameter being passed to the VectorStoreQuery constructor, which does not accept this parameter. The VectorStoreQuery class has a specific set of allowed parameters, and response_synthesizer is not one of them. The allowed parameters for VectorStoreQuery include query_embedding, similarity_top_k, doc_ids, node_ids, query_str, output_fields, embedding_field, mode, alpha, filters, mmr_threshold, sparse_top_k, and hybrid_top_k [1].

To resolve this issue, you need to ensure that only the parameters accepted by VectorStoreQuery are passed to it. You might need to filter out any unsupported parameters like response_synthesizer before they are passed to the VectorContextRetriever or VectorStoreQuery. This can be done by modifying the code where **kwargs are being passed to ensure only valid parameters are included.
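One way to implement that filtering (a sketch only, not the actual llama_index patch; the `VectorStoreQuery` below is a stand-in dataclass with a subset of the real fields) is to keep only the keywords the target constructor actually declares, e.g. via `inspect.signature`:

```python
import inspect
from dataclasses import dataclass
from typing import Optional


@dataclass
class VectorStoreQuery:  # stand-in with a subset of the real fields
    query_str: Optional[str] = None
    similarity_top_k: int = 1
    mode: str = "default"


def filter_kwargs_for(cls, kwargs):
    """Drop any keyword that the constructor of `cls` does not accept."""
    allowed = set(inspect.signature(cls.__init__).parameters) - {"self"}
    return {k: v for k, v in kwargs.items() if k in allowed}


raw = {"similarity_top_k": 5, "response_synthesizer": object(), "mode": "default"}
clean = filter_kwargs_for(VectorStoreQuery, raw)
query = VectorStoreQuery(**clean)  # no TypeError now
print(sorted(clean))  # ['mode', 'similarity_top_k']
```

With filtering applied at the point where `_retriever_kwargs` is forwarded, engine-level params such as `response_synthesizer` would be silently dropped instead of crashing the retriever.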
