run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Cannot use TextToCypherRetriever in Property Graph #14536

Open hoangpnhat opened 5 months ago

hoangpnhat commented 5 months ago

Bug Description

I'm using an existing Neo4j graph and trying to run TextToCypherRetriever. When I run a query, the retriever raises the following error (the full traceback is in the logs below):

pydantic.v1.error_wrappers.ValidationError: 1 validation error for LLMPredictStartEvent template value is not a valid dict (type=type_error.dict)

Version

0.10.46

Steps to Reproduce

graph_store = Neo4jPropertyGraphStore(
    username=os.getenv('GRAPH_USERNAME'),
    password=os.getenv('GRAPH_PASSWORD'),
    url=os.getenv('GRAPH_URI'),
)

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    embed_kg_nodes=False,
)

cypher_retriever = CustomTextToCypherRetriever(
    index.property_graph_store,
    # customize the LLM, defaults to Settings.llm
    llm=OpenAI(model="gpt-3.5-turbo-0125", temperature=0.2),
    # customize the text-to-cypher template.
    # Requires `schema` and `question` template args
    text_to_cypher_template=DEFAULT_TEXT_TO_CYPHER_TEMPLATE,
    # customize how the cypher result is inserted into
    # a text node. Requires `query` and `response` template args
    response_template=DEFAULT_RESPONSE_TEMPLATE,
    # an optional callable that can clean/verify generated cypher
    cypher_validator=None,
    # allowed fields in the resulting
    allowed_output_field=DEFAULT_ALLOWED_FIELDS,
)
retriever = index.as_retriever(
    include_text=False,  # include source text, default True
    sub_retrievers=[cypher_retriever],
)

retriever = PGRetriever(sub_retrievers=[text_to_cypher])

if __name__ == '__main__':
    nodes = retriever.retrieve("query")
    pprint(nodes)

Relevant Logs/Tracebacks

File "llama_index/core/indices/property_graph/sub_retrievers/text_to_cypher.py", line 131, in aretrieve_from_graph
    response = await self.llm.apredict(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "llama_index/core/instrumentation/dispatcher.py", line 255, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "llama_index/core/llms/llm.py", line 517, in apredict
    LLMPredictStartEvent(template=prompt, template_args=prompt_args)
  File "pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for LLMPredictStartEvent
template
  value is not a valid dict (type=type_error.dict)

dosubot[bot] commented 5 months ago

To resolve the pydantic.v1.error_wrappers.ValidationError: 1 validation error for LLMPredictStartEvent error when using TextToCypherRetriever with an existing Neo4j graph, you need to ensure that the data being passed to the TextToCypherRetriever is correctly validated according to the expected schema. This error typically occurs when the data does not conform to the expected structure defined by Pydantic models.

Here is an example of how to correctly set up and use the TextToCypherRetriever:

from llama_index.core.indices.property_graph import TextToCypherRetriever
from llama_index.core.llms import LLM
from llama_index.core.prompts import PromptTemplate
from llama_index.core.settings import Settings

DEFAULT_RESPONSE_TEMPLATE = (
    "Generated Cypher query:\n{query}\n\n" "Cypher Response:\n{response}"
)
DEFAULT_ALLOWED_FIELDS = ["text", "label", "type"]

cypher_retriever = TextToCypherRetriever(
    graph_store=index.property_graph_store,
    llm=llm,  # customize the LLM, defaults to Settings.llm
    text_to_cypher_template=PromptTemplate(index.property_graph_store.text_to_cypher_template),
    response_template=DEFAULT_RESPONSE_TEMPLATE,
    cypher_validator=None,  # an optional callable that can clean/verify generated cypher
    allowed_output_fields=DEFAULT_ALLOWED_FIELDS,
)

Ensure that the llm and other parameters are correctly defined and that the data being passed matches the expected schema. If the error persists, you may need to inspect the specific validation error message to identify which field is causing the issue and adjust your data accordingly. Additionally, make sure that the graph_store supports structured queries, as the TextToCypherRetriever requires this capability [1][2].

logan-markewich commented 5 months ago

@hoangpnhat seems like DEFAULT_TEXT_TO_CYPHER_TEMPLATE is not a PromptTemplate object. How did you define it? Or if you print it, what is it?

lizeyubuaa commented 4 months ago

I ran into the same problem. Have you found an answer yet?

lizeyubuaa commented 4 months ago

@dosu how do I define DEFAULT_TEXT_TO_CYPHER_TEMPLATE?

lizeyubuaa commented 4 months ago

I fixed the problem with the following code:

DEFAULT_RESPONSE_TEMPLATE = (
    "Generated Cypher query:\n{query}\n\n" "Cypher Response:\n{response}"
)
DEFAULT_ALLOWED_FIELDS = ["text", "label", "type"]

DEFAULT_TEXT_TO_CYPHER_TEMPLATE = index.property_graph_store.text_to_cypher_template,

cypher_retriever = TextToCypherRetriever(
    index.property_graph_store,
    # customize the LLM, defaults to Settings.llm
    # customize the text-to-cypher template.
    # Requires `schema` and `question` template args
    text_to_cypher_template=index.property_graph_store.text_to_cypher_template,
    # customize how the cypher result is inserted into
    # a text node. Requires `query` and `response` template args
    response_template=DEFAULT_RESPONSE_TEMPLATE,
    # an optional callable that can clean/verify generated cypher
    cypher_validator=None,
    # allowed fields in the resulting
    allowed_output_field=DEFAULT_ALLOWED_FIELDS,
)

So weird: the bug appears when text_to_cypher_template=DEFAULT_TEXT_TO_CYPHER_TEMPLATE. The documentation may be wrong (https://docs.llamaindex.ai/en/latest/module_guides/indexing/lpg_index_guide/#schemallmpathextractor). @logan-markewich
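One detail in the snippet above may be worth checking: the line that assigns DEFAULT_TEXT_TO_CYPHER_TEMPLATE ends with a trailing comma. In Python that builds a one-element tuple instead of binding the template itself, so passing that name as text_to_cypher_template would hand the retriever a tuple rather than a PromptTemplate. Whether this is what triggered the validation error here is an assumption, not something the thread confirms. The sketch below uses a hypothetical stand-in class, FakeTemplate (not a LlamaIndex class), purely to show the language behaviour.

```python
class FakeTemplate:
    """Hypothetical stand-in for a prompt template object (not a LlamaIndex class)."""

    def __init__(self, text: str) -> None:
        self.text = text


template = FakeTemplate("Schema: {schema}\nQuestion: {question}")

# Trailing comma: Python builds a 1-tuple that contains the template.
with_comma = template,
# No trailing comma: the name is bound to the template object itself.
without_comma = template

print(type(with_comma).__name__)     # tuple
print(type(without_comma).__name__)  # FakeTemplate
```

Printing type(...) of the value before passing it to the retriever, as suggested earlier in the thread, would surface this kind of mismatch immediately.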

rk68 commented 4 months ago

@lizeyubuaa I ran your fix but now hit the following issue. Do you have any idea what causes it?

Here is the code:

from llama_index.core.indices.property_graph import TextToCypherRetriever

DEFAULT_RESPONSE_TEMPLATE = (
    "Generated Cypher query:\n{query}\n\n" "Cypher Response:\n{response}"
)
DEFAULT_ALLOWED_FIELDS = ["text", "label", "type"]

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    llm=llm_gpt4o,
    embed_model=embed_model,
)
DEFAULT_TEXT_TO_CYPHER_TEMPLATE = index.property_graph_store.text_to_cypher_template,

cypher_retriever = TextToCypherRetriever(
    index.property_graph_store,
    # customize the LLM, defaults to Settings.llm
    llm=llm_gpt4o,
    # customize the text-to-cypher template.
    # Requires `schema` and `question` template args
    text_to_cypher_template=index.property_graph_store.text_to_cypher_template,
    # customize how the cypher result is inserted into
    # a text node. Requires `query` and `response` template args
    response_template=DEFAULT_RESPONSE_TEMPLATE
    # an optional callable that can clean/verify generated cypher
    #cypher_validator=None,
    # allowed fields in the resulting
    #allowed_output_field=DEFAULT_ALLOWED_FIELDS
)
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(
    cypher_retriever,
)

response = query_engine.query(
    "Tell me about a bias audit?",
)
print(str(response))

Here is the error encountered.

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
/Users/rishi/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/code/experiment_notebooks/knowledge_graph.ipynb Cell 9 line 3
     32 from llama_index.core.query_engine import RetrieverQueryEngine
     34 query_engine = RetrieverQueryEngine.from_args(
     35     cypher_retriever,
     36 )
---> 38 response = query_engine.query(
     39     "Tell me about a bias audit?",
     40 )
     41 print(str(response))

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:230, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    226 self.span_enter(
    227     id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
    228 )
    229 try:
--> 230     result = func(*args, **kwargs)
    231 except BaseException as e:
    232     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/base/base_query_engine.py:52, in BaseQueryEngine.query(self, str_or_query_bundle)
     50 if isinstance(str_or_query_bundle, str):
     51     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 52 query_result = self._query(str_or_query_bundle)
     53 dispatcher.event(
     54     QueryEndEvent(query=str_or_query_bundle, response=query_result)
     55 )
     56 return query_result

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:230, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    226 self.span_enter(
    227     id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
    228 )
    229 try:
--> 230     result = func(*args, **kwargs)
    231 except BaseException as e:
    232     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:189, in RetrieverQueryEngine._query(self, query_bundle)
    185 """Answer a query."""
    186 with self.callback_manager.event(
    187     CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
    188 ) as query_event:
--> 189     nodes = self.retrieve(query_bundle)
    190     response = self._response_synthesizer.synthesize(
    191         query=query_bundle,
    192         nodes=nodes,
    193     )
    194     query_event.on_end(payload={EventPayload.RESPONSE: response})

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:144, in RetrieverQueryEngine.retrieve(self, query_bundle)
    143 def retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
--> 144     nodes = self._retriever.retrieve(query_bundle)
    145     return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:230, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    226 self.span_enter(
    227     id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
    228 )
    229 try:
--> 230     result = func(*args, **kwargs)
    231 except BaseException as e:
    232     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py:243, in BaseRetriever.retrieve(self, str_or_query_bundle)
    238 with self.callback_manager.as_trace("query"):
    239     with self.callback_manager.event(
    240         CBEventType.RETRIEVE,
    241         payload={EventPayload.QUERY_STR: query_bundle.query_str},
    242     ) as retrieve_event:
--> 243         nodes = self._retrieve(query_bundle)
    244         nodes = self._handle_recursive_retrieval(query_bundle, nodes)
    245         retrieve_event.on_end(
    246             payload={EventPayload.NODES: nodes},
    247         )

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:230, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    226 self.span_enter(
    227     id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
    228 )
    229 try:
--> 230     result = func(*args, **kwargs)
    231 except BaseException as e:
    232     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/indices/property_graph/sub_retrievers/base.py:133, in BasePGRetriever._retrieve(self, query_bundle)
    132 def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
--> 133     nodes = self.retrieve_from_graph(query_bundle)
    134     if self.include_text:
    135         nodes = self.add_source_text(nodes)

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:230, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    226 self.span_enter(
    227     id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
    228 )
    229 try:
--> 230     result = func(*args, **kwargs)
    231 except BaseException as e:
    232     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/Documents/MSc DSML/MSc Project/ProjectFiles/RAG_Files/rag1/lib/python3.10/site-packages/llama_index/core/indices/property_graph/sub_retrievers/text_to_cypher.py:107, in TextToCypherRetriever.retrieve_from_graph(self, query_bundle)
    104 if self.allowed_output_fields is not None:
    105     parsed_cypher_query = self._parse_generated_cyher(response)
--> 107 query_output = self._graph_store.structured_query(parsed_cypher_query)
    109 cleaned_query_output = self._clean_query_output(query_output)
    111 node_text = self.response_template.format(
    112     query=parsed_cypher_query,
    113     response=str(cleaned_query_output),
    114 )

UnboundLocalError: local variable 'parsed_cypher_query' referenced before assignment
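Judging only from lines 104 to 107 of text_to_cypher.py as shown in the traceback, parsed_cypher_query appears to be assigned only when self.allowed_output_fields is not None, and is then used unconditionally. That would explain the UnboundLocalError whenever that attribute ends up as None. The sketch below reproduces the pattern in isolation; parse_broken and parse_fixed are hypothetical names, and the string handling is a placeholder, not the library's parsing logic.

```python
from typing import List, Optional


def parse_broken(response: str, allowed_fields: Optional[List[str]]) -> str:
    # Same shape as the failing frame: the name is bound only inside
    # the `if`, so it does not exist when allowed_fields is None.
    if allowed_fields is not None:
        parsed = response.strip()
    return parsed


def parse_fixed(response: str, allowed_fields: Optional[List[str]]) -> str:
    # Bind a default first so the name exists on every path.
    parsed = response
    if allowed_fields is not None:
        parsed = response.strip()
    return parsed


try:
    parse_broken(" MATCH (n) RETURN n ", None)
except UnboundLocalError as err:
    print(f"broken version raised: {type(err).__name__}")

print(repr(parse_fixed(" MATCH (n) RETURN n ", None)))
```

If this reading is right, explicitly passing a list for the allowed output fields, or moving to a release where the variable is always initialised, would be the two things to try.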

lizeyubuaa commented 4 months ago

I compared our code. The problem may be caused by one of these two things:

1. [low possibility] query_engine = RetrieverQueryEngine.from_args(index.as_retriever([cypher_retriever]))
2. [mid possibility] The library version. As far as I can tell, this library may behave differently between versions. Here is my environment:

llama-index-core 0.10.42

Hope this helps!
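Since the two environments in this thread differ (0.10.46 in the original report, 0.10.42 here), it helps to print the installed versions before comparing behaviour. A minimal sketch using only the standard library follows; installed_version is a hypothetical helper name, and the second package name is an assumption about which Neo4j integration package is installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names are examples; adjust them to your environment.
for name in ("llama-index-core", "llama-index-graph-stores-neo4j"):
    print(name, installed_version(name))
```

Including this output in a bug report makes version-dependent differences much easier to spot.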