langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

GraphCypherQAChain cannot generate correct Cypher commands #22385

Open mj-1023 opened 4 months ago

mj-1023 commented 4 months ago


Example Code

from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

# Connection details for the local Neo4j instance
uri = "bolt://localhost:7687"
username = "xxxx"
password = "xxxxx"

graph = Neo4jGraph(url=uri, username=username, password=password)

llm = ChatOpenAI(model="gpt-4-0125-preview", temperature=0)

# Chain that translates a natural-language question into Cypher and runs it
chain = GraphCypherQAChain.from_llm(graph=graph, llm=llm, verbose=True, validate_cypher=True)

# Question that triggers the error shown below
response = chain.invoke({"query": "What is the MD5 of test.py?"})

Error Message and Stack Trace (if applicable)

Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (t:Tools {name: "test.py"})-[:has MD5 hash]->(h:Hash) RETURN h.name

Traceback (most recent call last):
  File "\lib\site-packages\langchain_community\graphs\neo4j_graph.py", line 391, in query
    data = session.run(Query(text=query, timeout=self.timeout), params)
  File "\lib\site-packages\neo4j\_sync\work\session.py", line 313, in run
    self._auto_result._run(
  File "\lib\site-packages\neo4j\_sync\work\result.py", line 181, in _run
    self._attach()
  File "\lib\site-packages\neo4j\_sync\work\result.py", line 301, in _attach
    self._connection.fetch_message()
  File "\lib\site-packages\neo4j\_sync\io\_common.py", line 178, in inner
    func(*args, **kwargs)
  File "\lib\site-packages\neo4j\_sync\io\_bolt.py", line 850, in fetch_message
    res = self._process_message(tag, fields)
  File "\lib\site-packages\neo4j\_sync\io\_bolt5.py", line 369, in _process_message
    response.on_failure(summary_metadata or {})
  File "\lib\site-packages\neo4j\_sync\io\_common.py", line 245, in on_failure
    raise Neo4jError.hydrate(**metadata)
neo4j.exceptions.CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input 'MD5': expected "*" "WHERE" "]" "{" a parameter (line 1, column 50 (offset: 49))
"MATCH (t:Tools {name: "test.py"})-[:has MD5 hash]->(h:Hash) RETURN h.name"
                                         ^}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "\graph_RAG.py", line 29, in <module>
    response = chain.invoke({"query": "What is the MD5 of test.py?"})
  File "lib\site-packages\langchain\chains\base.py", line 166, in invoke
    raise e
  File "\lib\site-packages\langchain\chains\base.py", line 156, in invoke
    self._call(inputs, run_manager=run_manager)
  File "\lib\site-packages\langchain_community\chains\graph_qa\cypher.py", line 274, in _call
    context = self.graph.query(generated_cypher)[: self.top_k]
  File "\lib\site-packages\langchain_community\graphs\neo4j_graph.py", line 397, in query
    raise ValueError(f"Generated Cypher Statement is not valid\n{e}")
ValueError: Generated Cypher Statement is not valid
{code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input 'MD5': expected "*" "WHERE" "]" "{" a parameter (line 1, column 50 (offset: 49))
"MATCH (t:Tools {name: "test.py"})-[:has MD5 hash]->(h:Hash) RETURN h.name"
                                         ^}

Description

I followed the tutorial at https://python.langchain.com/v0.2/docs/tutorials/graph/. The relationship types between entities in my Neo4j database contain spaces (for example "has MD5 hash"), and the chain cannot handle this: the Cypher it generates leaves the relationship type unquoted, so Neo4j rejects the query with the syntax error above.
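
For context, Cypher requires identifiers that contain spaces to be wrapped in backticks. A minimal sketch of the difference, reusing the graph connection from the example code above and the "has MD5 hash" relationship type from my schema (the quoted form is what the chain would need to produce):

# The Cypher the chain generated - the unquoted space makes it invalid
bad_cypher = 'MATCH (t:Tools {name: "test.py"})-[:has MD5 hash]->(h:Hash) RETURN h.name'

# The backtick-quoted form that the Cypher parser accepts
good_cypher = 'MATCH (t:Tools {name: "test.py"})-[:`has MD5 hash`]->(h:Hash) RETURN h.name'

# Running the quoted query directly against the graph works as expected
print(graph.query(good_cypher))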

System Info

System Information

OS: Windows
OS Version: 10.0.19045
Python Version: 3.10.10 | packaged by Anaconda, Inc. | (main, Mar 21 2023, 18:39:17) [MSC v.1916 64 bit (AMD64)]

Package Information

langchain_core: 0.2.3
langchain: 0.2.1
langchain_community: 0.2.1
langsmith: 0.1.67
langchain_openai: 0.1.8
langchain_text_splitters: 0.2.0

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

RafaelXokito commented 3 months ago

The case you're describing is an edge case that occurs when your schema contains spaces in node labels or relationship types. It can be addressed by supplying a custom Cypher-generation prompt through the cypher_llm_kwargs argument of the from_llm class method.

The default prompt is a generic one that works in most scenarios, but it is deliberately concise and does not cover this case explicitly. Try the following prompt, whose few-shot examples show labels and relationship types quoted with backticks:

from langchain_core.prompts import PromptTemplate

cypher_prompt_template = """Task: Generate a Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}
Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.

Examples:

Example 1:
Schema: {{Person: {{name, age}}, City: {{name}}, LIVES IN: {{since}}}}
Question: Find all persons living in New York.
Output: MATCH (p:`Person`)-[:`LIVES IN`]->(c:`City` {{name: 'New York'}}) RETURN p

Example 2:
Schema: {{Employee: {{name, role}}, Department: {{name}}, WORKS_IN: {{since}}}}
Question: Retrieve the names of all employees working in the Sales department.
Output: MATCH (e:`Employee`)-[:`WORKS_IN`]->(d:`Department` {{name: 'Sales'}}) RETURN e.name

Example 3:
Schema: {{Movie: {{title, releaseYear}}, Director: {{name}}, DIRECTED: {{since}}}}
Question: List all movies directed by Christopher Nolan.
Output: MATCH (d:`Director` {{name: 'Christopher Nolan'}})-[:`DIRECTED`]->(m:`Movie`) RETURN m.title

The question is:
{question}"""

# Literal braces in the examples are doubled so that PromptTemplate's
# f-string formatting only substitutes {schema} and {question}.
cypher_prompt = PromptTemplate(template=cypher_prompt_template,
                               input_variables=["schema", "question"])
chain = GraphCypherQAChain.from_llm(graph=graph, llm=llm, verbose=True,
                    validate_cypher=True, cypher_llm_kwargs={"prompt": cypher_prompt})
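
If it helps, you can re-run the question from your traceback to check the result (chain and graph are the objects defined above; with the few-shot examples the generated Cypher should now quote "has MD5 hash" in backticks):

# Re-run the question that previously failed with a syntax error
response = chain.invoke({"query": "What is the MD5 of test.py?"})
print(response["result"])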

Try this approach and see if it resolves your issue.