langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.59k stars 14.82k forks

GraphCypherQAChain doesn't support returning source documents with `return_source_documents` param like the `BaseQAWithSourcesChain` chains #15543

Closed AsmaaMHadir closed 5 months ago

AsmaaMHadir commented 8 months ago

System Info

python==3.10 langchain==0.0.326 langdetect==1.0.9 langsmith==0.0.54

Who can help?

No response


Reproduction

  1. Define a GraphCypherQAChain that queries a Neo4j graph database, constructs a Cypher query, and returns an answer to the user's question, with return_source_documents set to True:

         graph = Neo4jGraph(
             url=NEO4J_URL,
             username=NEO4J_USERNAME,
             password=NEO4J_PASSWORD,
         )

         EXAMPLES_PROMPT_TEMPLATE = """
         Input: {db_question}, Output: {db_query} """

         example_prompt = PromptTemplate(
             input_variables=["db_question", "db_query"],
             template=EXAMPLES_PROMPT_TEMPLATE,
         )

         example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
             # This is the list of examples available to select from.
             query_examples,
             # This is the embedding class used to produce embeddings
             # which are used to measure semantic similarity.
             OpenAIEmbeddings(),
             # This is the VectorStore class that is used to store the
             # embeddings and do a similarity search over.
             Chroma,
             # This is the number of examples to produce.
             k=2,
         )

         prompt_cypher = FewShotPromptTemplate(
             example_selector=example_selector,
             example_prompt=example_prompt,
             suffix="The question is:\n{question}",
             prefix="""Task: Generate a Cypher query to query a graph database based on the user's question.
         Instructions: Use the provided schema for node types, relationship types, and properties in the graph database. Only incorporate these defined elements. Avoid utilizing any other node types, relationship types, or properties not present in the provided schema. Here's the schema definition:

         {schema}

         If the question matches one of the sample questions in the knowledge base, just return the query used to answer it. If the user asks to retrieve a piece of information about a document or section given its name, use a WHERE statement and case-insensitive Cypher regular-expression matching, as in the queries in your knowledge base, when filtering by the name.

         Use the pattern (t:transaction)-[:CONTAINS*]->(s) in the Cypher query, with the * sign next to the relationship label CONTAINS, where s is the section node you are looking for.

         Ensure the generated query captures relevant information from the graph database without reducing the retrieved data due to variations in user wording.
         Note: Do not include any explanations or apologies in your responses. Do not respond to inquiries seeking information other than the construction of a Cypher statement. Do not include any text except the generated Cypher statement.""",
             input_variables=["schema", "question"],
         )

         QA_GENERATION_TEMPLATE = """Task: Answer the question you are given based on the context provided.
         Instructions: You are an assistant that helps to form nice and human-understandable answers. Use the context information provided to generate a well-organized and comprehensive answer to the user's question. When the provided information contains multiple elements, structure your answer as a bulleted or numbered list to enhance clarity and readability. You must use the information to construct your answer. The provided information is authoritative; do not doubt it or try to use your internal knowledge to correct it. Make the answer sound like a response to the question without mentioning that you based the result on the given information.

         Here's the information:
         {context}

         Question: {question}
         Answer:
         """

         prompt_qa = PromptTemplate(
             input_variables=["context", "question"],
             template=QA_GENERATION_TEMPLATE,
         )

         chain = GraphCypherQAChain.from_llm(
             cypher_llm=ChatOpenAI(temperature=0, model="gpt-4"),
             qa_llm=ChatOpenAI(temperature=0, model="gpt-4"),
             graph=graph,
             verbose=True,
             return_intermediate_steps=True,
             return_source_documents=True,
             validate_cypher=True,
             cypher_prompt=prompt_cypher,
             qa_prompt=prompt_qa,
         )

  2. Run the chain and try to retrieve the source documents used to answer the question:

         res = chain({"query": question})  # chatbot response

         answer = res["result"]

         print("source_documents" in res)
         print(res.get("source_documents"))

Expected behavior

The output is expected to be the source files that were queried, or similar output identifying the high-level graph elements used to construct the context passed into the QA prompt.
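Since the chain above already sets return_intermediate_steps=True, the generated Cypher and the raw context rows do come back in the result, which is close to what source documents would contain. Here is a minimal, dependency-free sketch of pulling those out; the dict below only mimics the shape GraphCypherQAChain returns with return_intermediate_steps=True, and all values are made up for illustration:

```python
# Illustrative result dict shaped like a GraphCypherQAChain output
# when return_intermediate_steps=True (values are invented).
res = {
    "result": "Section 1 covers payment terms.",
    "intermediate_steps": [
        {"query": "MATCH (t:transaction)-[:CONTAINS*]->(s) RETURN s.name"},
        {"context": [{"s.name": "Section 1"}, {"s.name": "Section 2"}]},
    ],
}

# The first step holds the generated Cypher, the second the rows
# that were handed to the QA prompt as context.
generated_cypher = res["intermediate_steps"][0]["query"]
context_rows = res["intermediate_steps"][1]["context"]

print(generated_cypher)
print([row["s.name"] for row in context_rows])
```

These two values are the closest analogue to "source documents" the chain currently exposes.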

dosubot[bot] commented 8 months ago

🤖

Hello @AsmaaMHadir!

I'm Dosu, a friendly bot here to lend a helping hand with LangChain related issues, answer your queries, and guide you on your journey as a contributor. While we're waiting for a human maintainer, rest assured, I'm here to assist you. Let's dive in and see what we can do together!

Based on the information provided, it appears that the GraphCypherQAChain class in LangChain version 0.0.326 does not support returning source documents with the return_source_documents parameter because it does not have this parameter implemented in its class definition.

The GraphCypherQAChain class has the following parameters: graph, cypher_generation_chain, qa_chain, graph_schema, input_key, output_key, top_k, return_intermediate_steps, return_direct, and cypher_query_corrector. However, there is no return_source_documents parameter.

This could be due to a design decision by the developers, or it could be a feature that has not yet been implemented in this version of LangChain.

In the latest version of LangChain (v0.0.354), the GraphCypherQAChain class still does not have the return_source_documents parameter. The class has several methods including from_llm and _call, but none of these methods mention the return_source_documents parameter.

However, there is a similar solved issue in the LangChain repository: RetrievalQA chain return source_documents when using it as a Tool for an Agent. In this issue, the user was able to return source documents by using the RetrievalQA chain and setting the return_source_documents parameter to True.

Here is the code snippet from the solved issue:

chain = RetrievalQA.from_chain_type(llm=llm, chain_type='stuff', retriever=retriever, return_source_documents=True, verbose=True, input_key="question")

This might not be the exact solution you are looking for, but it could be a workaround until the return_source_documents parameter is implemented in the GraphCypherQAChain class.
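Another stopgap, assuming the chain is run with return_intermediate_steps=True, is to post-process the result and synthesize a source_documents-style key from the context rows. This is a sketch over a hand-built dict standing in for a real chain result; with_source_documents is a hypothetical helper, not a LangChain API:

```python
def with_source_documents(res):
    # Pull the context rows out of intermediate_steps and expose them
    # under a "source_documents"-style key.
    steps = res.get("intermediate_steps", [])
    context = next((step["context"] for step in steps if "context" in step), [])
    res["source_documents"] = [{"page_content": str(row)} for row in context]
    return res

# Illustrative result dict shaped like a GraphCypherQAChain output.
res = with_source_documents({
    "result": "answer",
    "intermediate_steps": [
        {"query": "MATCH (t:transaction)-[:CONTAINS*]->(s) RETURN s"},
        {"context": [{"s.name": "Intro"}]},
    ],
})
print("source_documents" in res)
```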

I hope this information is helpful. If you have any other questions or need further clarification, please let me know.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

AsmaaMHadir commented 8 months ago

Thank you for the insightful answer. How do I customize a retriever over my graph database?

dosubot[bot] commented 8 months ago

🤖

To customize a retriever for your graph database in LangChain, you can create a custom retriever class that subclasses BaseRetriever and implements its _get_relevant_documents method, which takes a string query and returns a list of relevant Document objects. Any object with this behavior satisfies the RetrieverLike type that LangChain uses as a type hint for retriever-shaped objects.

Here is a basic example of how you might implement a custom retriever for a graph database:

from typing import Any, List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

class GraphDatabaseRetriever(BaseRetriever):
    graph_database: Any  # your graph database client

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Implement your retrieval logic here.
        # This is a placeholder implementation.
        results = self.graph_database.search(query)
        return [
            Document(page_content=result.text, metadata={"title": result.title})
            for result in results
        ]

You can then use this retriever in the create_history_aware_retriever function:

from langchain_community.chat_models import ChatOpenAI
from langchain.chains import create_history_aware_retriever
from langchain import hub

rephrase_prompt = hub.pull("langchain-ai/chat-langchain-rephrase")
llm = ChatOpenAI()
retriever = GraphDatabaseRetriever(graph_database=my_graph_database)
chat_retriever_chain = create_history_aware_retriever(
    llm, retriever, rephrase_prompt
)

Please replace my_graph_database with your actual graph database instance and implement the retrieval logic in GraphDatabaseRetriever according to your needs.
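To make the graph_database.search placeholder above concrete, here is a dependency-free sketch of what that retrieval logic could look like, using a plain list of node dicts and naive keyword overlap as the relevance score. A real implementation would run a Cypher query against Neo4j instead; all names and data here are illustrative:

```python
def search_nodes(nodes, query, k=2):
    # Score each node by how many query terms appear in its text,
    # then return the top-k nodes with a non-zero score.
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(node["text"].lower().split())), node)
        for node in nodes
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node for score, node in scored[:k] if score > 0]

# Toy in-memory stand-in for graph nodes.
nodes = [
    {"title": "Section 1", "text": "payment terms and conditions"},
    {"title": "Section 2", "text": "termination clauses"},
    {"title": "Section 3", "text": "late payment penalties"},
]
print([n["title"] for n in search_nodes(nodes, "payment terms")])
```

Keyword overlap is only a stand-in for relevance; swapping it for an embedding similarity or a Cypher full-text index lookup keeps the retriever interface unchanged.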

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.
