Not showing query field when trying to retrieve the documents using SelfQueryRetriever

langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications

MIT License


pdf_file = '/content/documents/Pre-proposal students.pdf'

# Define your prompt template
prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else. If no context, then no answer.
Helpful Answer:"""

# Load the PDF file
loader = PyPDFLoader(pdf_file)
document = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200)

# Split the document into chunks
texts = text_splitter.split_documents(document)

vectorstore = Chroma.from_documents(texts, embeddings)

llm = OpenAI(temperature=0)

# Create a retriever for the vector database
document_content_description = "Description of research papers and research proposal"

metadata_field_info = [
    AttributeInfo(
        name="title",
        description="The title of the research paper.",
        type="string",
    ),
    AttributeInfo(
        name="institution",
        description="The name of the institution or university associated with the research.",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="The year the research was published.",
        type="integer",
    ),
    AttributeInfo(
        name="abstract",
        description="A brief summary of the research paper.",
        type="string",
    ),
    AttributeInfo(
        name="methodology",
        description="The main research methods used in the study.",
        type="string",
    ),
    AttributeInfo(
        name="findings",
        description="A brief description of the main findings of the research.",
        type="string",
    ),
    AttributeInfo(
        name="implications",
        description="The implications of the research findings.",
        type="string",
    ),
    AttributeInfo(
        name="reference_count",
        description="The number of references cited in the research paper.",
        type="integer",
    ),
    AttributeInfo(
        name="doi",
        description="The Digital Object Identifier for the research paper.",
        type="string",
    ),
]

retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
    enable_limit=True,
    verbose=True
)

# retriever.get_relevant_documents("What is the title of the proposal")
# logging.basicConfig(level=logging.INFO)

# Create a chain to answer questions
qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type="stuff",
                                       retriever=retriever,
                                       return_source_documents=True)

retriever.get_relevant_documents("main research method")

🤖

Hey @nithinreddyyyyyy! Great to see you around here again. How's everything going with your projects? 😊

You're not seeing the query field in your output because of the use_original_query attribute on the SelfQueryRetriever class. When use_original_query is False, the retriever does not search with your original query. Instead, it uses a new query that the structured_query_translator generates from the StructuredQuery object.

Here's the relevant code from the SelfQueryRetriever class:

def _prepare_query(
    self, query: str, structured_query: StructuredQuery
) -> Tuple[str, Dict[str, Any]]:
    new_query, new_kwargs = self.structured_query_translator.visit_structured_query(
        structured_query
    )
    if structured_query.limit is not None:
        new_kwargs["k"] = structured_query.limit
    if self.use_original_query:
        new_query = query
    search_kwargs = {**self.search_kwargs, **new_kwargs}
    return new_query, search_kwargs

In this method, the structured_query_translator visits the structured_query and produces new_query and new_kwargs. If self.use_original_query is True, new_query is replaced by the original query. Otherwise, new_query stays as generated by the structured_query_translator.
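To make the selection logic concrete, here is a minimal standalone sketch that mirrors the method above without importing LangChain. The function name prepare_query, its parameters, and the example strings are hypothetical stand-ins for illustration; they are not part of the library.

```python
from typing import Any, Dict, Optional, Tuple


def prepare_query(
    original_query: str,
    translated_query: str,
    translated_kwargs: Dict[str, Any],
    limit: Optional[int] = None,
    use_original_query: bool = False,
    base_search_kwargs: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical stand-in for the query selection done in _prepare_query."""
    new_kwargs = dict(translated_kwargs)
    # A limit extracted from the structured query becomes the "k" search argument.
    if limit is not None:
        new_kwargs["k"] = limit
    # The flag decides which query text is sent to the vector store.
    new_query = original_query if use_original_query else translated_query
    # Translator-provided kwargs override the retriever's base search kwargs.
    search_kwargs = {**(base_search_kwargs or {}), **new_kwargs}
    return new_query, search_kwargs


# Default behaviour: the rewritten query is used.
rewritten, _ = prepare_query("What are the main research methods?", "research methods", {})

# With the flag set, the original query text is kept.
original, kwargs = prepare_query(
    "What are the main research methods?",
    "research methods",
    {},
    limit=3,
    use_original_query=True,
)
```

With the flag left at its default, rewritten holds the translated text; with the flag set, original holds the user's text unchanged.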

To use the original query, you should set use_original_query to True when creating an instance of SelfQueryRetriever. You can find this in the LangChain repository.
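As a sketch of that change, the helper below forwards use_original_query=True to SelfQueryRetriever.from_llm, reusing the same arguments as the snippet in the question. The helper name build_self_query_retriever is hypothetical, and the import path is an assumption derived from the source file cited in this thread, so it may differ in other LangChain versions.

```python
from typing import Any, Sequence


def build_self_query_retriever(
    llm: Any,
    vectorstore: Any,
    document_content_description: str,
    metadata_field_info: Sequence[Any],
    use_original_query: bool = True,
) -> Any:
    """Hypothetical helper: build a SelfQueryRetriever that keeps the original query."""
    # Imported lazily so the helper can be defined without LangChain installed.
    # Assumed import path, based on libs/langchain/langchain/retrievers/self_query/base.py.
    from langchain.retrievers.self_query.base import SelfQueryRetriever

    return SelfQueryRetriever.from_llm(
        llm,
        vectorstore,
        document_content_description,
        metadata_field_info,
        enable_limit=True,
        verbose=True,
        use_original_query=use_original_query,
    )
```

Calling build_self_query_retriever(llm, vectorstore, document_content_description, metadata_field_info) with the objects from the question should then search with the text passed to get_relevant_documents rather than the rewritten query.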

Also, use_original_query defaults to False, as set in the SelfQueryRetriever class definition:

class SelfQueryRetriever(BaseRetriever):
    ...
    use_original_query: bool = False
    """Use original query instead of the revised new query from LLM"""
    ...

You can also find this in the LangChain repository.

I hope this helps! If you have any other questions, feel free to ask.

Sources

libs/langchain/langchain/retrievers/self_query/base.py


Not showing query field when trying to retrieve the documents using SelfQueryRetriever #17040