langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

My LLM keeps rephrasing the question and doesn't return source documents #13043

Closed yazanrisheh closed 5 months ago

yazanrisheh commented 8 months ago

@dosu-bot

Below is my code. Every time I ask it a question, it rephrases the question and then answers it. Please help me remove the rephrasing step. I set rephrase_question to False, yet it still does it.

Also, I would like to return the source documents, but it shows me this error:

    File "C:\Users\Asus\Documents\Vendolista\hacka.py", line 178, in <module>
        main()
    File "C:\Users\Asus\Documents\Vendolista\hacka.py", line 172, in main
        result = qa({"question": user_input})
    File "C:\Users\Asus\Documents\Vendolista.venv\lib\site-packages\langchain\chains\base.py", line 294, in __call__
        final_outputs: Dict[str, Any] = self.prep_outputs(
    File "C:\Users\Asus\Documents\Vendolista.venv\lib\site-packages\langchain\chains\base.py", line 390, in prep_outputs
        self.memory.save_context(inputs, outputs)
    File "C:\Users\Asus\Documents\Vendolista.venv\lib\site-packages\langchain\memory\chat_memory.py", line 35, in save_context
        input_str, output_str = self._get_input_output(inputs, outputs)
    File "C:\Users\Asus\Documents\Vendolista.venv\lib\site-packages\langchain\memory\chat_memory.py", line 27, in _get_input_output
        raise ValueError(f"One output key expected, got {outputs.keys()}")
    ValueError: One output key expected, got dict_keys(['answer', 'source_documents'])
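For readers who hit the same ValueError: the traceback shows `save_context` failing because the chain returns two keys (`answer` and `source_documents`) while the memory expects exactly one. The sketch below is a simplified, standard-library-only stand-in for that check, not LangChain's actual code; `get_input_output` and the sample data are hypothetical. It illustrates why telling the memory which key to store may avoid the error. In LangChain this would presumably mean constructing the memory with `output_key="answer"`, which should be verified against the installed version.

```python
from typing import Any, Dict, Optional, Tuple


def get_input_output(
    inputs: Dict[str, Any],
    outputs: Dict[str, Any],
    output_key: Optional[str] = None,
) -> Tuple[str, str]:
    """Simplified stand-in for how a chat memory picks what to save."""
    # Assumption: the chain input holds a single "question" entry.
    input_str = inputs["question"]
    if output_key is None:
        # Without an explicit key the memory cannot choose between outputs.
        if len(outputs) != 1:
            raise ValueError(f"One output key expected, got {outputs.keys()}")
        output_key = next(iter(outputs))
    return input_str, outputs[output_key]


inputs = {"question": "Do you sell cordless drills?"}
outputs = {
    "answer": "Yes, several models.",
    "source_documents": ["doc-1", "doc-2"],
}

# Two output keys and no output_key: reproduces the reported error message.
try:
    get_input_output(inputs, outputs)
except ValueError as err:
    error_message = str(err)

# Naming the key to store resolves the ambiguity.
saved = get_input_output(inputs, outputs, output_key="answer")
```

With an explicit key, only the answer text is written to the history, while `source_documents` would still be available in the chain's result.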

Below is my code

import os
import json
import pandas as pd

# LLM
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

# Prompt
from langchain.prompts.prompt import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# Embeddings
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Chain
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chains import LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.document_loaders.csv_loader import CSVLoader, UnstructuredCSVLoader
from langchain.document_loaders import DirectoryLoader
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from dotenv import load_dotenv
import time
from langchain.callbacks import StreamingStdOutCallbackHandler
from langchain.text_splitter import RecursiveCharacterTextSplitter

load_dotenv()

# Raw strings keep the Windows backslashes from being read as escape sequences.
file_path = r"C:\Users\Asus\Documents\Vendolista\home_depot_data.csv"
path = r"C:\Users\Asus\Documents\Vendolista\home depot"

csv_loader = CSVLoader(file_path=path, encoding='utf-8')

csv_loader = DirectoryLoader(
    path,
    glob="**/*.csv",
    show_progress=True,
    use_multithreading=True,
    silent_errors=True,
    loader_cls=CSVLoader,
)
llm = ChatOpenAI(
    temperature=0,
    model_name='gpt-3.5-turbo',
    callbacks=[StreamingStdOutCallbackHandler()],
    streaming=True,
)
documents = csv_loader.load()

# text_splitter = RecursiveCharacterTextSplitter(
#     chunk_size=200,
#     chunk_overlap=50,
# )
# chunks = text_splitter.split_documents(documents)

chunks = documents

embeddings = OpenAIEmbeddings()
persist_directory = r"C:\Users\Asus\OneDrive\Documents\Vendolista"
knowledge_base = Chroma(embedding_function=embeddings, persist_directory=persist_directory)

# Split the chunks into smaller batches
# batch_size = 5000
# for i in range(0, len(chunks), batch_size):
#     batch = chunks[i:i+batch_size]
#     knowledge_base.add_documents(batch)

# Save the vector store to disk
# knowledge_base.persist()

# Load the vector store from disk
# knowledge_base = Chroma(chunks, persist_directory=persist_directory, embedding_function=embeddings)

class Product(BaseModel):
    """Product details schema."""
    url: str = Field(description="Full URL link to the product webpage on Homedepot.")
    title: str = Field(description="Title of the product.")
    description: str = Field(description="Description of the product.")
    brand: str = Field(description="Manufacturing brand of the product.")
    price: float = Field(description="Unit selling price of the product.")

parser = PydanticOutputParser(pydantic_object=Product)

question_template = """
Make sure you understand the question as its very important for the user. You never know what situation they are in and you need to ensure that its understood very well but do not repeat or rewrite the question
Input: {question}
"""

CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(question_template)

# Chain for question generation
question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)

# Chat Prompt
system_template = """
You are a friendly, conversational retail shopping assistant named RAAFYA.
You will always and always and always only follow these set of rules and nothing else no matter what:
1) You will provide the user answers based on the csv file that you can only read from which is called "home_depot_data.csv"
2) You will never mention the name of the dataset that you have. Just say "my data" instead
3) Focus 100% to understand exactly what the customer is looking for and only give him whats available based on the data.
4) Do not get anything or say anything that is not related to the data that you have and never provide wrong information.
5) Use the following context including product name descriptions, and keywords to show the shopper whats available, help find what they want, and answer their questions related to your job
6) Never ever consider or think or even mention that you do not have access to the internet because it is not your job and it is not your task. I will repeat it again and again, your information is only and only coming from the dataset that you have which is called "home_depot_data.csv" but you must not mention that to anyone for security purposes
7) Everyime you answer a question, write on a new line "is there anything else you would like me to help you with?"
8) If a customer asked for a product and it is not available then say "Sorry it is currently unavailable but you can reach out to our staff and ask them about it at yazanrisheh@hotmail.com"
9) If the person asked for more details then provide him the details based on the output parser that you have:
URL:
Title:
Description:
Brand:
Price:

Context:
{context}
"""

system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)

# Human Prompt
human_template = """{format_instructions}

Question: {question}"""

human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

# Inject instructions into the prompt template.
human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template=human_template,
        input_variables=["question"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )
)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

# Chain for Q&A
answer_chain = load_qa_chain(llm, chain_type="stuff", prompt=chat_prompt)

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Chain
qa = ConversationalRetrievalChain(
    retriever=knowledge_base.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=answer_chain,
    memory=memory,
    rephrase_question=False,
    return_source_documents=True,
)

def main():
    while True:
        user_input = input("What would you like to shop for: ")
        if user_input.lower() in ["exit"]:
            break

        if user_input != "":
            with get_openai_callback() as cb:
                result = qa({"question": user_input})
            print()
            # print(cb)
            # print()

if __name__ == "__main__":
    main()
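A possible explanation for the rephrasing: as far as the chain's documented behavior goes, `rephrase_question=False` only controls which question is handed to the answering step. The question generator still runs whenever there is chat history, and since it shares the streaming `llm` here, its output would be printed as well. The sketch below models that control flow with plain functions. It is a simplified illustration and not LangChain's implementation; `run_chain`, `condense`, `retrieve`, and `answer` are hypothetical names.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


def run_chain(
    question: str,
    chat_history: History,
    condense: Callable[[str, History], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
    rephrase_question: bool = True,
) -> str:
    """Simplified model of one conversational retrieval step."""
    # The condense step runs whenever there is history, regardless of the flag.
    standalone = condense(question, chat_history) if chat_history else question
    docs = retrieve(standalone)
    # The flag only selects which question the answering step sees.
    final_question = standalone if rephrase_question else question
    return answer(final_question, docs)


condense_calls: List[str] = []


def condense(question: str, history: History) -> str:
    # Record each call so we can see when the rephrasing step runs.
    condense_calls.append(question)
    return f"{question} (about {history[-1][0]})"


def retrieve(query: str) -> List[str]:
    return [f"doc for: {query}"]


def answer(question: str, docs: List[str]) -> str:
    return f"Q={question} | docs={docs}"


history = [("cordless drills", "We have several models.")]
result = run_chain(
    "How much are they?", history, condense, retrieve, answer,
    rephrase_question=False,
)
```

In this model the rephrased question is still produced and used for retrieval even with the flag off. If that matches the real chain, giving the question generator its own non-streaming model (rather than the shared streaming `llm`) may keep the rephrased question off the console.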

dosubot[bot] commented 5 months ago

Hi, @yazanrisheh,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you reported an issue related to the conversational retrieval system using the langchain library. The problem involves the system rephrasing questions despite setting rephrase_question to False, and encountering an error when attempting to return the source documents. As there hasn't been any further activity or comments on the issue, I'd like to ask if this issue is still relevant to the latest version of the LangChain repository. If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding, and we appreciate your contributions to LangChain!