Closed yazanrisheh closed 5 months ago
Hi, @yazanrisheh,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you reported an issue related to the conversational retrieval system using the langchain library. The problem involves the system rephrasing questions despite setting rephrase_question to False, and encountering an error when attempting to return the source documents. As there hasn't been any further activity or comments on the issue, I'd like to ask if this issue is still relevant to the latest version of the LangChain repository. If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding, and we appreciate your contributions to LangChain!
@dosu-bot
Below is my code. Every time I ask it a question, it rephrases the question and then answers it. Please help me remove the rephrasing step. I set rephrase_question to False, yet it still rephrases.
Also, I would like to return the source documents, but it's showing me this error:
  File "C:\Users\Asus\Documents\Vendolista\hacka.py", line 178, in <module>
    main()
  File "C:\Users\Asus\Documents\Vendolista\hacka.py", line 172, in main
    result = qa({"question": user_input})
  File "C:\Users\Asus\Documents\Vendolista.venv\lib\site-packages\langchain\chains\base.py", line 294, in __call__
    final_outputs: Dict[str, Any] = self.prep_outputs(
  File "C:\Users\Asus\Documents\Vendolista.venv\lib\site-packages\langchain\chains\base.py", line 390, in prep_outputs
    self.memory.save_context(inputs, outputs)
  File "C:\Users\Asus\Documents\Vendolista.venv\lib\site-packages\langchain\memory\chat_memory.py", line 35, in save_context
    input_str, output_str = self._get_input_output(inputs, outputs)
  File "C:\Users\Asus\Documents\Vendolista.venv\lib\site-packages\langchain\memory\chat_memory.py", line 27, in _get_input_output
    raise ValueError(f"One output key expected, got {outputs.keys()}")
ValueError: One output key expected, got dict_keys(['answer', 'source_documents'])
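The `ValueError` is raised by the memory object, not by the retrieval chain itself: with `return_source_documents=True` the chain returns both `answer` and `source_documents`, and the memory cannot tell which of the two to store unless it is told. The sketch below is a simplified, standalone stand-in for that check. The function name `pick_output` and the sample outputs are hypothetical, and this is not LangChain's actual source; it assumes the memory only insists on a single output key when no output key has been configured.

```python
from typing import Any, Dict, Optional


def pick_output(outputs: Dict[str, Any], output_key: Optional[str] = None) -> Any:
    """Hypothetical stand-in for the memory's choice of which output to save."""
    if output_key is None:
        # Without an explicit key, the choice is only unambiguous for one output.
        if len(outputs) != 1:
            raise ValueError(f"One output key expected, got {outputs.keys()}")
        return next(iter(outputs.values()))
    return outputs[output_key]


# Made-up chain outputs, for illustration only.
outputs = {
    "answer": "We have three cordless drills.",
    "source_documents": ["doc-1", "doc-2"],
}

try:
    pick_output(outputs)
    ambiguous_error = None
except ValueError as exc:
    ambiguous_error = str(exc)

# Naming the key removes the ambiguity.
saved = pick_output(outputs, output_key="answer")
print(ambiguous_error)
print(saved)
```

In LangChain terms, this suggests constructing the memory as `ConversationBufferMemory(memory_key="chat_history", return_messages=True, output_key="answer")`, so that only the answer is written to the chat history while `source_documents` is still returned to the caller.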
Below is my code
import os
import json
import time
import pandas as pd
from dotenv import load_dotenv
from pydantic import BaseModel, Field

# LLM
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback
from langchain.callbacks import StreamingStdOutCallbackHandler

# Prompt
from langchain.prompts.prompt import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# Embeddings
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Chain
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chains import LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.document_loaders.csv_loader import CSVLoader, UnstructuredCSVLoader
from langchain.document_loaders import DirectoryLoader
from langchain.output_parsers import PydanticOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter

load_dotenv()

# Raw strings keep the backslashes in Windows paths from being read as escape sequences.
file_path = r"C:\Users\Asus\Documents\Vendolista\home_depot_data.csv"
path = r"C:\Users\Asus\Documents\Vendolista\home depot"

# Single-file loader; it is replaced by the directory loader on the next statement.
csv_loader = CSVLoader(file_path=file_path, encoding='utf-8')
csv_loader = DirectoryLoader(
    path,
    glob="**/*.csv",
    show_progress=True,
    use_multithreading=True,
    silent_errors=True,
    loader_cls=CSVLoader,
)
llm = ChatOpenAI(
    temperature=0,
    model_name='gpt-3.5-turbo',
    callbacks=[StreamingStdOutCallbackHandler()],
    streaming=True,
)
documents = csv_loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=50,
)
chunks = text_splitter.split_documents(documents)
# Note: this reassignment discards the split chunks and uses the unsplit documents.
chunks = documents

embeddings = OpenAIEmbeddings()
persist_directory = r"C:\Users\Asus\OneDrive\Documents\Vendolista"
knowledge_base = Chroma(embedding_function=embeddings, persist_directory=persist_directory)

# Split the chunks into smaller batches
batch_size = 5000
for i in range(0, len(chunks), batch_size):
    batch = chunks[i:i+batch_size]
    knowledge_base.add_documents(batch)

# Save the vector store to disk
knowledge_base.persist()

# Load the vector store from disk
# (Chroma's first positional parameter is the collection name, so the chunks are not passed here.)
knowledge_base = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

class Product(BaseModel):
    """Product details schema."""
    url: str = Field(description="Full URL link to the product webpage on Homedepot.")
    title: str = Field(description="Title of the product.")
    description: str = Field(description="Description of the product.")
    brand: str = Field(description="Manufacturing brand of the product.")
    price: float = Field(description="Unit selling price of the product.")

parser = PydanticOutputParser(pydantic_object=Product)

question_template = """
Make sure you understand the question as its very important for the user. You never know what situation they are in and you need to ensure that its understood very well but do not repeat or rewrite the question
Input: {question}
"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(question_template)

# Chain for question generation
question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)

# Chat Prompt
system_template = """
You are a friendly, conversational retail shopping assistant named RAAFYA. You will always and always and always only follow these set of rules and nothing else no matter what:
1) You will provide the user answers based on the csv file that you can only read from which is called "home_depot_data.csv"
2) You will never mention the name of the dataset that you have. Just say "my data" instead
3) Focus 100% to understand exactly what the customer is looking for and only give him whats available based on the data.
4) Do not get anything or say anything that is not related to the data that you have and never provide wrong information.
5) Use the following context including product name descriptions, and keywords to show the shopper whats available, help find what they want, and answer their questions related to your job
6) Never ever consider or think or even mention that you do not have access to the internet because it is not your job and it is not your task. I will repeat it again and again, your information is only and only coming from the dataset that you have which is called "home_depot_data.csv" but you must not mention that to anyone for security purposes
7) Everyime you answer a question, write on a new line "is there anything else you would like me to help you with?"
8) If a customer asked for a product and it is not available then say "Sorry it is currently unavailable but you can reach out to our staff and ask them about it at yazanrisheh@hotmail.com"
9) If the person asked for more details then provide him the details based on the output parser that you have:
URL:
Title:
Description:
Brand:
Price:

Context: {context}
"""
system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)

# Human Prompt
human_template="""{format_instructions}
Question: {question}"""
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

# Inject instructions into the prompt template.
human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template=human_template,
        input_variables=["question"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )
)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

# Chain for Q&A
answer_chain = load_qa_chain(llm, chain_type="stuff", prompt=chat_prompt)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Chain
qa = ConversationalRetrievalChain(
    retriever=knowledge_base.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=answer_chain,
    memory=memory,
    rephrase_question=False,
    return_source_documents=True,
)


def main():
    while True:
        user_input = input("What would you like to shop for: ")
        if user_input.lower() in ["exit"]:
            break
        # The traceback above shows this call at line 172 of hacka.py.
        result = qa({"question": user_input})


if __name__ == "__main__":
    main()
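
On the rephrasing: `rephrase_question=False` only controls which question is passed to the answering chain. The question generator still runs once there is chat history, because its output is used for retrieval. Since `question_generator` here shares `llm`, which streams every token to stdout, the condensed question may be printed even though it is not used for the answer. The sketch below illustrates that effect with hypothetical stand-in classes (`FakeLLM`, `CollectingHandler`, and `run_turn` are not LangChain APIs). It is one likely explanation under those assumptions, not a confirmed diagnosis.

```python
from typing import List, Optional


class CollectingHandler:
    """Hypothetical stand-in for a streaming callback; records what would be printed."""

    def __init__(self) -> None:
        self.printed: List[str] = []

    def on_text(self, text: str) -> None:
        self.printed.append(text)


class FakeLLM:
    """Hypothetical stand-in for a chat model with an optional streaming handler."""

    def __init__(self, handler: Optional[CollectingHandler] = None) -> None:
        self.handler = handler

    def generate(self, text: str) -> str:
        if self.handler is not None:
            # A streaming handler makes every generation visible to the user.
            self.handler.on_text(text)
        return text


def run_turn(question_llm: FakeLLM, answer_llm: FakeLLM, question: str) -> str:
    # The condensed question is generated regardless, since retrieval uses it.
    question_llm.generate(f"rephrased: {question}")
    # With rephrase_question=False, the original question goes to the answer step.
    return answer_llm.generate(f"answer to: {question}")


# Case 1: one streaming model shared by both steps, as in the code above.
shared_handler = CollectingHandler()
shared_llm = FakeLLM(shared_handler)
run_turn(shared_llm, shared_llm, "do you sell drills?")

# Case 2: a separate, non-streaming model for the question generator.
split_handler = CollectingHandler()
run_turn(FakeLLM(), FakeLLM(split_handler), "do you sell drills?")

print(shared_handler.printed)
print(split_handler.printed)
```

Applied to the code above, that would mean building `question_generator` with a second `ChatOpenAI(temperature=0)` created without `callbacks` or `streaming=True`, and keeping the streaming `llm` only for `answer_chain`.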