langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

ConversationalRetrievalChain error #11855

Closed · yazanrisheh closed this 6 months ago

yazanrisheh commented 10 months ago

Can someone help me fix this code, please?

My error:

PS C:\Users\Asus\Documents\Vendolista> python app2.py
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:02<00:00, 11.17it/s]
87 documents loaded
Traceback (most recent call last):
  File "C:\Users\Asus\Documents\Vendolista\app2.py", line 130, in <module>
    main()
  File "C:\Users\Asus\Documents\Vendolista\app2.py", line 99, in main
    qa = ConversationalRetrievalChain.from_llm(
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\conversational_retrieval\base.py", line 356, in from_llm
    return cls(
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\load\serializable.py", line 75, in __init__
    super().__init__(**kwargs)
  File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for ConversationalRetrievalChain
retriever
  value is not a valid dict (type=type_error.dict)

My code:

from dotenv import load_dotenv
import csv
import PyPDF2
from PyPDF2 import PdfReader
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
import time
from langchain.vectorstores import Qdrant
from langchain.vectorstores import Chroma
from langchain.vectorstores import deeplake
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.callbacks import StreamingStdOutCallbackHandler
import pandas as pd
from docx import Document
from nltk.tokenize import sent_tokenize, word_tokenize
from collections import Counter
from nltk.corpus import stopwords
import os

def print_letter_by_letter(text):
    for char in text:
        print(char, end='', flush=True)
        time.sleep(0.02)

def main():
    load_dotenv()
    my_activeloop_org_id = "yazanrisheh"
    my_activeloop_dataset_name = "langchain_course_customer_support"
    dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

    directory_path = input("Copy your directory path here or upload a file: ")
    # The hard-coded path below overrides whatever was typed above
    directory_path = "C:\\Users\\Asus\\Documents\\Vendolista"

    pdf_loader = DirectoryLoader(directory_path,
                                 glob="**/*.pdf",
                                 show_progress=True,
                                 use_multithreading=True,
                                 silent_errors=True,
                                 loader_cls=PyPDFLoader)

    documents = pdf_loader.load()
    print(str(len(documents)) + " documents loaded")

    llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo',
                     callbacks=[StreamingStdOutCallbackHandler()], streaming=True)

    # Split into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=100,
    )
    chunks = text_splitter.split_documents(documents)

    embeddings = OpenAIEmbeddings()

    persist_directory = "C:\\Users\\Asus\\OneDrive\\Documents\\Vendolista"
    knowledge_base = Chroma.from_documents(chunks, embeddings, persist_directory=persist_directory)
    # Save to disk
    knowledge_base.persist()
    # Drop the in-memory handle so we can be sure we reload a fresh DB from disk
    knowledge_base = None
    new_knowledge_base = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

    # weird_knowledge_base = deeplake(chunks, dataset_path=dataset_path, embedding=embeddings)
    # knowledge_base = Qdrant(documents, embeddings)

    p_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
If the standalone question is empty or cannot be generated, use the follow up question as the standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

    # CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
    CONDENSE_QUESTION_PROMPT = PromptTemplate(input_variables=["chat_history", "question"],
                                              template=p_template)
    memory = ConversationBufferMemory(memory_key="chat_history", input_key="question",
                                      output_key='answer', return_messages=True)

    chatTemplate = """
Important: You are an intelligent chatbot designed to help agents by answering questions only on Enterprise services & activities.
Answer the question only if there is information in the chat history (delimited by ) and context (delimited by ) below.

If context is not empty and the answer cannot be determined from context, say "I cannot determine the answer from context".
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Do not print your answer starting with "Answer:"
{context}
-----------
{chat_history}
-----------
Question: {question}
Answer: Answer the question only if there is information based on the chat history (delimited by ) and context (delimited by ) below.
1. If context is not empty and the answer cannot be determined from context, say "I cannot determine the answer from context".
2. If you don't know the answer, just say that you don't know, don't try to make up an answer.
3. Do not print your answer starting with "Answer:"
"""

    promptHist = PromptTemplate(
        input_variables=["context", "question", "chat_history"],
        template=chatTemplate
    )

    qa = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=new_knowledge_base,
        condense_question_prompt=CONDENSE_QUESTION_PROMPT,
        verbose=False,
        memory=memory,
        condense_question_llm=llm,
        return_generated_question=True,
        combine_docs_chain_kwargs={"prompt": promptHist},
        return_source_documents=True,
    )

    while True:
        question = input("Ask me anything about the files (type 'exit' to quit): ")

        if question.lower() in ["exit"] and len(question) == 4:
            end_chat = "Thank you for visiting us! Have a nice day"
            print_letter_by_letter(end_chat)
            break

        if question:
            # chat_history = []
            # with get_openai_callback() as cb:
            response = qa({"question": question}, return_only_outputs=True)
            # chat_history.append(('user', question))
            # chat_history.append(('AI', response))
            print("AI:", response)

if __name__ == '__main__':
    main()

dosubot[bot] commented 10 months ago

🤖

Based on the error message you provided, it seems that the issue lies in the instantiation of the ConversationalRetrievalChain class. The error message indicates that the retriever parameter is not a valid dictionary.

In your code, you are passing new_knowledge_base as the retriever parameter. However, the retriever parameter is expected to be an instance of BaseRetriever. This is evident from the method signature:

@classmethod
def from_llm(
    cls,
    llm: BaseLanguageModel,
    retriever: BaseRetriever,
    condense_question_prompt: BasePromptTemplate = CONDENSE_QUESTION_PROMPT,
    chain_type: str = "stuff",
    verbose: bool = False,
    condense_question_llm: Optional[BaseLanguageModel] = None,
    combine_docs_chain_kwargs: Optional[Dict] = None,
    callbacks: Callbacks = None,
    **kwargs: Any,
) -> BaseConversationalRetrievalChain:

Here, the 'retriever' parameter is expected to be an instance of 'BaseRetriever'. You can find this in the LangChain codebase.
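To see the mismatch concretely: a Chroma instance is a vector store, while as_retriever() wraps it in a VectorStoreRetriever, which does subclass BaseRetriever. A minimal check, reusing the persist_directory and embeddings variables from your script (a sketch, assuming the langchain 0.0.x import paths current at the time):

from langchain.schema import BaseRetriever
from langchain.vectorstores import Chroma

db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
print(isinstance(db, BaseRetriever))                 # False -- pydantic rejects it
print(isinstance(db.as_retriever(), BaseRetriever))  # True  -- accepted by from_llm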

To resolve this issue, you should convert the Chroma object to a retriever using the as_retriever() method:

new_knowledge_base = Chroma(persist_directory=persist_directory, embedding_function=embeddings).as_retriever()

This should resolve the ValidationError you are seeing. If you still encounter issues, please provide more details about your implementation of the Chroma class and how you are loading the local files.
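If you later want control over how many chunks are fed to the chain, as_retriever() also accepts search parameters. A hedged sketch (the search_type and k values below are illustrative, not taken from this thread):

db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
retriever = db.as_retriever(
    search_type="similarity",  # plain similarity search (the default)
    search_kwargs={"k": 4},    # number of chunks handed to the combine-docs chain
)
qa = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)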

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

yazanrisheh commented 10 months ago

I did do that and it fixed the error; however, when I run my code, it's not answering questions based on the documents I loaded with DirectoryLoader. How do I fix that?
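A quick way to narrow that down is to query the retriever directly, bypassing the chain: if it returns the right chunks, the problem is in the prompts; if it returns nothing, the documents were never indexed into the persisted store. A minimal sketch, assuming new_knowledge_base is the retriever built with as_retriever() above (the query string is only an example):

docs = new_knowledge_base.get_relevant_documents("an example question about your PDFs")
print(len(docs), "chunks retrieved")
for doc in docs:
    print(doc.metadata.get("source"), "->", doc.page_content[:200])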

dosubot[bot] commented 6 months ago

Hi, @yazanrisheh,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue was initially raised due to a code error in the ConversationalRetrievalChain, which was resolved by my suggestion to convert the Chroma object to a retriever using the as_retriever() method. However, a new issue was raised regarding the code not answering questions based on the documents from DirectoryLoader.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation.