langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
94.31k stars 15.25k forks

Regarding explanation of the answer returned by OpenAI embeddings #5052

Closed nithinreddyyyyyy closed 1 year ago

nithinreddyyyyyy commented 1 year ago

System Info

I'm working on Q&A over PDFs and other documents using OpenAI. Below is the code:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chains import VectorDBQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
import os
import warnings
warnings.filterwarnings("ignore")

# Set up the environment variable for the OpenAI API key
os.environ["OPENAI_API_KEY"] = ""

def get_documents(folder_path, file_extension):
    documents = []
    if file_extension == 'pdf':
        pdf_loader = DirectoryLoader(folder_path, glob="./*.pdf", loader_cls=PyPDFLoader)  # Select PDF files
        documents += pdf_loader.load()
    elif file_extension == 'txt':
        txt_loader = DirectoryLoader(folder_path, glob="./*.txt")  # Select TXT files
        documents += txt_loader.load()
    elif file_extension == 'combined':
        pdf_loader = DirectoryLoader(folder_path, glob="./*.pdf", loader_cls=PyPDFLoader)  # Select PDF files
        documents += pdf_loader.load()
        txt_loader = DirectoryLoader(folder_path, glob="./*.txt")  # Select TXT files
        documents += txt_loader.load()
    else:
        return None

    return documents

def get_query_result(query, documents):
    # Split documents
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
    texts = text_splitter.split_documents(documents)

    # Query documents
    embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
    docsearch = Chroma.from_documents(texts, embeddings)
    qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=docsearch, return_source_documents=True)
    result = qa({"query": query})

    result_text = result['result'].strip()
    source = result.get('source_documents', [{}])[0].metadata.get('source', '')
    page = result.get('source_documents', [{}])[0].metadata.get('page', '')

    return result_text, source, page

def chat_loop(file_extension, folder_path):
    documents = get_documents(folder_path, file_extension)
    if documents is None:
        print("Invalid folder path or no supported files found.")
        return

    while True:
        query = input("Enter your query (type 'exit' to end): ")
        if query.lower() == 'exit':
            break

        result = get_query_result(query, documents)

        if result is not None:
            result_text, source, page = result
            print("Result:", result_text)
            if source:
                print("Source:", source)
                print("Page:", page)
        else:
            print("No answer found for the query.")

        print()  # Print an empty line for separation

# Get the selected file extension and folder path from the webpage
selected_file_extension = 'combined' 
folder_path = 'Documents'

# Start the chat loop
chat_loop(selected_file_extension, folder_path)

The code above just takes the text of the input PDF (or other document) and returns a single-line answer. In ChatGPT, if we provide a long text or paragraph and ask a question about it, it gives the answer and also explains where the answer came from and why it is correct. Is it possible to do the same in the code above?

Who can help?

No response


Reproduction

Looking for a more detailed explanation of the answer, instead of a single-line answer or the bare answer alone.

Expected behavior

Expecting the answers to be returned with better explanation and articulation.

dosubot[bot] commented 1 year ago

Hi, @nithinreddyyyyyy! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you opened this issue to request a more detailed explanation or articulation of the answer returned by the OpenAI embeddings in the provided code. However, there hasn't been any activity or comments on the issue since you opened it.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project! Let us know if you have any further questions or concerns.