langchain-ai / langchain

šŸ¦œšŸ”— Build context-aware reasoning applications
https://python.langchain.com
MIT License

Issue: Need Help - Implement ChatOpenAI into my LangChain Research #13406

Closed ZinanYang1995 closed 9 months ago

ZinanYang1995 commented 9 months ago

Issue you'd like to raise.

Hi there,

I am doing research on creating a PDF reader AI that can answer users' questions based on the uploaded PDF and the prompt the user entered. I have gotten this far using the OpenAI package, but now I want to make it more advanced by using ChatOpenAI with the LangChain schema package (SystemMessage, HumanMessage, and AIMessage). I am kind of lost on where I should start and what adjustments to make. Could you help me with that?

Below is my code so far:

Imports

import streamlit as st
import os
from apikey import apikey
import pickle
from PyPDF2 import PdfReader
from streamlit_extras.add_vertical_space import add_vertical_space
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.schema import (SystemMessage, HumanMessage, AIMessage)

os.environ['OPENAI_API_KEY'] = apikey

User Interface

Side Bar

with st.sidebar:
    st.title('šŸš€ Zi-GPT Version 2.0')
    st.markdown('''

About

This app is an LLM-powered chatbot built using:
- [Streamlit](https://streamlit.io/)
- [LangChain](https://python.langchain.com/)
- [OpenAI](https://platform.openai.com/docs/models) LLM model

''')
    add_vertical_space(5)
    st.write('Made with ā¤ļø by Zi')

Main Page

def main():
    st.header("Zi's PDF Helper: Chat with PDF")

# upload a PDF file
pdf = st.file_uploader("Please upload your PDF here", type='pdf')
# st.write(pdf)

# read PDF
if pdf is not None: 
    pdf_reader = PdfReader(pdf)

    # split document into chunks
    # could also use a text splitter: good for PDFs that do not contain charts and visuals
    sections = []
    for page in pdf_reader.pages:
        # Split the page text by paragraphs (assuming two newlines indicate a new paragraph)
        page_sections = page.extract_text().split('\n\n')
        sections.extend(page_sections)

    chunks = sections
    # st.write(chunks)

    # embeddings
    file_name = pdf.name[:-4]
    # convert into a pickle file
    # wb: write in binary mode
    # rb: read in binary mode
    # Note: only create new vectors for newly uploaded files
    if os.path.exists(f"{file_name}.pkl"):
        with open(f"{file_name}.pkl", "rb") as f:
            VectorStore = pickle.load(f)
        st.write('Embeddings Loaded from the Disk')
    else:
        embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
        VectorStore = FAISS.from_texts(chunks,embedding=embeddings)
        with open(f"{file_name}.pkl", "wb") as f:
            pickle.dump(VectorStore, f)
        st.write('Embeddings Computation Completed')

    # Create chat history
    if pdf:
        # generate chat history
        chat_history_file = f"{pdf.name}_chat_history.pkl"

        # load history if it exists
        if os.path.exists(chat_history_file):
            with open(chat_history_file, "rb") as f:
                chat_history  = pickle.load(f)
        else:
            chat_history = []

    # Initialize chat_history in session_state if not present
    if 'chat_history' not in st.session_state:
        st.session_state.chat_history = []

    # Check if 'prompt' is in session state
    if 'last_input' not in st.session_state:
        st.session_state.last_input = ''

    # User Input
    current_prompt = st.session_state.get('user_input', '')

    prompt_placeholder = st.empty()
    prompt = prompt_placeholder.text_area("Ask questions about your PDF:", value=current_prompt, placeholder="Send a message", key="user_input")
    submit_button = st.button("Submit")

    if submit_button and prompt:
        # Update the last input in session state
        st.session_state.last_input = prompt

        docs = VectorStore.similarity_search(query=prompt, k=3)

        #llm = OpenAI(temperature=0.9, model_name='gpt-3.5-turbo')
        chat = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.7)
        chain = load_qa_chain(llm=chat, chain_type="stuff")
        with get_openai_callback() as cb:
            response = chain.run(input_documents=docs, question=prompt)
            print(cb)
        # st.write(response)
        # st.write(docs)

        # Add to chat history
        st.session_state.chat_history.append((prompt, response))

        # Save chat history
        with open(chat_history_file, "wb") as f:
            pickle.dump(st.session_state.chat_history, f)

        # Clear the input after processing
        prompt_placeholder.text_area("Ask questions about your PDF:", value='', placeholder="Send a message", key="pdf_prompt") 

    # Display the entire chat 
    chat_content = ""
    for user_msg, bot_resp in st.session_state.chat_history:
        chat_content += f"<div style='background-color: #222222; color: white; padding: 10px;'>**You:** {user_msg}</div>"
        chat_content += f"<div style='background-color: #333333; color: white; padding: 10px;'>**Zi GPT:** {bot_resp}</div>"
    st.markdown(chat_content, unsafe_allow_html=True)

if __name__ == '__main__':
    main()

Suggestion:

No response

dosubot[bot] commented 9 months ago

šŸ¤–

Hi @ZinanYang1995,

Great to see you continuing your work and making progress on your project! I remember our last interaction and it's awesome to see how far you've come. I'm going to take a deep dive into the code you've shared and will get back to you with some guidance shortly. Hang tight!

dosubot[bot] commented 9 months ago

šŸ¤–

Sure, I can guide you on how to integrate ChatOpenAI and the LangChain schema package into your existing project.

Firstly, you need to replace the OpenAI LLM with LangChain's ChatOpenAI and use LangChain's schema package to handle the messages. Here's a basic example of how you might do this:

from langchain.chat_models.openai import ChatOpenAI
from langchain.schema.messages import SystemMessage, HumanMessage, AIMessage

# Initialize the OpenAI model
openai = ChatOpenAI(model_name="gpt-3.5-turbo", openai_api_key="your_api_key")

# Create a SystemMessage, HumanMessage, and AIMessage
system_message = SystemMessage(content="System message content")
human_message = HumanMessage(content="Human message content")
ai_message = AIMessage(content="AI message content")

# Use the OpenAI model to generate a response
response = openai._generate([human_message])

# Process the response using the AIMessage schema
ai_message.content = response.generations[0].message.content

This is a simplified example and you would need to adapt it to fit the specifics of your PDF reader AI project. For example, you might need to extract text from the PDF and pass it to the OpenAI model, handle multiple messages, or use the streaming functionality of the OpenAI model.
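For instance, here is a rough sketch of how the retrieved PDF chunks and the user's question could be turned into messages and passed to the chat model. It assumes docs and prompt are the variables from the code above, and it calls the chat model directly with a message list rather than through _generate:

from langchain.chat_models import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage

# Assumes `docs` comes from VectorStore.similarity_search(...) and
# `prompt` is the user's question, as in the Streamlit code above.
context = "\n\n".join(doc.page_content for doc in docs)

messages = [
    SystemMessage(content=(
        "You are a helpful assistant. Answer the question using only "
        "these PDF excerpts:\n\n" + context
    )),
    HumanMessage(content=prompt),
]

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)
ai_message = chat(messages)  # calling the chat model with a message list returns an AIMessage
answer = ai_message.content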

Please note that LangChain's OpenAI package is designed to work with LangChain's schema package, so you should use them together. The schema package provides a structured way to handle messages, which can be useful in a chatbot or similar application.
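As a rough illustration of that structure, a running conversation can be kept as a list of message objects and extended on each turn; this sketch assumes chat is a ChatOpenAI instance like the one above:

messages = [SystemMessage(content="You answer questions about the uploaded PDF.")]

# Each turn: append the user's message, call the model, keep its AIMessage reply.
messages.append(HumanMessage(content="What is this document about?"))
reply = chat(messages)  # reply is an AIMessage
messages.append(reply)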

In the context shared, it's important to note that the OpenAI model initialization has been updated. If you're using a chat model like "gpt-3.5-turbo" or "gpt-4", you should initialize it using from langchain.chat_models.openai import ChatOpenAI instead of from langchain.llms import OpenAIChat. Also, the method to generate a response has been updated to _generate instead of generate. The response from the _generate method is a ChatResult object, and you can get the AI's message content from response.generations[0].message.content.

Additionally, the ChatOpenAI class has several parameters that you can set when initializing the model, such as temperature, max_tokens, and openai_api_key:

- openai_api_key: automatically inferred from the environment variable OPENAI_API_KEY if not provided.
- openai_organization: automatically inferred from the environment variable OPENAI_ORG_ID if not provided.
- openai_api_base: set this if you're using a proxy or service emulator.
- request_timeout: the timeout for requests to the OpenAI completion API.
- max_retries: the maximum number of retries to make when generating.
- streaming: whether to stream the results or not.
- n: the number of chat completions to generate for each prompt.
- tiktoken_model_name: the model name to pass to tiktoken when using this class.
- default_headers and default_query: default headers and query parameters for the requests.
- http_client: a custom httpx client, if you need to configure one.
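For example, here is a small sketch of initializing the model with a few of these options, using the ChatOpenAI import from above; the values are placeholders, not recommendations:

chat = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.7,      # sampling temperature
    max_tokens=512,       # cap on tokens generated per reply
    request_timeout=60,   # seconds to wait for the OpenAI API
    max_retries=2,        # retries on failed requests
    streaming=False,      # set True to stream tokens as they arrive
    n=1,                  # number of completions per prompt
)
# openai_api_key is read from the OPENAI_API_KEY environment variable
# when it is not passed explicitly.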

I hope this helps! If you have any other questions, feel free to ask.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a šŸ‘ if this is helpful and šŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

ZinanYang1995 commented 9 months ago

I am confused by these steps:

Use the OpenAI model to generate a response

response = openai._generate([human_message])

Process the response using the AIMessage schema

ai_message.content = response.generations[0].message.content

The reason is that I am trying to pass the document chunks into a chain to get the right information and response. How should I implement openai._generate and AIMessage in my chain logic?