slevin48 / openai

🐱ChatGPT-like Bot🤖 with OpenAI API
https://chat48.streamlit.app/
MIT License

Question Answering on documents #20

Open slevin48 opened 1 year ago

slevin48 commented 1 year ago

Like www.chatpdf.com


slevin48 commented 1 year ago

https://sophiamyang.medium.com/4-ways-of-question-answering-in-langchain-188c6707cc5a

https://github.com/sophiamyang/tutorials-LangChain/blob/main/LangChain_QA.ipynb
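The retrieval-based variants in the linked tutorial (e.g. `RetrievalQA`) follow a retrieve-then-answer pattern: rank the document chunks by similarity to the question, then pass the best ones to the LLM as context. Below is a minimal sketch of that pattern. It assumes a toy bag-of-words `embed` in place of a real embedding model (OpenAI embeddings in practice). `embed`, `cosine`, `top_k_chunks` and `build_prompt` are hypothetical helpers, not LangChain's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): bag-of-words term counts instead of a model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (hypothetical helper)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """'Stuff' the retrieved chunks into a single prompt for the LLM."""
    joined = "\n---\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )
```

Only the prompt returned by `build_prompt` is sent to the chat model. Keeping `k` small limits token usage, at the risk of missing relevant passages.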

slevin48 commented 1 year ago

Ask my PDF

https://github.com/mobarski/ask-my-pdf https://ask-my-pdf.streamlit.app/

The debug mode shows how the approach breaks down into steps inside the function index_file.


Extract the text from a PDF file, page by page (with pypdf): https://github.com/mobarski/ask-my-pdf/blob/main/src/pdf.py

import pypdf

def pdf_to_pages(file):
    "extract text (pages) from pdf file"
    pdf = pypdf.PdfReader(file)
    # one string per page, in document order
    return [page.extract_text() for page in pdf.pages]

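The page texts returned by `pdf_to_pages` are typically split into smaller, overlapping chunks before embedding, so that each chunk fits comfortably in a prompt and keeps a reference to its page. A sketch of that step follows, assuming word-based windows. `pages_to_chunks` and its `size`/`overlap` defaults are hypothetical and are not taken from ask-my-pdf.

```python
def pages_to_chunks(pages: list[str], size: int = 200, overlap: int = 50) -> list[dict]:
    """Split page texts into overlapping word windows, keeping the page number.

    size and overlap are counted in words; the defaults are arbitrary assumptions.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        words = (text or "").split()  # tolerate empty pages
        # stop once the remaining words are already covered by the previous window
        for start in range(0, max(len(words) - overlap, 1), step):
            chunk_words = words[start:start + size]
            if chunk_words:
                chunks.append({"page": page_no, "text": " ".join(chunk_words)})
    return chunks
```

The overlap means that a sentence cut at a window boundary still appears whole in at least one chunk, as long as it is shorter than `overlap` words.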
slevin48 commented 1 year ago

Save the documents and the index on S3: https://github.com/emptycrown/llama-hub/tree/main/loader_hub/s3

from llama_index import download_loader
S3Reader = download_loader("S3Reader")
loader = S3Reader(bucket='scrabble-dictionary', key='dictionary.txt', aws_access_id='[ACCESS_KEY_ID]', aws_access_secret='[ACCESS_KEY_SECRET]')
documents = loader.load_data()

Or manually:

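import boto3
import streamlit as st
# credentials come from the [aws] section of the Streamlit secrets
s3_client = boto3.client("s3",
                         aws_access_key_id=st.secrets["aws"]["aws_access_key_id"],
                         aws_secret_access_key=st.secrets["aws"]["aws_secret_access_key"])
# s3_bucket, object_name and file_name must be defined beforehand
s3_client.download_file(s3_bucket, object_name, file_name)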
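To cover the "index" half of saving the document and index on S3, one option is to store the index as a single JSON object next to the PDF. The sketch below assumes that the index is JSON-serializable (for example chunk texts plus embedding lists) and that `s3_client` is the boto3 client created above. `save_index`, `load_index` and any key layout are hypothetical. The functions rely only on the boto3 S3 client's `put_object` and `get_object` calls.

```python
import json


def save_index(s3_client, bucket: str, key: str, index: dict) -> None:
    """Serialize the index to JSON and store it as one S3 object (hypothetical helper)."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(index).encode("utf-8"),
        ContentType="application/json",
    )


def load_index(s3_client, bucket: str, key: str) -> dict:
    """Fetch the JSON object back from S3 and deserialize it."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a streaming body; read() returns the raw bytes
    return json.loads(response["Body"].read())
```

Because the client is passed in rather than created inside the functions, it is easy to swap in a stub for local testing, or to reuse the same client that downloads the PDF.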
slevin48 commented 1 year ago

Change QA example to https://www.impromptubook.com/

slevin48 commented 1 year ago

Ask questions about Teams meetings:

Screenshot 2023-04-29 6 49 54 PM
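A meeting transcript can go through the same question-answering pipeline once it has been converted to plain text. The sketch below makes three assumptions: the transcript is exported as WebVTT with `<v Speaker>` voice tags (the usual shape of a Teams `.vtt` transcript), each cue's text fits on one line, and consecutive cues from the same speaker should be merged. `vtt_to_lines` is a hypothetical helper.

```python
import re

# Matches a WebVTT voice span such as "<v Alice>Hello.</v>" (closing tag optional)
VOICE = re.compile(r"<v\s+([^>]+)>(.*?)(?:</v>|$)")


def vtt_to_lines(vtt: str) -> list[str]:
    """Turn WebVTT cues with <v Speaker> tags into 'Speaker: text' lines."""
    turns: list[list[str]] = []  # [speaker, text] pairs, in order
    for raw in vtt.splitlines():
        match = VOICE.search(raw)
        if not match:
            continue  # skip the WEBVTT header, cue ids, timestamps and blank lines
        speaker, text = match.group(1).strip(), match.group(2).strip()
        if turns and turns[-1][0] == speaker:
            turns[-1][1] += " " + text  # merge consecutive cues from one speaker
        else:
            turns.append([speaker, text])
    return [f"{speaker}: {text}" for speaker, text in turns]
```

The resulting lines can be joined into a single string and chunked like the text of a PDF page. Keeping the speaker names in the text allows questions such as "what did Bob agree to?".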

slevin48 commented 1 year ago

https://github.com/freedmand/semantra

https://user-images.githubusercontent.com/306095/233867821-601db8b0-19c6-4bae-8e93-720b324dc199.mov

slevin48 commented 1 year ago

https://github.com/bhaskatripathi/pdfGPT

slevin48 commented 1 year ago

streamlit-qa_doc-2023-05-26-14-05-12.webm

slevin48 commented 1 year ago

streamlit-qa_doc-2023-05-26-15-05-74.webm

slevin48 commented 1 year ago

https://blog.streamlit.io/langchain-tutorial-4-build-an-ask-the-doc-app/