slevin48 / openai

🐱ChatGPT-like Bot🤖 with OpenAI API

https://chat48.streamlit.app/

MIT License


Question Answering on documents #20

Open slevin48 opened 1 year ago

slevin48 commented 1 year ago

Like www.chatpdf.com

slevin48 commented 1 year ago

https://sophiamyang.medium.com/4-ways-of-question-answering-in-langchain-188c6707cc5a

https://github.com/sophiamyang/tutorials-LangChain/blob/main/LangChain_QA.ipynb

slevin48 commented 1 year ago

Ask my PDF

https://github.com/mobarski/ask-my-pdf https://ask-my-pdf.streamlit.app/

The debug output helps in understanding how the approach breaks down inside the function index_file:

Extract text (pages) from pdf file (with PyPDF): https://github.com/mobarski/ask-my-pdf/blob/main/src/pdf.py

import pypdf

def pdf_to_pages(file):
    "extract text (pages) from pdf file"
    pages = []
    pdf = pypdf.PdfReader(file)
    for p in range(len(pdf.pages)):
        page = pdf.pages[p]
        text = page.extract_text()
        pages += [text]
    return pages
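
Once the pages are extracted, a QA flow needs to pick which pages to hand to the model for a given question. Here is a minimal sketch of that step, assuming plain word-overlap scoring. This is not what ask-my-pdf does; `tokenize`, `rank_pages` and the sample pages are hypothetical names and data made up for illustration, and a real setup would more likely rank with embeddings.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_pages(pages, question, top_k=3):
    """Return (page_index, score) pairs for the pages sharing the most words with the question.

    Hypothetical helper: the score is a plain word-overlap count, not an embedding similarity.
    """
    question_words = set(tokenize(question))
    scored = []
    for index, text in enumerate(pages):
        # Guard against pages where text extraction returned nothing
        counts = Counter(tokenize(text or ""))
        score = sum(counts[word] for word in question_words)
        if score > 0:
            scored.append((index, score))
    # Highest score first; the sort is stable, so ties keep page order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up pages standing in for the output of pdf_to_pages
pages = [
    "Streamlit apps are written in Python.",
    "Amazon S3 stores objects in buckets.",
    "Each S3 bucket has a globally unique name.",
]
print(rank_pages(pages, "What is an S3 bucket?", top_k=2))
```

The selected pages would then go into the prompt as context. Exact word matching is crude ("bucket" does not match "buckets"), which is one reason to prefer embeddings in practice.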

slevin48 commented 1 year ago

Save the doc and index on S3: https://github.com/emptycrown/llama-hub/tree/main/loader_hub/s3

from llama_index import download_loader
S3Reader = download_loader("S3Reader")
loader = S3Reader(bucket='scrabble-dictionary', key='dictionary.txt', aws_access_id='[ACCESS_KEY_ID]', aws_access_secret='[ACCESS_KEY_SECRET]')
documents = loader.load_data()

Or manually:

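import boto3
import streamlit as st  # st.secrets holds the AWS credentials

s3_client = boto3.client('s3',
                    aws_access_key_id=st.secrets["aws"]["aws_access_key_id"],
                    aws_secret_access_key=st.secrets["aws"]["aws_secret_access_key"])
# download object_name from s3_bucket to the local path file_name
s3_client.download_file(s3_bucket, object_name, file_name)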
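
The snippet above only covers the download side. For the saving side, a rough sketch could look like the following, assuming the doc and index are already written to local files. `make_s3_client`, `save_to_s3`, the `prefix` parameter and the bucket and file names in the usage note are hypothetical; only `boto3.client` and `upload_file` are actual boto3 calls.

```python
import os


def make_s3_client(access_key_id, secret_access_key):
    """Create a boto3 S3 client from explicit credentials."""
    # Imported lazily so save_to_s3 can be tried with a stand-in client
    import boto3
    return boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )


def save_to_s3(s3_client, bucket, file_names, prefix=""):
    """Upload each local file to the bucket and return the object keys used."""
    keys = []
    for file_name in file_names:
        # Hypothetical naming scheme: key = prefix + base name of the local file
        key = prefix + os.path.basename(file_name)
        # boto3 argument order for upload_file: Filename, Bucket, Key
        s3_client.upload_file(file_name, bucket, key)
        keys.append(key)
    return keys
```

Usage would be something like `save_to_s3(s3_client, s3_bucket, ["doc.pdf", "index.json"], prefix="qa/")`, with the client built from `st.secrets` as in the manual example. Note that `download_file` takes its arguments in a different order (bucket, key, file name).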

slevin48 commented 1 year ago

Change QA example to https://www.impromptubook.com/

slevin48 commented 1 year ago

Ask questions about Teams meetings:

[Image: Screenshot 2023-04-29 6 49 54 PM]