jas3333 / gptchat_pinecone

Pinecone Python Script bulk pdf upload #13

Closed revbin closed 1 year ago

revbin commented 1 year ago

Hi

Can the Python doc_importer.py script be given the ability to bulk-upload PDF documents from a directory to Pinecone?

regards

jas3333 commented 1 year ago

You could try replacing this:

filename = get_selected_file()

if filename:
    print(
        f"Chunking {filename}, this could take several minutes(yes 20+ minutes) depending on size of the PDF.")
    print("Don't close until it's finished.")

    # Split the text into chunks and save to several .txt files
    splitter(filename)
    # Grabs just the text and re-writes to the same file
    format_texts(filename)
    # Store all the file contents into a list
    content = get_file_contents(filename)
    for x in range(len(content)):
        print(f"Injecting #{x}: {content[x]}\n")
        inject(content[x])

With this:

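import os

# os.listdir returns bare file names (not full paths) for every entry in the directory
files = os.listdir('/docs')

for file in files:
    print(f"Chunking {file}")
    # Same steps as the single-file version: split, clean up, load, then inject
    splitter(file)
    format_texts(file)
    content = get_file_contents(file)
    for x in range(len(content)):
        print(f"Injecting #{x}: {content[x]}\n")
        inject(content[x])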
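
If the directory may also contain non-PDF files, or if the importer needs full paths rather than bare file names, something along these lines might be a safer starting point. This is only a sketch: `find_pdfs`, `import_directory`, and the `process` callback are hypothetical names for illustration and are not part of the repo. `process` stands in for whatever per-file steps you run (for example the splitter / format_texts / get_file_contents / inject sequence above).

```python
from pathlib import Path
from typing import Callable, List


def find_pdfs(directory: str) -> List[Path]:
    """Return the PDF files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def import_directory(directory: str, process: Callable[[str], None]) -> List[str]:
    """Call `process` on each PDF's full path; return the names that failed."""
    failed = []
    for pdf in find_pdfs(directory):
        print(f"Chunking {pdf.name}")
        try:
            # `process` is a placeholder for the per-file import steps
            process(str(pdf))
        except Exception as exc:
            # Keep going so one bad PDF does not abort the whole batch
            print(f"Skipping {pdf.name}: {exc}")
            failed.append(pdf.name)
    return failed
```

Calling `import_directory('/docs', my_import_function)` would then walk the folder once and hand back the names of any files that raised an error, so they can be retried separately.
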
revbin commented 1 year ago

Works perfectly. Thanks a lot.