ucl98 / pinecone_ingest_python_implementation


My specs

Bonus

With PDF files ingested using this repository, you can add links to your sources that open the PDF at the page the source text comes from.
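
A minimal sketch of how such a link could be built. It assumes each ingested chunk carries metadata with a "source" file path and a zero-based "page" index (the shape LangChain's PDF loaders typically record), and that the PDFs are hosted under a hypothetical base URL. The function name and BASE_URL are illustrative and not part of this repository. The "#page=N" fragment is the PDF open parameter that most browser viewers honor.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Hypothetical location where the ingested PDFs are hosted; not part of this repository.
BASE_URL = "https://example.com/pdfs"


def source_link(metadata: dict, base_url: str = BASE_URL) -> str:
    """Build a link that opens a PDF at the page a chunk came from.

    Assumes metadata has "source" (file path) and "page" (zero-based index).
    """
    # Normalize Windows separators so only the file name is kept.
    filename = PurePosixPath(metadata["source"].replace("\\", "/")).name
    # PDF viewers count pages from 1; int() also accepts a float such as 4.0.
    page_number = int(metadata.get("page", 0)) + 1
    return f"{base_url.rstrip('/')}/{quote(filename)}#page={page_number}"


print(source_link({"source": "docs/annual report.pdf", "page": 4}))
# prints: https://example.com/pdfs/annual%20report.pdf#page=5
```

If your metadata already stores one-based page numbers, drop the "+ 1".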

Setup

  1. Foreword: This project is not part of gpt4-pdf-chatbot-langchain, and the two projects should not be combined. However, files ingested with this project will still appear in the gpt4-pdf-chatbot-langchain chatbot if both projects use the same Pinecone index and the same namespace.


  2. Installation:

    • Option 1: Run pip install -r requirements.txt in the project's terminal.

    • Option 2: If that does not work, install the dependencies one by one in the project's terminal: pip install langchain, then pip install pinecone-client, then pip install pypdf.

  3. Modify the following code block in "config.py".

    OPENAI_API_KEY = "your_OPENAI_API_KEY"
    PINECONE_API_KEY = "your_PINECONE_API_KEY"
    PINECONE_ENVIRONMENT = "your_PINECONE_ENVIRONMENT"
    PINECONE_INDEX_NAME = "your_PINECONE_INDEX_NAME"
    PINECONE_NAMESPACE = "pdfs"
  4. Create a new folder "docs" inside the project and place the PDFs you want to upload to or delete from Pinecone in it. Remove the files from "docs" once you have ingested or deleted them.
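
Before running either script, a quick pre-flight check can catch the two most common setup mistakes: placeholder values left in "config.py" and an empty or missing "docs" folder. The sketch below is only an illustration. The helper names unfilled_settings and pdfs_to_process are hypothetical and do not exist in this repository; only the setting names and the "docs" folder come from the steps above.

```python
from pathlib import Path

# Setting names taken from config.py; the helpers below are hypothetical.
REQUIRED_SETTINGS = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX_NAME",
    "PINECONE_NAMESPACE",
)


def unfilled_settings(settings: dict) -> list:
    """Return names of settings that are missing, empty, or still a "your_..." placeholder."""
    missing = []
    for name in REQUIRED_SETTINGS:
        value = str(settings.get(name) or "").strip()
        if not value or value.startswith("your_"):
            missing.append(name)
    return missing


def pdfs_to_process(docs_dir="docs") -> list:
    """Return the PDF files found directly inside the docs folder, sorted by path."""
    folder = Path(docs_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")


# Example with the untouched template values from config.py.
template = {
    "OPENAI_API_KEY": "your_OPENAI_API_KEY",
    "PINECONE_API_KEY": "your_PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT": "your_PINECONE_ENVIRONMENT",
    "PINECONE_INDEX_NAME": "your_PINECONE_INDEX_NAME",
    "PINECONE_NAMESPACE": "pdfs",
}
print("Unfilled settings:", unfilled_settings(template))
print("PDFs found:", [p.name for p in pdfs_to_process("docs")])
```

To check your real settings, import the project's config module and pass vars(config) to unfilled_settings instead of the template dictionary.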

Upload to pinecone

python ingest.py

Delete from pinecone

python delete.py

Was the ingest successful?

To check whether the ingest process was successful, go to Pinecone and see if vectors were added in the namespace you defined.
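
The same check can be sketched in code. This is an assumption-laden example, not part of the repository: fetch_index_stats assumes the classic pinecone-client 2.x interface (pinecone.init, pinecone.Index, describe_index_stats) and that the response can be converted with to_dict(); newer client versions use a different entry point. The helper namespace_vector_count is hypothetical and only reads a "namespaces" mapping with a "vector_count" per namespace. The sample numbers are made up for illustration.

```python
def namespace_vector_count(stats: dict, namespace: str) -> int:
    """Return how many vectors the stats report for a namespace (0 if it is absent)."""
    summary = stats.get("namespaces", {}).get(namespace, {})
    return int(summary.get("vector_count", 0))


def fetch_index_stats() -> dict:
    """Fetch index statistics from Pinecone.

    Assumption: pinecone-client 2.x is installed and config.py is importable.
    Not called in this example because it needs network access and real keys.
    """
    import pinecone
    import config

    pinecone.init(api_key=config.PINECONE_API_KEY, environment=config.PINECONE_ENVIRONMENT)
    index = pinecone.Index(config.PINECONE_INDEX_NAME)
    return index.describe_index_stats().to_dict()


# Illustrative, made-up stats in the assumed shape; not real output.
sample_stats = {"namespaces": {"pdfs": {"vector_count": 128}}, "total_vector_count": 128}
print(namespace_vector_count(sample_stats, "pdfs"))   # prints: 128
print(namespace_vector_count(sample_stats, "other"))  # prints: 0
```

A count of 0 for your namespace after running the ingest script suggests that nothing was uploaded or that the namespace name in "config.py" does not match.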
