ianmacartney / embeddings-in-convex

An example of working with embeddings in Convex.

Embeddings Playground with Pinecone, OpenAI, and Convex

An example of working with embeddings and vector databases in Convex.

Embeddings enable all sorts of use cases, but it's hard to know how they'll perform on comparisons and queries without playing around with them.

This project lets you add source data, generate embeddings for it via OpenAI, compare those embeddings to each other, and compare semantic (embedding-based) search with plain word search over the same data.
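To make "comparing embeddings" and "semantic vs. word search" concrete, here is a minimal, self-contained sketch. It assumes cosine similarity as the comparison metric and a simple count of shared words as the word search. Neither is confirmed to be what this project uses internally, and the helper names and toy 3-dimensional vectors are hypothetical.

```python
import math
import re

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction).
    Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_rank(query_vec: list[float], docs: list[tuple[str, list[float]]]) -> list[tuple[str, float]]:
    """Rank (text, embedding) pairs by cosine similarity to the query embedding."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def words(text: str) -> set[str]:
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def word_rank(query: str, docs: list[tuple[str, list[float]]]) -> list[tuple[str, int]]:
    """Rank documents by how many distinct words they share with the query."""
    q = words(query)
    scored = [(text, len(q & words(text))) for text, _ in docs]
    # sorted() is stable, so ties keep their original order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy "embeddings" (hypothetical); real OpenAI embeddings are far longer.
docs = [
    ("the cat sat", [1.0, 0.0, 0.0]),
    ("a kitten napped", [0.9, 0.1, 0.0]),
    ("stock prices fell", [0.0, 0.0, 1.0]),
]

print(semantic_rank([1.0, 0.0, 0.0], docs))
print(word_rank("kitten", docs))
```

Word search for "kitten" only scores the document containing that exact word, while a query embedding close to "the cat sat" also ranks "a kitten napped" highly. That contrast is the kind of difference the playground lets you explore on real data.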

You can then include the retrieved source data in a ChatGPT prompt (work in progress).
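Since the prompt step is still a work in progress, the following is only a hypothetical sketch of one common approach: number the retrieved sources, place them ahead of the question, and stop adding sources once a rough character budget is used up. The function name, prompt wording, and `max_chars` limit are assumptions, not this project's code; a real implementation would more likely budget in tokens than characters.

```python
def build_prompt(question: str, sources: list[str], max_chars: int = 2000) -> str:
    """Hypothetical helper: pack retrieved sources (most relevant first) into a
    prompt, skipping the rest once the entries would exceed max_chars."""
    entries = []
    used = 0
    for i, text in enumerate(sources, start=1):
        entry = f"[{i}] {text}"
        if used + len(entry) > max_chars:
            break  # budget exhausted; drop this and all less relevant sources
        entries.append(entry)
        used += len(entry)
    context = "\n".join(entries)
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("What did the cat do?", ["the cat sat", "a kitten napped"]))
```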

UI:

Backend:

Work planned:

Setup

Prerequisites:

  1. A Convex backend: it is configured automatically when you run npm run dev. Run this first so that you can enter the environment variables for (2) and (3) on the Convex dashboard.

  2. An OpenAI API key, stored in the environment variable OPEN_API_KEY (it should start with sk-). Set it with npx convex env set OPEN_API_KEY sk-XXXX (append --prod to set it on the production deployment).

Run:

npm install
npm run dev

Upload sources from a URL

You can add a source from a URL using the scripts/addURL.py Python script:

pip install python-dotenv convex langchain
python scripts/addURL.py https://example.com
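Adding a source from a URL means turning a web page into plain text before it is embedded. The sketch below shows that step using only the standard library's `html.parser`; it is a hypothetical illustration, not what scripts/addURL.py does (its dependencies suggest it relies on langchain for loading). The class and function names are made up for this example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    """Return the page's visible text, joining text fragments with spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)

page = ("<html><head><style>p{}</style></head><body><h1>Title</h1>"
        "<p>Hello <b>world</b></p><script>x=1</script></body></html>")
print(html_to_text(page))  # Title Hello world
```

Joining fragments with spaces is a simplification: a word split across inline tags (for example `Hel<b>lo</b>`) comes out as two words.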

Upload sources from a folder

You can add .txt, .md, and .pdf files as sources to your project via:

export VITE_CONVEX_URL= # your backend url - see .env.local (dev) or .env (prod)
npx ts-node-esm scripts/addFiles.ts ./path/to/folder

By default, it looks for files in a documents folder at the root of the repo. It uploads in chunks.
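A rough sketch of the folder upload's first steps, written in Python to match the other examples: collect the .txt, .md, and .pdf files from a folder (defaulting to documents) and split the list into fixed-size batches. The helper names, the batch size, and the choice to look only at the top level of the folder are assumptions; the actual scripts/addFiles.ts may behave differently, and whether its "chunks" are batches of files or pieces of text is not stated here.

```python
from pathlib import Path
from typing import Iterator

SUPPORTED = {".txt", ".md", ".pdf"}

def is_supported(path: Path) -> bool:
    """True if the file extension is one the upload accepts (case-insensitive)."""
    return path.suffix.lower() in SUPPORTED

def find_source_files(folder: str = "documents") -> list[Path]:
    """List supported files directly inside folder (no recursion), sorted by name."""
    return sorted(p for p in Path(folder).iterdir() if p.is_file() and is_supported(p))

def batches(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

if __name__ == "__main__" and Path("documents").is_dir():
    for batch in batches(find_source_files(), size=10):  # hypothetical batch size
        print([p.name for p in batch])
```

Sending each batch to the backend is omitted, since the project's Convex function names aren't given here.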