An example of working with embeddings and vector databases in Convex.
Embeddings enable all sorts of use cases, but it's hard to know how they'll perform on comparisons and queries without playing around with them.
This project allows you to add source data, generate embeddings via OpenAI, compare them to each other, and compare semantic and word searches over them.
You can then include the queried source data in a ChatGPT prompt (WIP).
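To make the difference between semantic and word search concrete, here is a minimal sketch in plain Python. It is not this project's implementation: the three-dimensional vectors are made-up stand-ins for real OpenAI embeddings (which have many more dimensions), and the names `cosine_similarity`, `semantic_search`, `word_search`, and `SOURCES` are hypothetical.

```python
import math
import re


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_embedding, sources, top_k=2):
    """Rank sources by cosine similarity to the query embedding and keep the top_k texts."""
    scored = [
        (cosine_similarity(query_embedding, s["embedding"]), s["text"])
        for s in sources
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


def word_search(query, sources):
    """Return texts that share at least one whole word with the query (case-insensitive)."""
    query_words = set(re.findall(r"\w+", query.lower()))
    return [
        s["text"]
        for s in sources
        if query_words & set(re.findall(r"\w+", s["text"].lower()))
    ]


# Hypothetical toy data: hand-written vectors, NOT real model output.
SOURCES = [
    {"text": "Cats are small domesticated felines.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Kittens love to chase string.", "embedding": [0.8, 0.2, 0.1]},
    {"text": "Stock markets fell sharply today.", "embedding": [0.0, 0.1, 0.9]},
]

QUERY_TEXT = "cats"
QUERY_EMBEDDING = [1.0, 0.1, 0.0]  # hypothetical embedding of QUERY_TEXT

semantic_hits = semantic_search(QUERY_EMBEDDING, SOURCES)
word_hits = word_search(QUERY_TEXT, SOURCES)
print("semantic:", semantic_hits)
print("word:", word_hits)
```

With this toy data, the semantic search returns both the cat and the kitten sentences, while the word search returns only the sentence that literally contains "cats". Real embeddings may behave differently, which is why experimenting with your own data is useful.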
UI:
Backend:
Work planned:
A Convex backend: it will be configured automatically on npm run dev.
By running this first, you can enter environment variables for (2) and (3) on
the dashboard.
An OpenAI API key.
Environment variable: OPEN_API_KEY (should start with sk-).
Run npx convex env set OPEN_API_KEY sk-XXXX # --prod
npm install
npm run dev
You can add a source from a URL using the scripts/addURL.py Python script:
pip install python-dotenv convex langchain
python scripts/addURL.py https://example.com
You can add .txt, .md, and .pdf files as sources to your project via:
export VITE_CONVEX_URL= # your backend URL - see .env.local (dev) or .env (prod)
npx ts-node-esm scripts/addFiles.ts ./path/to/folder
By default, it looks in a documents folder at the root of the repo. It will upload in chunks