mongodben / mongodb-oracle

The MongoDB Oracle 🧙‍♀️🔮🌱
https://mongodb-oracle.vercel.app

Script to index site data #4

Closed: mongodben closed this issue 1 year ago

mongodben commented 1 year ago

Script(s) to index site data from a local machine. Use a vector embedding API for this, maybe OpenAI's?
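
A rough sketch of what such an indexing script could look like, not the code that ended up in the repo. Everything named here is hypothetical: the `.txt` input files, the `chunk_text` and `build_index_docs` helpers, and the `source` / `chunk` / `text` / `embedding` field names. The embedding call is passed in as a plain function so the sketch runs without any API key; in practice it would wrap whichever embedding API gets chosen.

```python
from pathlib import Path
from typing import Callable

# Hypothetical type: any function that maps a text chunk to its embedding vector.
Embedder = Callable[[str], list[float]]


def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            # Adding this paragraph would overflow the chunk, so start a new one.
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_index_docs(root: Path, embed: Embedder, max_chars: int = 1000) -> list[dict]:
    """Walk local .txt files under root and return one document per chunk."""
    docs: list[dict] = []
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for i, chunk in enumerate(chunk_text(text, max_chars)):
            docs.append(
                {
                    "source": str(path.relative_to(root)),  # assumed field name
                    "chunk": i,
                    "text": chunk,
                    "embedding": embed(chunk),  # assumed field name
                }
            )
    return docs
```

The returned list could then be written to a collection, for example with pymongo's `collection.insert_many(docs)`.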

mongodben commented 1 year ago

Note that the MongoDB Atlas Search knnBeta operator supports a max vector length of 1024.

This means that we cannot use OpenAI's popular embedding model text-embedding-ada-002, because it returns vectors of length 1536 (per the OpenAI docs).

Instead, what works is OpenAI's text-search-ada-doc-001 for the indexed docs and text-search-ada-query-001 for the query. These likely won't perform as well as ada-002, but they work with MongoDB.
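
For illustration, here is a minimal sketch of how the length limit and the doc/query model split might show up in code. It only builds the aggregation pipeline as a plain dict and guards the vector length; it does not call OpenAI or MongoDB. The index name `default`, the `embedding` and `text` field names, and the helper names are assumptions, not taken from the repo. The 1024 limit is the one quoted in this thread.

```python
from typing import Sequence

KNN_BETA_MAX_DIMS = 1024  # limit quoted in this thread

# Model pairing described above: one model for stored docs, one for queries.
DOC_MODEL = "text-search-ada-doc-001"
QUERY_MODEL = "text-search-ada-query-001"


def check_dims(vector: Sequence[float], max_dims: int = KNN_BETA_MAX_DIMS) -> None:
    """Raise if the vector is longer than the index is assumed to accept."""
    if len(vector) > max_dims:
        raise ValueError(
            f"vector has {len(vector)} dimensions, but the limit is {max_dims}"
        )


def build_knn_pipeline(
    query_vector: Sequence[float],
    k: int = 5,
    path: str = "embedding",  # assumed field holding the stored doc vectors
    index: str = "default",   # assumed Atlas Search index name
) -> list[dict]:
    """Build a $search stage using knnBeta, plus a projection with the score."""
    check_dims(query_vector)
    return [
        {
            "$search": {
                "index": index,
                "knnBeta": {"vector": list(query_vector), "path": path, "k": k},
            }
        },
        # "text" is an assumed field name for the stored chunk content.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "searchScore"}}},
    ]
```

The query vector would come from embedding the user's question with `QUERY_MODEL`, and the resulting pipeline would be passed to something like pymongo's `collection.aggregate(pipeline)`. A 1536-length vector from ada-002 would fail the `check_dims` guard before ever reaching Atlas.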

Sometime this week, I want to see if we can get in touch with the Atlas Search team to raise our knnBeta limit to support vectors of length 1536, so we can use ada-002. Marcus said this would be possible, but I haven't pursued it further.

mongodben commented 1 year ago

Also, this is the furthest-along aspect of the project pre-skunk, because I wanted to validate that it's in fact possible to use embeddings with Atlas Search and an AI summarizer to build a QA bot. Happily, it is 🥳

The code can be found here: https://github.com/mongodben/mongodb-oracle/tree/main/pre-skunk-poc/generate-index