mongodben closed this 1 year ago
note that the MongoDB Atlas Search knnBeta operator supports a max vector length of 1024.
this means that we cannot use OpenAI's popular embedding model text-embedding-ada-002,
b/c it returns vectors of length 1536 (docs).
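a minimal sketch of the length check, assuming the 1024 limit stated above. the dimension table is limited to the ada models discussed in this thread (ada-002 returns 1536 dims; the first-gen ada search models return 1024). `fits_knn_beta` is a hypothetical helper, not part of any MongoDB or OpenAI library:

```python
# Max vector length knnBeta accepts, per the note above (assumed; the limit
# may change as Atlas Search evolves).
KNN_BETA_MAX_DIMS = 1024

# Output dimensions of the OpenAI embedding models discussed in this thread.
MODEL_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-search-ada-doc-001": 1024,
    "text-search-ada-query-001": 1024,
}


def fits_knn_beta(model: str, max_dims: int = KNN_BETA_MAX_DIMS) -> bool:
    """Return True if the model's vectors are short enough for knnBeta."""
    return MODEL_DIMS[model] <= max_dims


for name, dims in MODEL_DIMS.items():
    print(f"{name}: {dims} dims, fits={fits_knn_beta(name)}")
```

passing a larger `max_dims` shows what changes if the limit is ever raised to 1536.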
instead, what works is OpenAI's text-search-ada-doc-001
for the indexed docs and text-search-ada-query-001
for the query, both of which return vectors of length 1024. this won't work as well as ada-002, but it works w mongodb.
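a rough sketch of how the doc/query model pair could map onto an Atlas Search index and a knnBeta query. the model names come from the comment; the field name `embedding`, the index name `default`, `k=5`, and cosine similarity are all assumptions, and the helper functions are hypothetical:

```python
# Model names from the comment; everything else here is an assumption.
DOC_MODEL = "text-search-ada-doc-001"      # used when embedding indexed docs
QUERY_MODEL = "text-search-ada-query-001"  # used when embedding the user's query
EMBEDDING_DIMS = 1024                      # both models return 1024-dim vectors
EMBEDDING_PATH = "embedding"               # hypothetical document field


def knn_index_definition(dims: int = EMBEDDING_DIMS) -> dict:
    """Atlas Search index mapping with a knnVector field for the doc embeddings."""
    return {
        "mappings": {
            "dynamic": True,
            "fields": {
                EMBEDDING_PATH: {
                    "type": "knnVector",
                    "dimensions": dims,
                    "similarity": "cosine",
                }
            },
        }
    }


def knn_beta_stage(query_vector: list, k: int = 5, index: str = "default") -> dict:
    """$search stage that runs knnBeta against the indexed doc embeddings."""
    if len(query_vector) != EMBEDDING_DIMS:
        # A vector of the wrong length (e.g. from ada-002) can't be compared
        # against the 1024-dim vectors stored in the index.
        raise ValueError(f"expected {EMBEDDING_DIMS} dims, got {len(query_vector)}")
    return {
        "$search": {
            "index": index,
            "knnBeta": {"vector": list(query_vector), "path": EMBEDDING_PATH, "k": k},
        }
    }
```

in PyMongo the stage would go first in `collection.aggregate([...])`, since `$search` has to be the first stage of a pipeline. the key constraint is that docs and queries are embedded by the same model family, so their vectors share a length and a vector space.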
sometime this week, i want to see if we can get in touch w the Atlas Search team about raising the knnBeta limit to support vectors of length 1536, so we can use ada-002. marcus said this'd be possible, but i haven't pursued it further.
also, this is the furthest-along part of the project pre-skunk b/c i wanted to validate that it'd in fact be possible to build a QA bot using embeddings with Atlas Search and an AI summarizer. happily, it is 🥳
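the QA flow described here (embed the question, retrieve with knnBeta, hand the hits to a summarizer) could look roughly like this. it's a sketch, not the POC's actual code: `embed_query`, `run_pipeline`, and `summarize` are hypothetical callables standing in for the embedding API, the MongoDB aggregation call, and the LLM call, and the prompt wording and `text`/`embedding` field names are assumptions:

```python
from typing import Callable, Sequence


def build_pipeline(query_vector: Sequence[float], k: int = 3) -> list:
    """Aggregation pipeline: knnBeta retrieval, then keep text + score."""
    return [
        {"$search": {"knnBeta": {"vector": list(query_vector), "path": "embedding", "k": k}}},
        # Keep only the chunk text plus its relevance score.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "searchScore"}}},
    ]


def build_prompt(question: str, chunks: Sequence[str]) -> str:
    """Stuff the retrieved chunks into a prompt for the summarizer."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_question(
    question: str,
    embed_query: Callable[[str], Sequence[float]],  # hypothetical: query embedding call
    run_pipeline: Callable[[list], list],           # hypothetical: e.g. wraps collection.aggregate
    summarize: Callable[[str], str],                # hypothetical: LLM completion call
    k: int = 3,
) -> str:
    vector = embed_query(question)
    docs = run_pipeline(build_pipeline(vector, k))
    prompt = build_prompt(question, [d["text"] for d in docs])
    return summarize(prompt)
```

injecting the three external calls keeps the retrieval and prompt logic testable without network access.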
the code can be found here: https://github.com/mongodben/mongodb-oracle/tree/main/pre-skunk-poc/generate-index
Script(s) to index data from a local machine. Use a vector embedding API for this, maybe OpenAI?
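one possible shape for a local indexing script, assuming the source docs are markdown files under a local folder. `embed_doc` is a hypothetical callable (e.g. wrapping OpenAI with text-search-ada-doc-001), and the paragraph-based chunking, `max_chars=1000`, and document field names are assumptions rather than what generate-index actually does:

```python
from pathlib import Path
from typing import Callable, Iterator


def chunk_text(text: str, max_chars: int = 1000) -> list:
    """Group paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_documents(
    root: Path,
    embed_doc: Callable[[str], list],  # hypothetical: doc-embedding API call
    pattern: str = "*.md",
    max_chars: int = 1000,
) -> Iterator[dict]:
    """Yield one MongoDB document per chunk, with its embedding attached."""
    for path in sorted(root.rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        for i, chunk in enumerate(chunk_text(text, max_chars)):
            yield {
                "source": str(path.relative_to(root)),
                "chunk": i,
                "text": chunk,
                "embedding": embed_doc(chunk),
            }
```

with PyMongo, the output could then be written with `collection.insert_many(list(build_documents(Path("docs"), embed_doc)))`.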