langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License
12.56k stars 2.15k forks

Elasticsearch vector store #771

Closed dkarlovi closed 1 year ago

dkarlovi commented 1 year ago

Please consider porting the vector store support for ES / OS: https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/elastic_vector_search.py

Sidenote

I was playing with using the current OpenSearch adapter against ES to prototype this, and it seems there's a bit more work than expected. Namely, ES v7 doesn't support the knn query type that OS supports; it only allows cosineSimilarity(), as implemented in the Python version linked above.
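For illustration, here is a rough sketch of what the ES v7 style request body could look like, using a `script_score` query with `cosineSimilarity()`. The helper name `buildScriptScoreSearch` and the field name `embedding` are made up for this example and are not part of langchainjs; the `+ 1.0` offset follows the Python version, since ES rejects negative scores.

```typescript
// Hypothetical helper, not part of langchainjs: builds an exact
// (brute-force) similarity search body that should work on ES v7.
type ScriptScoreSearch = {
  size: number;
  query: {
    script_score: {
      query: { match_all: Record<string, never> };
      script: { source: string; params: { query_vector: number[] } };
    };
  };
};

function buildScriptScoreSearch(
  queryVector: number[],
  k: number,
  vectorField = "embedding" // assumed field name, adjust to your mapping
): ScriptScoreSearch {
  return {
    size: k, // return the k best-scoring documents
    query: {
      script_score: {
        // Score every document; swap in a filter query to narrow this down.
        query: { match_all: {} },
        script: {
          // "+ 1.0" keeps the score non-negative, as ES requires.
          source: `cosineSimilarity(params.query_vector, '${vectorField}') + 1.0`,
          params: { query_vector: queryVector },
        },
      },
    },
  };
}
```

The returned object would be passed as the body of a regular `_search` request. Because every matching document gets scored, this is exact but scales linearly with the number of documents.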

ES v8 does support knn (which they call "Approximate kNN"), which is supposedly better performing with a large number of documents in ES, but it comes with a caveat: vectors are limited to 1024 dimensions (with the index: true setting, which the knn query type requires), while OpenAI embeddings from text-embedding-ada-002 are 1536-dimensional by default.

TLDR:

  1. The base approach taken in the Python version works.
  2. With a large number of documents and/or advanced features, the new (kNN) approach should be taken, which will cause some implementation issues and tradeoffs.

Update: as of ES 8.8, it supports up to 2048 dimensions for vectors, meaning ada-002 embeddings should fit into dense vectors on which ES knows how to do kNN queries.
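To make the ES v8 variant concrete, here is a minimal sketch of an indexed `dense_vector` mapping plus a kNN search body, with a guard for the dimension limit discussed above (1024 before 8.8, 2048 from 8.8 on, per this thread). The helper names, the `embedding` field name, and the `num_candidates` heuristic are assumptions for this example, not langchainjs or Elasticsearch defaults.

```typescript
// Hypothetical helpers, not part of langchainjs.
type DenseVectorMapping = {
  properties: Record<
    string,
    { type: "dense_vector"; dims: number; index: true; similarity: "cosine" }
  >;
};

// Builds a mapping for an indexed dense_vector field, which the knn
// search requires. maxIndexedDims is the limit of the target ES version
// (1024 before 8.8, 2048 from 8.8 on, according to the thread above).
function buildKnnMapping(
  dims: number,
  maxIndexedDims: number,
  vectorField = "embedding" // assumed field name
): DenseVectorMapping {
  if (dims > maxIndexedDims) {
    throw new Error(
      `Embedding has ${dims} dimensions, but indexed vectors allow at most ${maxIndexedDims}`
    );
  }
  return {
    properties: {
      [vectorField]: { type: "dense_vector", dims, index: true, similarity: "cosine" },
    },
  };
}

// Builds an approximate kNN search body using the top-level "knn" option.
function buildKnnSearch(queryVector: number[], k: number, vectorField = "embedding") {
  return {
    knn: {
      field: vectorField,
      query_vector: queryVector,
      k,
      // Assumed heuristic: consider more candidates per shard than k.
      num_candidates: Math.max(k * 10, 100),
    },
  };
}
```

With this sketch, `buildKnnMapping(1536, 1024)` throws, which is the ada-002 problem described above, while `buildKnnMapping(1536, 2048)` returns a usable mapping.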

dkarlovi commented 1 year ago

Opensearch merged in #792

zhengyuan-ehps commented 1 year ago

Not sure if this is related, but MongoDB Atlas Search also hits the 1024-dimension restriction when using a knn vector field, which means OpenAI embeddings are not compatible with the Mongo vector store either.

dkarlovi commented 1 year ago

@zhengyuan-ehps IIRC you can set max dimensions on the embeddings model, but I'm not sure how much impact this would have on the quality.

peterkarman1 commented 1 year ago

is this happening? some of us are stuck on es7 >_<

dkarlovi commented 1 year ago

@peterkarman1 you'll not be able to use the knn query, which might not be an issue for you.

Overall, the adapter should be very similar to the existing OpenSearch one, with very minor tweaks. I'm sure PRs are welcome.

dkarlovi commented 1 year ago

Closed in #1810.