Support for the sparse embeddings

langchain-ai / langchain-postgres

LangChain abstractions backed by Postgres Backend

MIT License

Support for the sparse embeddings #71

Open magaton opened 5 months ago

magaton commented 5 months ago

The latest pgvector version supports the sparsevec type. However, LangChain's PGVector supports only one embedding column in the langchain_pg_embedding table. It would be great to have a sparse_embedding column in that table and a matching sparse_embedding field in PGVector.
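To make the request concrete, here is a minimal sketch of what writing to such a column could involve. It assumes pgvector's sparsevec text format (`{index:value,...}/dimensions`, with 1-based indices). The helper name `to_sparsevec_literal`, the DDL string, and the dimension 30522 are illustrative assumptions, not part of langchain-postgres.

```python
def to_sparsevec_literal(weights: dict[int, float], dimensions: int) -> str:
    """Format 0-based {index: weight} pairs as a pgvector sparsevec literal.

    Hypothetical helper: pgvector's text format uses 1-based indices,
    so every index is shifted by one. Zero weights are dropped.
    """
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    parts = []
    for index in sorted(weights):
        if not 0 <= index < dimensions:
            raise ValueError(f"index {index} is out of range for {dimensions} dimensions")
        value = weights[index]
        if value != 0:
            parts.append(f"{index + 1}:{value}")
    return "{" + ",".join(parts) + "}/" + str(dimensions)


# Hypothetical DDL for the proposed column; the dimension (30522) is an
# assumed vocabulary size, not something the issue specifies.
ADD_SPARSE_COLUMN_SQL = (
    "ALTER TABLE langchain_pg_embedding "
    "ADD COLUMN sparse_embedding sparsevec(30522)"
)

literal = to_sparsevec_literal({0: 0.5, 4: 1.2}, dimensions=5)
print(literal)  # {1:0.5,5:1.2}/5
```

The literal could then be passed as a query parameter and cast with `::sparsevec` on the SQL side.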

I have considered the alternative, which is to have two PGVector stores: one for dense and one for sparse vectors. However, there are two problems with that:

PGVector has hardcoded table names for the collection and embedding tables.
I would like to leverage the excellent LangChain indexer with the SQL record manager.

gecBurton commented 5 months ago

Hi @magaton, I would be interested in collaborating on this. I would also like some kind of combined full-text/dense search feature: https://github.com/langchain-ai/langchain-postgres/issues/61

Freezaa9 commented 1 month ago

Hello, I would also be interested.

But I think each vector DB should be kept separate. So for hybrid search, the setup would be:

One dense embedding vector DB (using the current feature)
One sparse vector DB (using https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search/cross_encoder.py)

And then rerank the combined results using EnsembleRetriever (for example: https://python.langchain.com/docs/how_to/ensemble_retriever/)

To achieve this, we should also bump the pgvector Python package version: #82
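For reference, the fusion step in the two-store approach can be sketched in plain Python. As far as I know, EnsembleRetriever merges results with weighted Reciprocal Rank Fusion. The function below is a simplified stand-in for that idea, not the library's code; the function name, the document IDs, and the constant `c=60` are assumptions for illustration.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Merge ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document scores weight / (rank + c) in every list it appears in;
    the scores are summed and the documents are returned best first.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a dense store and a sparse store.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]

fused = weighted_rrf([dense_hits, sparse_hits], weights=[0.5, 0.5])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents that both stores rank highly ("b" and "a" here) rise to the top, which is the main benefit of fusing dense and sparse results instead of picking one list.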