langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

No HNSW index in pgvector vector store #23853

Closed: holasoftware closed this issue 1 week ago

holasoftware commented 4 months ago

Example Code

Not applicable

Error Message and Stack Trace (if applicable)

No response

Description

There is no HNSW index in the pgvector vector store:

https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/pgvector.py

Unlike the pgembedding vector store, which does create one:

https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/pgembedding.py#L192

System Info

Not applicable

wulifu2hao commented 4 months ago

Hi @holasoftware, I wonder why you need the HNSW index functionality to be available in pgvector.py if it is already in pgembedding.py?

wulifu2hao commented 4 months ago

PR: https://github.com/langchain-ai/langchain-postgres/pull/85

jackbravo commented 3 months ago

> Hi @holasoftware, I wonder why you need the HNSW index functionality to be available in pgvector.py if it is already in pgembedding.py?

pgembedding is not for pgvector but for pg_embedding; they are different Postgres extensions. There is also pgvecto_rs, meant for a third Postgres extension, pgvecto.rs :-p.

So this PR is for pgvector, which I think is the more widely used of the three, since that extension is supported by AWS and GCP, for example.

holasoftware commented 3 months ago

pg_embedding and pgvector are two different PostgreSQL extensions. Without an HNSW index on the embedding column, the pgvector vector store cannot search efficiently. I suppose it is doing an exhaustive search over every row.
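
For anyone who wants to try this outside LangChain, here is a minimal sketch of what adding an HNSW index on the pgvector side could look like. It only builds the SQL statement; it is not the code from the linked PR and not a LangChain API. The table name `langchain_pg_embedding` and column name `embedding` are assumptions about the store's schema, and `hnsw_index_sql` is a hypothetical helper. The `USING hnsw` syntax, the operator classes, and the `m` / `ef_construction` options are pgvector's own (HNSW is available from pgvector 0.5.0).

```python
# Sketch only: builds the SQL for a pgvector HNSW index.
# `hnsw_index_sql` is a hypothetical helper, not part of LangChain or pgvector.

# pgvector operator classes, keyed by the distance metric they serve.
OPERATOR_CLASSES = {
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
    "cosine": "vector_cosine_ops",
}


def hnsw_index_sql(
    table: str = "langchain_pg_embedding",  # assumed table name
    column: str = "embedding",              # assumed column name
    distance: str = "cosine",
    m: int = 16,                            # pgvector's default for m
    ef_construction: int = 64,              # pgvector's default for ef_construction
) -> str:
    """Return a CREATE INDEX statement for a pgvector HNSW index."""
    if distance not in OPERATOR_CLASSES:
        raise ValueError(f"unsupported distance: {distance!r}")
    # Identifiers are interpolated into SQL, so reject anything unusual.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    index_name = f"{table}_{column}_hnsw_idx"
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} "
        f"ON {table} USING hnsw ({column} {OPERATOR_CLASSES[distance]}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
```

The resulting string can be run with whatever Postgres client is already in use. Two caveats: the operator class has to match the distance operator the queries actually use, otherwise the planner will not pick the index, and pgvector needs the vector column to be declared with a fixed dimension before it will build an index on it.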