langchain-ai / langchain-postgres

LangChain abstractions backed by Postgres Backend
MIT License

How to query CollectionStore and EmbeddingStore models directly in a clean way? #88

Open darahayes opened 4 months ago

darahayes commented 4 months ago

Hello, thanks for this great project; I've found it very useful. I have a use case where, within one application, I want to create and manage multiple collections, and also fetch and return details about some of them, e.g. the name and the collection metadata. Essentially, my use case is CRUD for collections.

Currently I don't really see any way to do that cleanly other than dropping down to raw SQL queries in my application. Would this be the recommended approach?

I see in the source code in vectorstores.py that there are "private"/unexposed SQLAlchemy models defined for CollectionStore and EmbeddingStore. Having them exposed would make querying against the tables a lot easier, at least for my particular use case.

I can understand why you might want to keep them private: they might be subject to change, and any user code that touches those models could break. But I think that even with the models unexposed, a change that altered the database tables would still be a breaking change for a lot of apps anyway.

Is exposing those models something you might consider? Or would you recommend going with raw SQL? Would be more than happy to submit a PR. Thanks!

eyurtsev commented 4 months ago

Hi @darahayes, there's currently no way to do this.

This code needs to be refactored to support two things:

  1. Add a control plane (IndexAdmin) that will do exactly what you need it to do.
  2. Create different tables for the actual embeddings (e.g., to support different embedding dimensions)

Here's a stub at the abstraction that's needed: https://github.com/langchain-ai/langchain/pull/23990/files

This would also open a path to applying specific types of indices to the collections and doing schema migrations down the road if necessary.
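As a rough illustration of the two refactoring goals above, here is a sketch of what a control-plane interface might look like. Everything in it is hypothetical: the method names, the `CollectionInfo` record, and the per-dimension table naming are assumptions made for illustration, not the interface from the linked stub, and the in-memory implementation stands in for a real Postgres-backed one.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class CollectionInfo:
    """Hypothetical record describing one collection."""

    name: str
    dimensions: int
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def table_name(self) -> str:
        # ASSUMPTION: one embeddings table per vector dimension.
        return f"embeddings_{self.dimensions}"


class IndexAdmin(ABC):
    """Hypothetical control plane: CRUD for collections, no vector search."""

    @abstractmethod
    def create_collection(
        self, name: str, dimensions: int, metadata: dict[str, Any] | None = None
    ) -> CollectionInfo: ...

    @abstractmethod
    def get_collection(self, name: str) -> CollectionInfo | None: ...

    @abstractmethod
    def list_collections(self) -> list[CollectionInfo]: ...

    @abstractmethod
    def delete_collection(self, name: str) -> bool: ...


class InMemoryIndexAdmin(IndexAdmin):
    """Toy implementation that only exercises the interface."""

    def __init__(self) -> None:
        self._collections: dict[str, CollectionInfo] = {}

    def create_collection(self, name, dimensions, metadata=None):
        if name in self._collections:
            raise ValueError(f"collection {name!r} already exists")
        info = CollectionInfo(name, dimensions, dict(metadata or {}))
        self._collections[name] = info
        return info

    def get_collection(self, name):
        return self._collections.get(name)

    def list_collections(self):
        return sorted(self._collections.values(), key=lambda c: c.name)

    def delete_collection(self, name):
        return self._collections.pop(name, None) is not None


admin = InMemoryIndexAdmin()
admin.create_collection("docs", dimensions=1536, metadata={"owner": "team-a"})
admin.create_collection("notes", dimensions=384)
tables = sorted({c.table_name for c in admin.list_collections()})
```

Separating the admin interface from the vector store itself would let collection management evolve (indices, migrations) without changing the query path, which is the motivation described above.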

If you're interested in helping out, I can help provide some guidance if needed!

Sachin-Bhat commented 4 months ago

Hey @eyurtsev,

If more information is provided, I can take this up.

Cheers, Sachin