tmc / langchaingo

LangChain for Go, the easiest way to write LLM-based programs in Go
https://tmc.github.io/langchaingo/
MIT License

vectorstores: add mongovector #1005

Closed prestonvasquez closed 2 months ago

prestonvasquez commented 2 months ago

resolves https://github.com/tmc/langchaingo/issues/700 GODRIVER-3305

The goal of this PR is to provide a way for users to read and write to an Atlas cluster as a vector database using the MongoDB Go Driver.

A Store should be a wrapper for mongo.Collection, since adding and searching vectors is collection-specific. In this case AddDocuments() and SimilaritySearch() become analogues of Collection.InsertMany() and Collection.Aggregate(). This also keeps our implementation in line with the Python driver implementation.

Because a single collection can have multiple vector search indexes, we must provide an option to use a specific embedder when adding a document. However, to guarantee a default behavior (one where the user does not have to pass an embedder on every call), we should require that a default embedder be supplied at construction.

We've chosen to use vectorstores.Options.NameSpace to allow users to change the index on a per-operation basis.

This PR suggests mocking the embedding model to the specifications outlined in this blog post: https://dev.to/prestonvasquez/mocking-an-llm-embedder-targeting-mongodb-atlas-1glp

Integration testing will require setting up a free tier Atlas Cluster. The first time the tests run, the vector search indexes will be created, which takes a few minutes. Subsequent test runs against the same cluster will not incur this delay.
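One way tests that need a cluster could be gated is on an environment variable, so that a plain `go test` without credentials skips them instead of failing. This is only a sketch: the variable name `MONGODB_VECTOR_TEST_URI` and the `integrationURI` helper are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"os"
)

// envAtlasURI is a hypothetical variable name chosen for this sketch.
const envAtlasURI = "MONGODB_VECTOR_TEST_URI"

// integrationURI returns the cluster URI and whether integration tests
// should run. The lookup function is injected so the check is testable.
func integrationURI(lookup func(string) (string, bool)) (string, bool) {
	uri, ok := lookup(envAtlasURI)
	if !ok || uri == "" {
		return "", false
	}
	return uri, true
}

func main() {
	if _, ok := integrationURI(os.LookupEnv); !ok {
		fmt.Printf("skipping integration tests: %s is not set\n", envAtlasURI)
		return
	}
	fmt.Println("integration tests would run against the configured cluster")
}
```

In a `_test.go` file the same check would typically end in `t.Skip(...)` when the second return value is false.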

prestonvasquez commented 2 months ago

CC: @jtazin @matthewdale @qingyang-hu @blink1073 @joyjwang

prestonvasquez commented 2 months ago

@tmc @eliben This is ready for review.

tmc commented 2 months ago

Sorry about the delay here, I'll very likely be able to review this today and have been meaning to cut a new release so this should make it in soon!

tmc commented 2 months ago

Fantastic change! I'm curious what the plan around the go.mongodb.org/mongo-driver/v2 beta tag is -- I don't love depending on a not-declared-stable package version.

tmc commented 2 months ago

we should also get 'go test' to just work via a testcontainer container

tmc commented 2 months ago

> we should also get 'go test' to just work via a testcontainer container

Oh I didn't realize this was leveraging a feature not in the F/OSS version of mongo -- is there a reasonable way to fake this so we don't take on a network dependency in CI?

I see now that you gave a good amount of that context above. I'm happy to get this in with these tests skipped, and then we can chat about how to improve that.