Closed prestonvasquez closed 2 months ago
CC: @jtazin @matthewdale @qingyang-hu @blink1073 @joyjwang
@tmc @eliben This is ready for review.
Sorry about the delay here, I'll very likely be able to review this today and have been meaning to cut a new release so this should make it in soon!
Fantastic change! I'm curious what the plan around the go.mongodb.org/mongo-driver/v2 beta tag is -- I don't love depending on a not-declared-stable package version.
we should also get 'go test' to just work via a testcontainer container
Oh I didn't realize this was leveraging a feature not in the F/OSS version of mongo -- is there a reasonable way to fake this so we don't take on a network dependency in CI?
I see now that you gave a good amount of that context above. I'm happy getting this in with these tests skipped, and then we can chat about how to improve that.
resolves https://github.com/tmc/langchaingo/issues/700 GODRIVER-3305
The goal of this PR is to provide a way for users to read and write to an Atlas cluster as a vector database using the MongoDB Go Driver.
A Store should be a wrapper for mongo.Collection, since adding and searching vectors is collection-specific. In this case AddDocuments() and SimilaritySearch() become analogues of Collection.InsertMany() and Collection.Aggregate(). This also keeps our implementation in line with the Python driver implementation.
Because a single collection can have multiple vector search indexes, we must provide an option to use a specific embedding when adding a document. However, to ensure there is at least a default behavior [where a user does not have to include an optional embedder] we should require that a default embedder be included at construction.
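One way to picture the "required default, optional override" rule is Go's functional options pattern. The sketch below is an assumption about shape only: `New`, `WithEmbedder`, `embedderFor`, and `constEmbedder` are illustrative names defined here, not the API added by this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Embedder is a simplified stand-in for an embedding model.
type Embedder interface {
	Embed(text string) []float32
}

// constEmbedder is a toy embedder that always returns the same value.
type constEmbedder struct{ value float32 }

func (e constEmbedder) Embed(string) []float32 { return []float32{e.value} }

// options holds per-operation settings (illustrative).
type options struct{ embedder Embedder }

// Option mutates the per-operation settings.
type Option func(*options)

// WithEmbedder overrides the store's default embedder for one operation.
func WithEmbedder(e Embedder) Option {
	return func(o *options) { o.embedder = e }
}

// Store keeps the default embedder supplied at construction.
type Store struct{ defaultEmbedder Embedder }

// New refuses to build a Store without a default embedder.
func New(defaultEmbedder Embedder) (Store, error) {
	if defaultEmbedder == nil {
		return Store{}, errors.New("a default embedder is required")
	}
	return Store{defaultEmbedder: defaultEmbedder}, nil
}

// embedderFor resolves the embedder for one call: override if given, else default.
func (s Store) embedderFor(opts ...Option) Embedder {
	o := options{embedder: s.defaultEmbedder}
	for _, opt := range opts {
		opt(&o)
	}
	return o.embedder
}

func main() {
	store, err := New(constEmbedder{value: 1})
	if err != nil {
		panic(err)
	}
	fmt.Println(store.embedderFor().Embed("doc"))
	fmt.Println(store.embedderFor(WithEmbedder(constEmbedder{value: 2})).Embed("doc"))
}
```

Callers that never pass an option get the default, so the common case stays a plain method call.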
We've chosen to use vectorstores.Options.NameSpace to allow users to change the index on a per-operation basis.
This PR suggests mocking the embedding model to the specifications outlined in this blog post: https://dev.to/prestonvasquez/mocking-an-llm-embedder-targeting-mongodb-atlas-1glp
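For readers who have not seen a mocked embedder, here is a deliberately simple one. It is not the scheme from the linked post; it is a generic deterministic stand-in that hashes the text into a unit-length vector, so the same text always maps to the same vector and no model API is called. `mockEmbedder` and `cosine` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// mockEmbedder derives a deterministic pseudo-embedding from the text itself.
type mockEmbedder struct{ dim int }

// Embed hashes (dimension index, text) into [-1, 1] and normalizes the result.
func (m mockEmbedder) Embed(text string) []float32 {
	raw := make([]float64, m.dim)
	var sumSquares float64
	for i := range raw {
		h := fnv.New32a()
		fmt.Fprintf(h, "%d:%s", i, text)
		raw[i] = float64(h.Sum32())/float64(math.MaxUint32)*2 - 1
		sumSquares += raw[i] * raw[i]
	}
	vec := make([]float32, m.dim)
	norm := math.Sqrt(sumSquares)
	if norm == 0 {
		return vec // degenerate case: leave the zero vector as is
	}
	for i := range raw {
		vec[i] = float32(raw[i] / norm)
	}
	return vec
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	m := mockEmbedder{dim: 8}
	a := m.Embed("mongodb atlas")
	b := m.Embed("mongodb atlas")
	c := m.Embed("something else")
	fmt.Println(cosine(a, b), cosine(a, c))
}
```

A hash-based mock only guarantees repeatability. It says nothing about semantic closeness, so tests that assert specific similarity scores need a mock that controls those scores directly.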
Integration testing will require setting up a free-tier Atlas cluster. The first time the tests run, they create the vector search indexes, which takes a few minutes. Subsequent test runs against the same cluster reuse the existing indexes and skip this step.
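The "create once, reuse afterwards" behavior can be sketched as an idempotent setup helper. Everything here is hypothetical: `indexAdmin` is an invented interface rather than a driver type, and a real helper would also need to wait until a newly created index is ready to serve queries.

```go
package main

import "fmt"

// indexAdmin is a HYPOTHETICAL interface for listing and creating search indexes.
type indexAdmin interface {
	Exists(name string) (bool, error)
	Create(name string) error
}

// ensureIndex creates the named index only when it is missing.
// It reports whether this call created the index.
func ensureIndex(admin indexAdmin, name string) (bool, error) {
	exists, err := admin.Exists(name)
	if err != nil {
		return false, err
	}
	if exists {
		return false, nil
	}
	if err := admin.Create(name); err != nil {
		return false, err
	}
	return true, nil
}

// fakeAdmin is an in-memory stand-in that counts Create calls.
type fakeAdmin struct {
	indexes map[string]bool
	creates int
}

func (f *fakeAdmin) Exists(name string) (bool, error) { return f.indexes[name], nil }

func (f *fakeAdmin) Create(name string) error {
	f.indexes[name] = true
	f.creates++
	return nil
}

func main() {
	admin := &fakeAdmin{indexes: map[string]bool{}}
	for run := 1; run <= 2; run++ {
		created, err := ensureIndex(admin, "vector_index")
		fmt.Println(run, created, err)
	}
}
```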