superduper-io / superduper

Superduper: Integrate AI models and machine learning workflows with your database to implement custom AI applications, without moving your data. Including streaming inference, scalable model hosting, training and vector search.
https://superduper.io
Apache License 2.0

[BUG]: While executing a search using SuperDuperDB getting the following error in Local MongoDB #1803

Closed valivetirama closed 8 months ago

valivetirama commented 8 months ago

System Information

MongoDB URI: mongodb://localhost:27017

Database: leads, collection: lead_info

MongoDB Compass version: 1.42.1
MongoDB version: 7.0.5

[dependencies]
pymongo version = "^4.6.1"
superduperdb version = "^0.1.1"
python version = "^3.10.0"
openai = 1.1.2

What happened?

I am trying to replicate this example against my local MongoDB: https://docs.superduperdb.com/docs/use_cases/question-answering/Chatbot

When I execute a search using SuperDuperDB to find documents matching the specified query, I get the following error.

from pymongo import MongoClient
from superduperdb import superduper
from superduperdb.backends.mongodb import Collection
from superduperdb import Document
from superduperdb.ext.openai import OpenAIEmbedding
from superduperdb import Listener
from superduperdb import VectorIndex

# db, model, and collection are defined earlier in the script (not shown here)
listener = Listener(
    model=model,
    key='name',
    select=collection.find()
)

_ = db.add(
    VectorIndex(
        identifier='my-index',
        indexing_listener=listener
    )
)

# Execute a search using SuperDuperDB to find documents containing the specified query
result = db.execute(
    collection
        .like(Document({'name': "provide me flat with small garden"}), vector_index='my-index')
        .find()
)

The error is raised at the result = db.execute(...) call:

Traceback (most recent call last):
  File "G:\Play\check.py", line 97, in <module>
    result = db.execute(
  File "G:\Play\penv\lib\site-packages\superduperdb\base\datalayer.py", line 401, in execute
    return self.select(query, *args, **kwargs)
  File "G:\Play\penv\lib\site-packages\superduperdb\base\datalayer.py", line 463, in select
    return select.execute(self, load_hybrid=load_hybrid)
  File "G:\Play\penv\lib\site-packages\superduperdb\backends\mongodb\query.py", line 371, in execute
    output, scores = self._execute(db)
  File "G:\Play\penv\lib\site-packages\superduperdb\backends\mongodb\query.py", line 351, in _execute
    similar_ids, similar_scores = self.pre_like.execute(db)
  File "G:\Play\penv\lib\site-packages\superduperdb\backends\base\query.py", line 547, in execute
    return db.select_nearest(
  File "G:\Play\penv\lib\site-packages\superduperdb\base\datalayer.py", line 1245, in select_nearest
    return vi.get_nearest(like, db=self, ids=ids, n=n, outputs=outs)
  File "G:\Play\penv\lib\site-packages\superduperdb\components\vector_index.py", line 137, in get_nearest
    return db.fast_vector_searchers[self.identifier].find_nearest_from_array(
  File "G:\Play\penv\lib\site-packages\superduperdb\base\datalayer.py", line 1289, in __missing__
    value = self[key] = self.callable(key)
  File "G:\Play\penv\lib\site-packages\superduperdb\base\datalayer.py", line 140, in initialize_vector_searcher
    vector_search_cls = vector_searcher_implementations[searcher_type]
KeyError: 'mongodb'
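
The last two frames show a lazily populated dictionary (`__missing__`) calling an initializer that indexes a registry with the searcher type, and `'mongodb'` is not a key in that registry. Below is a minimal sketch of that failure mode. The registry keys (`in_memory`, `lance`) and the names `LazyDict`, `make_initializer`, and `try_lookup` are hypothetical illustrations, not superduperdb's actual code.

```python
# Hypothetical registry: the real keys in superduperdb may differ.
vector_searcher_implementations = {
    "in_memory": "InMemorySearcher",
    "lance": "LanceSearcher",
}


class LazyDict(dict):
    """Dict that builds a missing value on first access via a factory."""

    def __init__(self, factory):
        super().__init__()
        self.factory = factory

    def __missing__(self, key):
        # Same shape as the frame in the traceback: build, cache, return.
        value = self[key] = self.factory(key)
        return value


def make_initializer(searcher_type):
    """Return an initializer bound to one configured searcher type."""

    def initialize_vector_searcher(identifier):
        # Plain dict indexing: an unregistered type raises KeyError.
        searcher_cls = vector_searcher_implementations[searcher_type]
        return f"{searcher_cls}({identifier!r})"

    return initialize_vector_searcher


def try_lookup(searcher_type, identifier="my-index"):
    """Access the lazy dict and report either the searcher or the error."""
    searchers = LazyDict(make_initializer(searcher_type))
    try:
        return searchers[identifier]
    except KeyError as exc:
        return f"KeyError: {exc}"


print(try_lookup("in_memory"))
print(try_lookup("mongodb"))
```

In this sketch, the error surfaces only when the index is first queried, not when it is created, which matches the traceback: `db.add(...)` succeeded and the failure appeared at `db.execute(...)`.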

Expected behavior: the search returns the matching documents.

Steps to reproduce

  1. Create a MongoDB database with a collection
  2. Load the dataset
  3. Create a vector-search index
  4. Execute the search to get the result

anitaokoh commented 8 months ago

Hey @valivetirama,

Thank you for creating an issue.

I tried reproducing the error with your version specifications and using MongoClient, but I was unable to reproduce it.

Could you check your data and your configuration? The problem could be in either.

Here is a Colab notebook (with your version specifications) that I just ran to recreate it.

That way you should be able to isolate where the issue is.

Let me know how it goes.

valivetirama commented 8 months ago

Hi @anitaokoh,

The issue was with the snippet below:

# SuperDuperDB now handles your MongoDB database
# It just super dupers your database
db = superduper(
    mongodb_uri,
    cluster__vector_search=mongodb_uri,  # this argument caused the KeyError
)

After removing cluster__vector_search=mongodb_uri from the snippet above, it works fine and generates the result below:

````
This works for other components, such as `VectorIndex`.
`VectorIndex` instances also contain instances of:

- `Listener`
- `Model`

When one adds the `VectorIndex` with `db.add(vector_index)`, the sub-components are also versioned, if a version has not already been assigned to those components in the same session.

Read more about `VectorIndex` and vector-searches [here](../walkthrough/vector_search.md).

Read more about `VectorIndex` and vector-searches [here](../walkthrough/vector_search.md).

For instance, creating a `VectorIndex` involves also creating a `Listener` and a `Model` inline.

```python
db.add(
    VectorIndex(
        'my-index'
        indexing_listener=Listener(
            model=model,
            key='txt',
            select=my_collection.find(),
        ),
    )
)
```

Read more about the `VectorIndex` concept [here](../walkthrough/vector_search.md).

        ),
    )
)
```

Read more about the `VectorIndex` concept [here](../walkthrough/vector_search.md).

```

Read more about the `VectorIndex` concept [here](../walkthrough/vector_search.md).
````

I really appreciate your time and help in resolving the issue. Thanks!