pkavumba / django-vectordb

A fast and scalable app that adds vector search capabilities to your Django applications. It offers low latency, fast search results, native Django integration, and automatic syncing between your models and the vector index. Incremental updates are also supported out of the box.
Apache License 2.0

Improvement Request #16

Closed PositivPy closed 9 months ago

PositivPy commented 9 months ago

Can we just return the content_object as the data?

This is an example:

class SearchBookListView(ListView):
    model = get_vectordb_model()
    template_name = 'library/search.html'

    def get_queryset(self):
        q = self.request.GET.get("q")
        return vectordb.search(q, k=10)

I reuse the template across multiple types of search, all of which use book.title to display the title (as an example). Right now I have to resort to shenanigans like:

{% for book in page_obj %}
    {% if book.content_object %}
        {% include 'library/partials/book_card/horizontal.html' with book=book.content_object %}
    {% else %}
        {% include 'library/partials/book_card/horizontal.html' with book=book %}
    {% endif %}
{% endfor %}

With vector search, the book data is accessed through book.content_object rather than book itself. Returning the metadata right away with the _state as a field would be much better.
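The branching in the template above could also be moved into Python. The following is only a minimal sketch, assuming each search result exposes a content_object attribute that may be empty (as the template implies). The unwrap_results helper and the Book and SearchResult stand-in classes are hypothetical and not part of django-vectordb:

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


@dataclass
class Book:
    # Hypothetical stand-in for the real Book model.
    title: str


@dataclass
class SearchResult:
    # Hypothetical stand-in for a search result row; content_object is
    # None when the row stores plain text rather than a model instance.
    text: str
    content_object: Optional[Any] = None


def unwrap_results(results: Iterable[Any]) -> List[Any]:
    """Return each result's content_object when present, else the result itself."""
    unwrapped = []
    for result in results:
        target = getattr(result, "content_object", None)
        unwrapped.append(target if target is not None else result)
    return unwrapped


results = [
    SearchResult(text="Dune", content_object=Book(title="Dune")),
    SearchResult(text="free-form note"),
]
items = unwrap_results(results)
```

With something like this in get_queryset, the template could pass book straight to the partial without the if/else.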

pkavumba commented 9 months ago

I understand your frustration. The API can store any kind of object, including plain text, so a search result is not always backed by a model instance. A possible solution is to add an optional argument for the content object type. If this argument is given, the function could return a queryset of those content objects. I think this would make the return type clearer and more consistent. Does this sound good to you?

pkavumba commented 9 months ago

I've implemented a solution for this in version 0.2.0. Now, you can easily unwrap all model instances from any vectordb queryset.

Take the example you provided:

class SearchBookListView(ListView):
    model = get_vectordb_model()
    template_name = 'library/search.html'

    def get_queryset(self):
        q = self.request.GET.get("q")
        return vectordb.search(q, k=10)

With the new update, you can modify it like this to directly get the Book instances:

class SearchBookListView(ListView):
    # skip (class attributes unchanged)
    ...

    def get_queryset(self):
        q = self.request.GET.get("q")
        return vectordb.search(q, k=10).unwrap()

Calling unwrap() returns the model instances matching your query. If your vectordb contains different types of objects, the best approach is to specify a content_type, a feature introduced in version 0.2.0. To ensure you only get Book instances, pass the content_type argument, which can be a string such as "Book" or "<app_label>.Book", or the class itself. For example:

return vectordb.search(q, k=10, content_type=Book).unwrap() # only objects belonging to the Book model will be returned
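To illustrate the three accepted forms of content_type (a bare name, an "<app_label>.Name" string, or the class), here is a rough, framework-free sketch of how such matching could work. It is not the library's implementation: matches_content_type, filter_by_content_type, and the app_label class attribute are hypothetical stand-ins, and the exact matching rules in django-vectordb may differ.

```python
from typing import Any, Iterable, List, Union


class Book:
    app_label = "library"  # hypothetical stand-in for a Django app label

    def __init__(self, title: str) -> None:
        self.title = title


class Author:
    app_label = "library"  # hypothetical stand-in for a Django app label

    def __init__(self, name: str) -> None:
        self.name = name


def matches_content_type(obj: Any, content_type: Union[str, type]) -> bool:
    """Check obj against a class, a bare class name, or an 'app_label.Name' string."""
    cls = type(obj)
    if isinstance(content_type, type):
        return cls is content_type
    if "." in content_type:
        app_label, name = content_type.split(".", 1)
        return getattr(cls, "app_label", None) == app_label and cls.__name__ == name
    return cls.__name__ == content_type


def filter_by_content_type(objs: Iterable[Any], content_type: Union[str, type]) -> List[Any]:
    # Keep only the objects whose type matches the requested content type.
    return [obj for obj in objs if matches_content_type(obj, content_type)]


mixed = [Book("Dune"), Author("Frank Herbert"), Book("Emma")]
books_by_class = filter_by_content_type(mixed, Book)
books_by_name = filter_by_content_type(mixed, "Book")
books_by_label = filter_by_content_type(mixed, "library.Book")
```

All three calls select the same two Book objects, which is the behaviour the content_type argument is described as providing.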

The above method is recommended for unwrapping objects because it allows you to postpone unwrapping until you are completely done using the queryset. If you prefer the search function to automatically unwrap the results for you, you can do so by setting unwrap=True:

return vectordb.search(q, k=10, content_type=Book, unwrap=True)

I'll be adding official documentation for these features shortly.