Closed mgax closed 5 months ago
Thanks, this looks like a good implementation of this change.
I'm still in two minds about this one however;
chat_backend
might be better off as an argument to query
.update_vector_indexes
command. If we are proposing that for a developer to support different behaviours within the same index, they'd need to register multiple instances, would that cause duplication when rebuilding?
- If we are proposing that for a developer to support different behaviours within the same index, they'd need to register multiple instances, would that cause duplication when rebuilding?
Not sure about this one. I'm thinking of a use case where you store embeddings in both pgvector and something else (qdrant, weaviate?). Or generate embeddings using both OpenAI and GPT4All and store them separately to compare performance. But honestly, I don't have a good handle on all the reasons why people would want to parametrise indexes.
The changes in #65 include changing the registry to hold index instances based on this PR. Thanks for this @mgax !
This patch simplifies the index registry: instead of storing index classes, and creating instances on the fly, it stores index instances that are ready to use. A different approach to addressing https://github.com/wagtail/wagtail-vector-index/issues/18 (though it's orthogonal to https://github.com/wagtail/wagtail-vector-index/pull/51; we could merge both).
The main case for storing index classes seems to be the ability to use multiple indexes for a given model, e.g. for a different embedding or chat backend.
./manage.py update_vector_indexes
should rebuild each set of embeddings. Therefore, it makes sense to register multiple indexes for the same model. With this patch, the model'sget_vector_index()
method would return the "default" index, and the user would callregistry.register_index()
with additional indexes.