wagtail / wagtail-vector-index

Store Wagtail pages & Django models as embeddings in vector databases
https://wagtail-vector-index.readthedocs.io/en/latest/
MIT License

Refactor the registry to hold index instances #52

Closed: mgax closed this 2 months ago

mgax commented 6 months ago

This patch simplifies the index registry: instead of storing index classes and creating instances on the fly, it stores ready-to-use index instances. It is a different approach to addressing https://github.com/wagtail/wagtail-vector-index/issues/18, though it's orthogonal to https://github.com/wagtail/wagtail-vector-index/pull/51, so we could merge both.

The main case for storing index classes seems to be the ability to use multiple indexes for a given model, e.g. for a different embedding or chat backend.
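To illustrate the difference between the two registry styles, here is a minimal plain-Python sketch. `ClassRegistry`, `InstanceRegistry`, `ExampleIndex` and its `backend` parameter are hypothetical names chosen for this example; they are not the wagtail-vector-index API.

```python
from __future__ import annotations


class ExampleIndex:
    """Hypothetical index; `backend` stands in for an embedding/storage choice."""

    def __init__(self, backend: str = "default") -> None:
        self.backend = backend


class ClassRegistry:
    """Class-based style (hypothetical): store classes, instantiate on lookup."""

    def __init__(self) -> None:
        self._classes: dict[str, type] = {}

    def register(self, cls: type) -> type:
        self._classes[cls.__name__] = cls
        return cls

    def get(self, name: str) -> ExampleIndex:
        # Builds a fresh instance on every call, always with default arguments.
        return self._classes[name]()


class InstanceRegistry:
    """Instance-based style (hypothetical): store ready-to-use instances."""

    def __init__(self) -> None:
        self._instances: dict[str, ExampleIndex] = {}

    def register(self, name: str, instance: ExampleIndex) -> ExampleIndex:
        self._instances[name] = instance
        return instance

    def get(self, name: str) -> ExampleIndex:
        # Returns the same pre-configured instance on every call.
        return self._instances[name]


classes = ClassRegistry()
classes.register(ExampleIndex)

instances = InstanceRegistry()
# Two differently configured indexes can coexist under separate names.
instances.register("pages_pgvector", ExampleIndex(backend="pgvector"))
instances.register("pages_qdrant", ExampleIndex(backend="qdrant"))
```

With the class-based registry, every lookup yields a new default-configured object, so per-index configuration has to live on the class itself. With the instance-based registry, configuration is fixed once at registration time, and the same class can be registered several times with different parameters.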

tomusher commented 6 months ago

Thanks, this looks like a good implementation of this change.

I'm still in two minds about this one, however:

mgax commented 6 months ago
  • If we are proposing that for a developer to support different behaviours within the same index, they'd need to register multiple instances, would that cause duplication when rebuilding?

Not sure about this one. I'm thinking of a use case where you store embeddings in both pgvector and something else (Qdrant, Weaviate?), or generate embeddings using both OpenAI and GPT4All and store them separately to compare performance. But honestly, I don't have a good handle on all the reasons why people would want to parametrise indexes.
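To make the rebuilding question concrete, here is a hypothetical sketch in which two instances cover the same model with different embedding backends and stores. A rebuild that loops over every registered instance embeds the same documents once per instance. Whether that is unwanted duplication or the intended behaviour (e.g. comparing OpenAI and GPT4All embeddings side by side) depends on the use case. `VectorIndex`, `registry` and `rebuild_all` are illustrative names only, not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    """Hypothetical index: one model, one embedding backend, one store."""

    model: str
    embedding_backend: str
    store: str
    stored: list[str] = field(default_factory=list)

    def rebuild(self, documents: list[str]) -> int:
        # Replace the stored embeddings. A real index would call the embedding
        # backend here; this sketch just tags each document with the backend name.
        self.stored = [f"{self.embedding_backend}:{doc}" for doc in documents]
        return len(self.stored)


# Two instances of the same index class over the same model.
registry: list[VectorIndex] = [
    VectorIndex(model="BlogPage", embedding_backend="openai", store="pgvector"),
    VectorIndex(model="BlogPage", embedding_backend="gpt4all", store="qdrant"),
]


def rebuild_all(documents_by_model: dict[str, list[str]]) -> int:
    """Rebuild every registered index and return the total documents embedded."""
    total = 0
    for index in registry:
        total += index.rebuild(documents_by_model.get(index.model, []))
    return total
```

For example, `rebuild_all({"BlogPage": ["a", "b", "c"]})` returns `6`: each page is embedded twice, once per registered instance.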

tomusher commented 2 months ago

The changes in #65 include switching the registry to hold index instances, based on this PR. Thanks for this, @mgax!