wagtail / wagtail-vector-index

Store Wagtail pages & Django models as embeddings in vector databases
https://wagtail-vector-index.readthedocs.io/en/latest/
MIT License

Support indexes on pgvector backend #49

Open tomusher opened 6 months ago

tomusher commented 6 months ago

The PgvectorEmbedding model doesn't currently include pgvector indexes. Adding these would speed up retrieval for large datasets.

Adding an index is relatively straightforward with the pgvector library, but the problem is that we need to specify the number of dimensions on the VectorField.

We currently leave this undefined because the user could specify multiple embedding backends with different output dimensions, which we'd be keeping in the same table.
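For context, pgvector itself can index a dimensionless `vector` column as long as each index only covers rows of a single dimension, by combining an expression index (a cast to `vector(n)`) with a partial index. Below is a minimal sketch of building that SQL. The table and column names (`embeddings`, `embedding`, `backend_id`) and the helper name are hypothetical and not taken from wagtail-vector-index, and the helper only builds a string rather than executing anything.

```python
import re

# Identifiers are interpolated into SQL directly, so accept only plain names.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def partial_hnsw_index_sql(
    table: str,
    column: str,
    dimensions: int,
    filter_column: str,
    filter_value: int,
    opclass: str = "vector_cosine_ops",
) -> str:
    """Build a CREATE INDEX statement covering rows of one embedding backend.

    The column is cast to vector(dimensions) so that pgvector can index it
    even though the column itself has no fixed dimension.
    """
    if not isinstance(dimensions, int) or dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    if not isinstance(filter_value, int):
        raise ValueError("filter_value must be an integer")
    table, column = _ident(table), _ident(column)
    filter_column, opclass = _ident(filter_column), _ident(opclass)
    index_name = f"{table}_{column}_{filter_value}_hnsw"
    return (
        f"CREATE INDEX {index_name} ON {table} "
        f"USING hnsw (({column}::vector({dimensions})) {opclass}) "
        f"WHERE ({filter_column} = {filter_value});"
    )


print(partial_hnsw_index_sql("embeddings", "embedding", 1536, "backend_id", 1))
```

One caveat with this approach: queries would need to repeat the same cast and the same filter condition for the planner to consider the index, which adds complexity compared with a single fixed-dimension column.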

In order to specify dimensions and therefore use indexes, we'd need to:

I'm leaning towards supporting only one embedding model. I expect this will be OK, especially with the increasing prevalence of multimodal embedding models, which would let us embed images, audio, pages, etc. with the same model.

brylie commented 1 month ago

Similar to Wagtail and Django, it would probably be good to let users define their own indexes. That way they can choose the constraints and the embedding model: for example, they might not want to use OpenAI, or they might use variants of OpenAI models that have different embedding sizes. Embedding choice should be a project decision. Instead of using a default index size, it might be good to provide documentation and helpers for creating indexes.
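As a rough illustration of what such a helper might look like, the hypothetical `VectorIndexSpec` below lets a project declare its own embedding dimensions, index method, and distance operator class, and validates them before emitting SQL. None of these names exist in wagtail-vector-index. The 2,000-dimension ceiling reflects pgvector's documented limit for indexing the `vector` type and is worth re-checking against the installed pgvector version.

```python
from dataclasses import dataclass

# Assumption: pgvector's documented limit for indexing the `vector` type.
MAX_INDEXED_DIMENSIONS = 2000
INDEX_METHODS = {"hnsw", "ivfflat"}
OPERATOR_CLASSES = {"vector_l2_ops", "vector_ip_ops", "vector_cosine_ops"}


@dataclass(frozen=True)
class VectorIndexSpec:
    """Project-level declaration of how embeddings should be indexed."""

    dimensions: int
    method: str = "hnsw"
    opclass: str = "vector_cosine_ops"

    def __post_init__(self) -> None:
        if not 0 < self.dimensions <= MAX_INDEXED_DIMENSIONS:
            raise ValueError(
                f"dimensions must be between 1 and {MAX_INDEXED_DIMENSIONS}"
            )
        if self.method not in INDEX_METHODS:
            raise ValueError(f"unknown index method: {self.method!r}")
        if self.opclass not in OPERATOR_CLASSES:
            raise ValueError(f"unknown operator class: {self.opclass!r}")

    def statements(self, table: str, column: str) -> list[str]:
        """Return SQL that fixes the column dimension, then adds the index.

        `table` and `column` are trusted, developer-supplied names here.
        """
        return [
            f"ALTER TABLE {table} ALTER COLUMN {column} "
            f"TYPE vector({self.dimensions});",
            f"CREATE INDEX {table}_{column}_{self.method} ON {table} "
            f"USING {self.method} ({column} {self.opclass});",
        ]


spec = VectorIndexSpec(dimensions=768)
for statement in spec.statements("embeddings", "embedding"):
    print(statement)
```

Keeping the dimension in a project-owned declaration like this means a model with a different output size fails loudly at configuration time rather than at insert time.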