wagtail / wagtail-vector-index

Store Wagtail pages & Django models as embeddings in vector databases
https://wagtail-vector-index.readthedocs.io/en/latest/
MIT License

Add pgvector index for embeddings #33

Closed: tm-kn closed this issue 6 months ago

tomusher commented 6 months ago

Just checking what you mean by this issue, @tm-kn?

tm-kn commented 6 months ago

I don't remember anymore, my bad. This probably relates to the feedback @mgax was raising.

mgax commented 6 months ago

It was about adding a database index to make embedding search faster. It turns out this isn't trivial because the database column doesn't specify a size for the embeddings vector (or something along those lines).
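For illustration, here is a rough sketch of the SQL such a change would likely involve, assuming the embeddings live in a pgvector `vector` column. pgvector's approximate indexes (HNSW and IVFFlat) need the column to declare a fixed number of dimensions, so the column type has to be pinned before the index can be created. The helper name, table name, column name and dimension count are hypothetical and not taken from this project.

```python
def pgvector_index_statements(table: str, column: str, dimensions: int) -> list[str]:
    """Build the SQL needed to pin the vector size and add an HNSW index.

    Hypothetical helper: identifiers are interpolated directly, so only
    pass trusted, hard-coded names (never user input).
    """
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return [
        # Make sure the pgvector extension is available.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # Give the column a fixed size; existing rows must already match it.
        f"ALTER TABLE {table} ALTER COLUMN {column} TYPE vector({dimensions});",
        # Approximate nearest-neighbour index for cosine distance queries.
        f"CREATE INDEX {table}_{column}_hnsw ON {table} "
        f"USING hnsw ({column} vector_cosine_ops);",
    ]


# Example with assumed names and an assumed embedding size of 1536.
statements = pgvector_index_statements("embeddings", "vector", 1536)
```

The part that makes this non-trivial is the `ALTER TABLE` step: it only works if every stored embedding has the same length, which may not hold when several embedding models share one table.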

For a bit of context, we were seeing slow search times (3-4 seconds) on Heroku, but it turned out we were fetching all the embeddings into Python and doing the search there. Moving the search into the database reduced query time to ~500ms for ~1300 records (with no index, just a table scan), which was good enough.
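To make the difference concrete, here is a minimal sketch of the two approaches. The first function mimics ranking in Python after loading every row; the SQL string shows the same ranking done by pgvector with its cosine distance operator `<=>`. The function names, the `embeddings` table and its `id` and `embedding` columns are assumptions for illustration, not this project's actual schema or code.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance (1 - cosine similarity); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search_in_python(query, rows, limit=5):
    """Slow path: every (id, embedding) row is already loaded into memory."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:limit]]


# Faster path: let the database rank rows and return only the top matches.
# Without an index this is still a sequential scan, but no embeddings are
# transferred to the application. Table and column names are hypothetical.
SEARCH_IN_DATABASE_SQL = """
SELECT id
FROM embeddings
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
nearest = search_in_python([1.0, 0.0], rows, limit=2)
```

Both versions examine every row; the saving comes from not shipping ~1300 embedding vectors over the network and not deserialising them in Python.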

tomusher commented 6 months ago

Ah thank you @mgax - the word 'index' serving double purposes here doesn't do us any favours.

I've written up this issue separately at #49 so we can keep this one closed.