wagtail / wagtail-vector-index

Store Wagtail pages & Django models as embeddings in vector databases
https://wagtail-vector-index.readthedocs.io/en/latest/
MIT License
15 stars, 12 forks

Support automatically updating index when content is published #6

Open tomusher opened 11 months ago

tomusher commented 11 months ago

At the moment the only way to update the indexes is to run the update_vector_indexes management command. This gives developers control over when embedding API costs are incurred.

It would be nice to have an option to add or update items in the index when they are published or saved.

Some considerations:

tm-kn commented 10 months ago

Should we have a way to run this on Celery, given that it involves an external API call, and advise against running it in the request/response cycle?
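To make the idea of keeping the update out of the request/response cycle concrete, here is a minimal sketch that uses only the standard library. The `queue.Queue` and worker thread stand in for a real task queue such as Celery, and every name in it (`on_publish`, `update_index_for`, `worker`, `run_demo`) is hypothetical, not part of wagtail-vector-index.

```python
import queue
import threading


def update_index_for(store, content_type, object_id):
    """Placeholder for the slow part (embedding API call plus index write).

    Assumption: here it only records which object would have been re-indexed.
    """
    store.append((content_type, object_id))


def on_publish(jobs, content_type, object_id):
    """What a publish/save handler would do: enqueue the job and return at once."""
    jobs.put((content_type, object_id))


def worker(jobs, store):
    """Process queued jobs outside the request cycle until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        update_index_for(store, *job)


def run_demo():
    jobs = queue.Queue()
    store = []
    thread = threading.Thread(target=worker, args=(jobs, store))
    thread.start()
    # Two "publish" events; neither blocks on the slow indexing work.
    on_publish(jobs, "blog.BlogPage", 42)
    on_publish(jobs, "blog.BlogPage", 43)
    jobs.put(None)  # tell the worker to stop once the queue is drained
    thread.join()
    return store
```

With a real task queue, the handler would presumably pass only the content type and object ID, as above, so that the worker loads fresh data when it runs.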

Morsey187 commented 9 months ago

I've opened a draft PR here: https://github.com/wagtail/wagtail-vector-index/pull/30. However, there are a few points that need resolving before it can be completed.

Issue: Signals are part of the request cycle and updating indexes can be time-consuming. We should add support for a task queue and consider whether we'd want to allow using these signals without one at all.

Issue: Currently this requires rebuilding the whole index instead of updating it. We'd need to figure out:

- Which indexes a model is in (so we can update the right indexes)
- A way to remove documents from an index that match a given set of metadata (the object ID and content type ID in this case)
- An easier way to generate embeddings on a per-document level, instead of at the rebuild index stage
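As a rough illustration of the "remove by metadata, then re-embed one object" idea, here is a sketch against a toy in-memory index. It does not use the wagtail-vector-index API: `InMemoryIndex`, `Document`, `delete_matching`, `upsert_object` and `fake_embed` are all hypothetical names, and a real backend would need its own way to filter on metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    embedding: list
    metadata: dict


def fake_embed(text):
    """Hypothetical embedding function; a real one would call an embedding API."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


@dataclass
class InMemoryIndex:
    documents: list = field(default_factory=list)

    def delete_matching(self, **metadata):
        """Remove documents whose metadata contains all given pairs; return the count."""
        if not metadata:
            raise ValueError("refusing to delete without a metadata filter")
        before = len(self.documents)
        self.documents = [
            doc for doc in self.documents
            if not all(doc.metadata.get(k) == v for k, v in metadata.items())
        ]
        return before - len(self.documents)

    def upsert_object(self, content_type_id, object_id, chunks):
        """Replace one object's documents without rebuilding the rest of the index."""
        self.delete_matching(content_type_id=content_type_id, object_id=object_id)
        for chunk in chunks:
            self.documents.append(Document(
                embedding=fake_embed(chunk),  # embeddings generated per document
                metadata={
                    "content_type_id": content_type_id,
                    "object_id": object_id,
                    "content": chunk,
                },
            ))


index = InMemoryIndex()
index.upsert_object(7, 1, ["old intro", "old body"])
index.upsert_object(7, 2, ["other page"])
index.upsert_object(7, 1, ["new intro"])  # only object 1's documents are replaced
```

Deleting before inserting keeps the sketch simple when an object's chunk count changes between saves. Working out which indexes a model belongs to is left out here.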

tm-kn commented 7 months ago

For reference, @mgax implemented this as part of his project, but we'd likely have the same issues as above. I don't think @mgax's solution rebuilds the whole index.

Maybe we need to make it a bit less automatic or perfect so we can merge this in some form?