Open tomusher opened 11 months ago
Should we have a way to run this on Celery, given that this is an external API call, and advise against using it in the request/response cycle?
I've opened up a draft PR here: https://github.com/wagtail/wagtail-vector-index/pull/30. However, there are a few points that need resolving before this can be completed.
Issue: Signals are part of the request cycle, and updating indexes can be time-consuming. We should add support for a task queue and consider whether we'd want to allow using these signals without one at all.

Issue: This currently requires rebuilding the whole index instead of updating it. We'd need to figure out:
- Which indexes a model is in (so we can update the right indexes)
- A way to remove documents from an index that match a given set of metadata (the object ID and content type ID in this case)
- An easier way to generate embeddings on a per-document level, instead of at the rebuild-index stage
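To make the three open points above concrete, here is a minimal in-memory sketch of what a per-object update could look like. Everything in it is hypothetical and is not the wagtail-vector-index API: `InMemoryIndex`, `delete_matching`, `indexes_for`, and `update_object` are illustrative names, the model label stands in for the content type ID, and `embed` is whatever callable produces an embedding for one chunk of text.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    embedding: list[float]
    metadata: dict


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a vector index backend."""

    name: str
    model_labels: set[str]  # models whose objects belong in this index
    documents: list[Document] = field(default_factory=list)

    def delete_matching(self, **metadata) -> int:
        """Remove documents whose metadata contains all the given pairs."""
        before = len(self.documents)
        self.documents = [
            doc
            for doc in self.documents
            if not all(doc.metadata.get(k) == v for k, v in metadata.items())
        ]
        return before - len(self.documents)


def indexes_for(model_label: str, registry: list[InMemoryIndex]) -> list[InMemoryIndex]:
    """Find which indexes a model is in."""
    return [index for index in registry if model_label in index.model_labels]


def update_object(
    model_label: str,
    object_id: int,
    chunks: list[str],
    registry: list[InMemoryIndex],
    embed: Callable[[str], list[float]],
) -> list[str]:
    """Replace one object's documents without rebuilding any index."""
    touched = []
    for index in indexes_for(model_label, registry):
        # Drop the stale documents for this object only.
        index.delete_matching(content_type=model_label, object_id=object_id)
        # Embed per document rather than at the rebuild stage.
        for chunk in chunks:
            metadata = {"content_type": model_label, "object_id": object_id}
            index.documents.append(Document(embed(chunk), metadata))
        touched.append(index.name)
    return touched
```

Deleting by metadata before re-adding keeps the update idempotent: saving the same object twice leaves one set of documents rather than duplicates. A real backend would need its own way to filter on metadata, which is the part that still needs designing.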
For reference, @mgax implemented this as part of his project, but we'd likely have the same issues as above. I don't think @mgax's solution rebuilds the whole index.
Maybe we need to make it a bit less automatic or perfect so we can merge this in some form?
At the moment the only way to update the indexes is to run the `update_vector_indexes` management command. This gives developers control over when embedding API costs are incurred. It would be nice if there was an option to add/update items in the index when they are published/saved.
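One possible shape for such an option is sketched below, purely as an illustration. `make_save_handler`, `auto_update`, and `enqueue` are hypothetical names, not existing settings or functions in this package. The idea is that updates stay off by default (so API costs remain under the developer's control), and when enabled they are handed to a task queue if one is configured.

```python
from typing import Callable, Optional


def make_save_handler(
    update: Callable[[int], None],
    *,
    auto_update: bool = False,
    enqueue: Optional[Callable[[Callable[[int], None], int], None]] = None,
) -> Callable[[int], str]:
    """Build a save/publish handler that updates the index only when opted in."""

    def handler(object_id: int) -> str:
        if not auto_update:
            # Default: nothing happens, so no embedding costs are incurred.
            return "skipped"
        if enqueue is not None:
            # Defer the work so it runs outside the request/response cycle.
            enqueue(update, object_id)
            return "queued"
        # No task queue available: run inline and accept the slower request.
        update(object_id)
        return "inline"

    return handler
```

In a real project the handler would presumably be connected to Django's `post_save` or Wagtail's `page_published` signal, and `enqueue` could wrap a Celery task's `.delay()`. Whether the inline fallback should be allowed at all is one of the open questions.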
Some considerations: