wagtail / wagtail-vector-index

Store Wagtail pages & Django models as embeddings in vector databases
https://wagtail-vector-index.readthedocs.io/en/latest/
MIT License
20 stars 13 forks source link

Fix: Default model converter should be able to handle different object types #72

Closed ababic closed 4 months ago

ababic commented 5 months ago

Resolves #67

This PR adds native support for indexes with multiple models, and modifies the EmbeddableFieldsDocumentConverter class to use Django's content types framework to figure out which models to work with.

We can also remove the steps from the docs that instruct developers to make overrides in their custom classes.

tomusher commented 5 months ago

Thanks @ababic this is looking good!

Seems the downside of this approach is that all interactions with the index are going to return instances of your base model.

There's no strict reason why all the models in an index need to inherit from a concrete base model - ideally the Converter should be able to convert a Document back to any model (probably using contenttypes), but doing this in a performant and flexible way is likely more challenging.

Interested to hear your opinions on this - either way I expect we can get this merged as-is and then investigate other options. Might be worth documenting this limitation though.

ababic commented 5 months ago

Thanks @tomusher.

It makes sense to me that converter instances are tied to a single model class only, but I think there would be legs in updating search() and find_similar() to group documents by type, send them off to separate converters for conversion, then reassemble them in the correct order?

As you say, I think it would make sense to tackle separately.

tomusher commented 5 months ago

It makes sense to me that converter instances are tied to a single model class only, but I think there would be legs in updating search() and find_similar() to group documents by type, send them off to separate converters for conversion, then reassemble them in the correct order?

I feel like this means the index needs to know too much about the structure of a particular Document, but can see the appeal. My initial thinking was that converters are not tied to a single model class and can do whatever they need to convert indexed documents back to whatever the application needs to consume.

Could you add some documentation on the 'Indexing across models' section mentioning that query etc. will return instances of the base class?

ababic commented 5 months ago

@tomusher Thinking on this some more... your plan makes more sense! We might as well update EmbeddableFieldsDocumentConverter.

tomusher commented 4 months ago

This looks great to me @ababic - really appreciate the time you've spent on this, it will make the package much more usable, thank you!