vitrivr / vitrivr-engine

vitrivr's next-generation retrieval engine. It is capable of extracting and retrieving a wide range of multimedia objects, such as audio, video, images, or 3D models.
https://vitrivr.org
MIT License

Add support for dense retrieval with instructions #76

Closed: faberf closed this issue 2 months ago

faberf commented 5 months ago

Some newer embedding models, such as https://huggingface.co/intfloat/e5-mistral-7b-instruct, require a one-sentence instruction that describes the retrieval task in addition to the content to be embedded. This model in particular is currently implemented in the FES and accessible through the ApiWrapper. It might be useful to extend the DenseEmbedding analyser to support these models as well. To do so, the DenseEmbedding analyser (or its methods) needs the task instruction (e.g. 'Given a web search query, retrieve relevant passages that answer the query') as a parameter.

I would like your feedback on whether this should be configured as part of the query or as part of the field. My intuition is that it is a parameter of the query and should therefore be passed as query context to newRetrieverForContent... Any thoughts?
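For illustration, here is a minimal sketch of what "instruction as query context" could look like. It assumes the prompt format given on the e5-mistral-7b-instruct model card (`Instruct: {task}\nQuery: {query}` for queries, no instruction for indexed documents). `InstructionContext`, `formatQuery` and `formatDocument` are hypothetical names, not part of vitrivr-engine, and the actual embedding request through the ApiWrapper is deliberately not shown.

```kotlin
/**
 * Hypothetical query-side context carrying an optional task instruction.
 * Not a vitrivr-engine type; illustrative only.
 */
data class InstructionContext(val instruction: String? = null)

/**
 * Builds the text sent to the embedding model at query time, following the
 * "Instruct: {task}\nQuery: {query}" format from the e5-mistral model card.
 * Falls back to the plain query when no (or a blank) instruction is given,
 * so models without instruction support are unaffected.
 */
fun formatQuery(query: String, context: InstructionContext): String =
    context.instruction
        ?.takeIf { it.isNotBlank() }
        ?.let { "Instruct: $it\nQuery: $query" }
        ?: query

/**
 * Content embedded at extraction time is passed through unchanged, since the
 * model card states that documents need no instruction (assumed here to hold
 * for the field/extraction side).
 */
fun formatDocument(content: String): String = content
```

Because only the query side is rewritten in this sketch, the field configuration and already-extracted embeddings would stay valid regardless of which instruction a query supplies, which is one argument for treating the instruction as a query parameter rather than a field setting.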

ppanopticon commented 5 months ago

When going through the open issues, we were not quite sure what this one is about. Maybe you can bring us up to speed during our next weekly meeting.

faberf commented 3 months ago

For housekeeping, this is already implemented in feature branch #83.