0xDTE opened this issue 1 year ago (status: Open)
torvalds.dev is analyzing the ticket
The issue is a feature request for multi-modal RAG (Retrieval Augmented Generation) queries. The user wants to know if the current system supports this feature. In the context of this request, multi-modal RAG queries would mean the ability to drop a picture (or presumably any other non-text input) and have the system find similar items.
The provided code is from a Python file named `fusion_retriever.py` in the `llama_index/retrievers` directory. This file defines a `QueryFusionRetriever` class, a type of `BaseRetriever`, which appears to be designed to generate multiple search queries from a single input query, using a language model to produce the query variants.
The `QueryFusionRetriever` class has several methods for retrieving information based on the generated queries, including `_retrieve_keyword`, `_retrieve_embedding`, and `_retrieve`. Each retrieves information in a different way: `_retrieve_keyword` retrieves results by keyword matching, `_retrieve_embedding` retrieves results by embeddings (although this method is not yet implemented), and `_retrieve` combines both keyword matching and embeddings.
To answer the user's question, we need to determine whether the current system supports multi-modal RAG queries. That would require the system to accept non-text inputs (such as images), generate queries from those inputs, and retrieve relevant information based on them. From the provided code alone, it is not clear whether this is supported; other parts of the codebase would need to be examined to determine it.
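Purely as an illustration of what the missing piece would involve (this is not llama_index code, and the file paths and model choice are assumptions), an image-to-image similarity search can be sketched with a CLIP model that embeds pictures into a shared vector space:

```python
# Illustrative sketch only: what an image-based retrieval path could look like.
# Assumes the sentence-transformers CLIP checkpoint "clip-ViT-B-32", which can
# embed images (and text) into the same vector space. Paths are hypothetical.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

clip = SentenceTransformer("clip-ViT-B-32")

# Embed a catalogue of images once.
image_paths = ["catalogue/chair.jpg", "catalogue/lamp.jpg", "catalogue/sofa.jpg"]
image_embeddings = clip.encode([Image.open(p) for p in image_paths])

# Embed the user's dropped picture and find the most similar catalogue items.
query_embedding = clip.encode([Image.open("user_upload.jpg")])
hits = util.semantic_search(query_embedding, image_embeddings, top_k=3)[0]
for hit in hits:
    print(image_paths[hit["corpus_id"]], hit["score"])
```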
Plan:
Feature Description
Does it support multi-modal RAG queries? RAG stands for Retrieval Augmented Generation. For example, if I drop a picture, will it find similar items?
Reason
No response
Value of Feature
No response