torvalds-dev / llama_index

LlamaIndex (formerly GPT Index) is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Does it support multi-modal RAG queries? #12

Open 0xDTE opened 1 year ago

0xDTE commented 1 year ago

Feature Description

Does it support multi-modal RAG queries? RAG stands for Retrieval Augmented Generation. For example, if I drop in a picture, will it find similar

Reason

No response

Value of Feature

No response

torvalds-dev commented 1 year ago

torvalds.dev is analyzing the ticket

torvalds-dev commented 1 year ago

The issue is a feature request for multi-modal RAG (Retrieval Augmented Generation) queries. The user wants to know if the current system supports this feature. In the context of this request, multi-modal RAG queries would mean the ability to drop a picture (or presumably any other non-text input) and have the system find similar items.

The provided code is from fusion_retriever.py in the llama_index/retrievers directory. The file defines QueryFusionRetriever, a subclass of BaseRetriever. The class appears to use a language model to generate multiple search queries from a single input query.

QueryFusionRetriever has several methods for retrieving information from the generated queries, including _retrieve_keyword, _retrieve_embedding, and _retrieve. Each appears to retrieve in a different way: _retrieve_keyword uses keyword matching, _retrieve_embedding uses embeddings (this method is not yet implemented), and _retrieve combines keyword matching and embeddings.
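As a rough illustration of the general idea described above (fan one query out into several, run each through one or more retrievers, and merge the results), here is a minimal standalone sketch. It does not use llama_index and is not the QueryFusionRetriever implementation. `generate_queries`, `keyword_retrieve`, and `fuse` are hypothetical names, the query rewriting is a hard-coded stand-in for an LLM call, and merging by maximum score is just one simple choice assumed for the example.

```python
from typing import Callable, Dict, List, Tuple

Retriever = Callable[[str, Dict[str, str]], Dict[str, float]]


def generate_queries(query: str, num_queries: int = 3) -> List[str]:
    """Hypothetical stand-in for an LLM that rewrites one query into several."""
    variants = [query, f"what is {query}", f"{query} examples"]
    return variants[:num_queries]


def keyword_retrieve(query: str, docs: Dict[str, str]) -> Dict[str, float]:
    """Toy keyword retriever: score is the number of query words in the document."""
    words = set(query.lower().split())
    scores: Dict[str, float] = {}
    for doc_id, text in docs.items():
        overlap = len(words & set(text.lower().split()))
        if overlap:
            scores[doc_id] = float(overlap)
    return scores


def fuse(
    query: str, docs: Dict[str, str], retrievers: List[Retriever]
) -> List[Tuple[str, float]]:
    """Run every generated query through every retriever; keep each document's best score."""
    best: Dict[str, float] = {}
    for variant in generate_queries(query):
        for retrieve in retrievers:
            for doc_id, score in retrieve(variant, docs).items():
                best[doc_id] = max(best.get(doc_id, 0.0), score)
    # Highest score first; ties broken by document id for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "a": "multi-modal retrieval with images",
    "b": "text only retrieval",
    "c": "cooking recipes",
}
results = fuse("retrieval images", docs, [keyword_retrieve])
```

An embedding-based retriever with the same call signature could be appended to the `retrievers` list without changing `fuse`.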

To answer the user's question, we need to determine whether the current system supports multi-modal RAG queries. That would require the system to accept non-text inputs (such as images), generate queries from them, and retrieve relevant information for those queries. The provided code does not make clear whether this is supported; other parts of the codebase would need to be examined.
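To make the "drop a picture, find similar" requirement concrete, the retrieval step can be sketched as a nearest-neighbour search over embeddings. This is only an illustration under stated assumptions: the vectors below are made-up placeholders for what an image embedding model would produce, and `cosine_similarity`, `find_similar`, and the file names are hypothetical, not part of llama_index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(
    query_embedding: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k indexed items most similar to the query embedding."""
    scored = [
        (item_id, cosine_similarity(query_embedding, embedding))
        for item_id, embedding in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder embeddings (assumed); a real system would compute these with
# a model that maps images, and possibly text, into a shared vector space.
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.8, 0.3, 0.1],
    "invoice.png": [0.0, 0.1, 0.9],
}
query_embedding = [1.0, 0.0, 0.0]  # stands in for the embedded input picture
matches = find_similar(query_embedding, index, top_k=2)
```

The retrieved items would then be passed to a model that can read images or their captions to complete the generation half of RAG.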

Plan:

  1. Understand the current capabilities of the system by examining the provided code.
  2. Determine if the system currently supports multi-modal RAG queries.
  3. If the system does not currently support multi-modal RAG queries, suggest relevant actions that could be taken to add this feature.
  4. Provide the names of files that are relevant to this issue.