torvalds-dev / llama_index

LlamaIndex (formerly GPT Index) is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

My data contains a lot of tables as well as text. I have tried Fusion_RAG, which is much better than the baseline, but it is limited to text and cannot process the tables. Is there a way to combine Semi_Structured_RAG and Fusion_RAG so that I can handle text and tables at the same time? #13

Closed: Raghavan1988 closed this issue 7 months ago

Raghavan1988 commented 7 months ago

Feature Description

My data contains a lot of tables as well as text. I have tried Fusion_RAG, which is much better than the baseline, but it is limited to text and cannot process the tables. Is there a way to combine Semi_Structured_RAG and Fusion_RAG so that I can handle text and tables at the same time?

Reason

No response

Value of Feature

No response

torvalds-dev commented 7 months ago

torvalds.dev is analyzing the ticket

torvalds-dev commented 7 months ago

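It seems like you're trying to process both text and tables from your data using Fusion_RAG and Semi_Structured_RAG. Neither of these is a model. Fusion_RAG (RAG-Fusion) is a retrieval technique: it rewrites the user query into several variants, retrieves results for each one, and merges the rankings with reciprocal rank fusion, which works well on plain text. Semi_Structured_RAG is a pipeline for semi-structured documents: tables are parsed out as separate elements (and typically summarized) so that they can be retrieved alongside the surrounding text.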
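The "fusion" in RAG-Fusion is the reciprocal rank fusion (RRF) step that merges the result lists retrieved for the different query rewrites. A minimal sketch of that merge in plain Python follows; it is not LlamaIndex's implementation, the document IDs are made up, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs into a single ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in (rank starts at 1), so items ranked high by many lists win.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results retrieved for three rewrites of the same user question.
rankings = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
fused = reciprocal_rank_fusion(rankings)
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'c', 'd']
```

Because RRF only uses ranks, it can merge lists whose raw similarity scores live on different scales.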

To combine the two, one option is a pipeline that first identifies whether each piece of content is text or a table and then routes it to the appropriate approach.

Here's a high-level plan:

  1. Identify the type of each chunk (text or table).
  2. Index the text chunks for Fusion_RAG.
  3. Index the table chunks for Semi_Structured_RAG.
  4. At query time, run both retrievers and merge their results (for example with reciprocal rank fusion, since scores from two different retrievers are not directly comparable) for further analysis or usage.
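Steps 2 to 4 can be sketched roughly as below, assuming the chunks were already labeled in step 1. `Chunk`, `keyword_rank`, `rrf` and `hybrid_retrieve` are hypothetical names, not LlamaIndex APIs, and `keyword_rank` is only a toy word-overlap scorer standing in for real retrievers. In practice you would pass your Fusion_RAG retriever as `text_retriever` and your Semi_Structured_RAG retriever as `table_retriever`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    id: str
    kind: str  # "text" or "table", assigned in step 1
    content: str


# Hypothetical retriever signature: (query, chunks) -> chunk IDs, best first.
Retriever = Callable[[str, list[Chunk]], list[str]]


def keyword_rank(query: str, chunks: list[Chunk]) -> list[str]:
    """Toy stand-in for a real retriever: rank chunks by shared query words."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.content.lower().split())), c.id) for c in chunks]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [chunk_id for score, chunk_id in scored if score > 0]


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge rankings without comparing raw scores."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def hybrid_retrieve(
    query: str,
    chunks: list[Chunk],
    text_retriever: Retriever = keyword_rank,   # e.g. a Fusion_RAG retriever
    table_retriever: Retriever = keyword_rank,  # e.g. a Semi_Structured_RAG retriever
    top_k: int = 3,
) -> list[str]:
    # Steps 2 and 3: each retriever only sees the chunks of its own kind.
    text_chunks = [c for c in chunks if c.kind == "text"]
    table_chunks = [c for c in chunks if c.kind == "table"]
    text_hits = text_retriever(query, text_chunks)
    table_hits = table_retriever(query, table_chunks)
    # Step 4: merge both rankings into one list.
    return rrf([text_hits, table_hits])[:top_k]


chunks = [
    Chunk("t1", "text", "Revenue grew in 2023 due to strong sales"),
    Chunk("t2", "text", "The company was founded in 1990"),
    Chunk("tb1", "table", "| year | revenue |\n| 2023 | 120 |"),
]
print(hybrid_retrieve("revenue in 2023", chunks))  # ['t1', 'tb1', 't2']
```

Keeping the two indexes separate lets each approach do what it is good at, while the rank-based merge lets the best table hit compete with the best text hit on equal terms.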

The file llama_index/node_parser/__init__.py is relevant to your issue. It exports LlamaIndex's node parsers, including text splitters and a JSON parser. Depending on your version, it also exports element-based parsers such as `UnstructuredElementNodeParser` and `MarkdownElementNodeParser`, which split a document into text and table elements and so cover step 1 of the plan.
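If the source files are plain text or Markdown, a rough version of step 1 can also be done without a parser. The sketch below is only an illustration: `looks_like_table` and `label_blocks` are hypothetical helpers, not part of LlamaIndex. The heuristic only recognizes pipe- or tab-delimited tables separated from the prose by blank lines, and it can misfire on prose that happens to contain `|` characters. For PDFs, an element-based parser is the more robust choice.

```python
import re


def looks_like_table(block: str, min_rows: int = 2) -> bool:
    """Hypothetical heuristic: treat a block as a table if it has at least
    `min_rows` lines and every line splits into the same number (>= 2) of
    cells on '|' or on tab."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    if len(lines) < min_rows:
        return False
    for sep in ("|", "\t"):
        # Strip leading/trailing separators so "| a | b |" counts as 2 cells.
        cell_counts = {len(ln.strip(sep).split(sep)) for ln in lines}
        if len(cell_counts) == 1 and cell_counts.pop() >= 2:
            return True
    return False


def label_blocks(document: str) -> list[tuple[str, str]]:
    """Split a document on blank lines and label each block 'table' or 'text'."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", document) if b.strip()]
    return [("table" if looks_like_table(b) else "text", b) for b in blocks]


doc = (
    "Revenue grew strongly in 2023.\n\n"
    "| year | revenue |\n"
    "|------|---------|\n"
    "| 2023 | 120     |\n\n"
    "The outlook remains positive."
)
for kind, block in label_blocks(doc):
    print(kind, "->", block.splitlines()[0])
```

The labeled blocks can then be routed to the text or table pipeline as in steps 2 and 3.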

Please note that this is a high-level plan; the actual implementation will depend on the specifics of your data and on how your Fusion_RAG and Semi_Structured_RAG pipelines are set up.