run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Retriever + Summariser #12550

Closed Aekansh-Ak closed 2 months ago

Aekansh-Ak commented 6 months ago

Question

Let's say I have a huge amount of data and I want to summarize all the activities of a specific user.

If I use a retriever query engine with a prompt to summarize, it just returns about 1 or 2 activities.

Can I use this approach for my task? https://docs.llamaindex.ai/en/stable/examples/query_engine/JointQASummary/

If so, note that the example doesn't return anything when I run it.

dosubot[bot] commented 6 months ago

Hey @Aekansh-Ak, great to see you back! I hope you're doing well.

Yes, you can use the method described in the JointQASummary example from the LlamaIndex documentation to summarize all activities by a specific user. However, you need to ensure that the documents you are passing into the build_from_documents method contain the activities of the specific user you are interested in.

Here's how you can adjust the method:

  1. Filter Documents by User: Before passing documents to the build_from_documents method, keep only those that belong to the user whose activities you want to summarize. How you filter depends on how user information is stored. In LlamaIndex, custom fields such as a user ID normally live in each Document's metadata dictionary rather than as attributes on the object, so if each document carries a user_id metadata key, you can filter like this:
user_specific_documents = [doc for doc in all_documents if doc.metadata.get("user_id") == specific_user_id]
  2. Build the Query Engine from the Filtered Documents: Pass user_specific_documents to the build_from_documents method. The resulting engine then only considers that user's activities for summarization and QA tasks.

Here's an adjusted snippet incorporating the user-specific document filtering:

# Import path for llama_index 0.10+
from llama_index.core.composability import QASummaryQueryEngineBuilder

# Assuming `all_documents` is a list of Document objects whose metadata has a
# "user_id" key, and `specific_user_id` is the ID of the user you're interested in
user_specific_documents = [doc for doc in all_documents if doc.metadata.get("user_id") == specific_user_id]

# Now, build the query engine with these user-specific documents
query_engine = QASummaryQueryEngineBuilder().build_from_documents(user_specific_documents)

Please note that the snippet above assumes each Document stores the user ID under a user_id key in its metadata. Adjust the filtering logic to match how your documents are structured and how user information is stored within them.

This approach ensures that the query engine built by QASummaryQueryEngineBuilder is specifically tailored for summarizing and querying activities related to a specific user. You can find more details about this in the source code.
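If you want to check the filtering step on its own before building a query engine, here is a minimal sketch. It assumes the user ID lives under a "user_id" metadata key (an assumption about your data, not something LlamaIndex requires), and filter_by_user is a hypothetical helper name, not part of the library. The sample objects are plain stand-ins with text and metadata fields so the snippet runs without LlamaIndex installed.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def filter_by_user(documents: Iterable[Any], user_id: str, key: str = "user_id") -> List[Any]:
    """Keep only documents whose metadata[key] equals user_id.

    Hypothetical helper: works on any object that has a `metadata` dict.
    Documents without metadata, or without the key, are skipped.
    """
    selected = []
    for doc in documents:
        metadata = getattr(doc, "metadata", None) or {}
        if metadata.get(key) == user_id:
            selected.append(doc)
    return selected


# Stand-in objects with a text field and a metadata dict (made-up sample data).
all_documents = [
    SimpleNamespace(text="alice logged in", metadata={"user_id": "alice"}),
    SimpleNamespace(text="bob uploaded a file", metadata={"user_id": "bob"}),
    SimpleNamespace(text="alice changed her password", metadata={"user_id": "alice"}),
    SimpleNamespace(text="system restart", metadata={}),
]

user_specific_documents = filter_by_user(all_documents, "alice")
print([doc.text for doc in user_specific_documents])
# prints ['alice logged in', 'alice changed her password']
```

With real data, you would pass your loaded Document list instead of the stand-ins and hand the filtered result to build_from_documents. Checking the length of the filtered list first is a quick way to confirm that the metadata key matches what your loader actually produces.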

Regarding the Document object, it is a Python class whose main fields are text (a string containing the document's content), metadata (a dictionary of arbitrary key-value pairs, which is a natural place for something like a user ID), and id_ (a string that uniquely identifies the document, also exposed as doc_id). You can find more details about this in the source code.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.
