run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
33.11k stars 4.61k forks

[Question]: How do I access a raptor pack variable that I created later on? #13540

Open sohilbhatia opened 1 month ago

sohilbhatia commented 1 month ago

Question Validation

Question

How do I access the raptor_pack variable later on? I am using this in a Flask application, and I want to call the query engine via raptor_pack.retriever, as we noticed that produced better results. I'm trying to figure out a way to do this without creating a retriever and passing it into the arguments of RetrieverQueryEngine, and without cookies, global variables, etc.

import chromadb
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.packs.raptor import RaptorPack
from llama_index.vector_stores.chroma import ChromaVectorStore


def createDepRaptors(path, documents):
    client = chromadb.PersistentClient(path=f"./{path}")
    collection = client.get_or_create_collection(f"{path}")
    vector_store = ChromaVectorStore(chroma_collection=collection)

    raptor_pack = RaptorPack(
        documents,
        embed_model=OpenAIEmbedding(
            model="text-embedding-3-small"
        ),  # used for embedding clusters
        llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),  # used for generating summaries
        vector_store=vector_store,  # used for storage
        similarity_top_k=8,  # top k for each layer, or overall top-k for collapsed
        mode="collapsed",  # sets default mode
        transformations=[
            SentenceSplitter(chunk_size=400, chunk_overlap=50)
        ],  # transformations applied for ingestion
    )

    query_engine = RetrieverQueryEngine.from_args(
        raptor_pack.retriever,
        llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    )
dosubot[bot] commented 1 month ago

Hey there, @sohilbhatia! Fancy seeing you here again. 😄 Looks like you've got another intriguing puzzle for us. Let me dive into the depths of raptor_pack wisdom and I'll get back to you with something useful shortly.

logan-markewich commented 1 month ago

Assuming you already created the index once, you can do

raptor_pack = RaptorPack(
    [],
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # used for embedding clusters
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=8,  # top k for each layer, or overall top-k for collapsed
    mode="collapsed",  # sets default mode
    transformations=[
        SentenceSplitter(chunk_size=400, chunk_overlap=50)
    ],  # transformations applied for ingestion
)

query_engine = RetrieverQueryEngine.from_args(
    raptor_pack.retriever,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
)

Notice that an empty list was passed in for documents. Ingesting an empty list should basically end up being a no-op, so it's fine to run this whenever needed.
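To see why the empty list is safe, here is a tiny stand-in sketch (MiniStore and MiniPack are hypothetical classes, not part of llama_index): the ingestion step on construction simply has nothing to add, while the retriever still sees everything already in the persistent store.

```python
class MiniStore:
    """Hypothetical stand-in for the persistent Chroma vector store."""

    def __init__(self):
        self.chunks = []

    def add(self, docs):
        self.chunks.extend(docs)


class MiniPack:
    """Hypothetical stand-in for RaptorPack: ingests documents, then retrieves."""

    def __init__(self, documents, vector_store):
        vector_store.add(documents)  # ingestion step: an empty list adds nothing
        self.store = vector_store

    def retrieve(self, query):
        return [c for c in self.store.chunks if query in c]


store = MiniStore()
MiniPack(["alpha doc", "beta doc"], store)  # first run: real ingestion
pack = MiniPack([], store)                  # later runs: no-op, store unchanged
print(pack.retrieve("alpha"))               # ['alpha doc']
```

The second construction leaves the store untouched, which mirrors re-instantiating RaptorPack against an already-populated vector store.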

If you want, you can also import and use the RaptorRetriever directly:

from llama_index.packs.raptor import RaptorRetriever

retriever = RaptorRetriever(
    [],
    embed_model=embed_model,
    llm=llm,
    similarity_top_k=similarity_top_k,
    vector_store=vector_store,
    mode=mode,
    verbose=verbose,
    transformations=transformations,
)
dosubot[bot] commented 1 month ago

To access and utilize the raptor_pack variable in your Flask application without re-creating indexes or using global variables, consider storing raptor_pack within the Flask application context. This method ensures raptor_pack is initialized once and accessible across your application. Here's how you can implement this:

  1. Initialize `raptor_pack` during Flask app startup:

     ```python
     from flask import Flask
     from llama_index.packs.raptor import RaptorPack

     app = Flask(__name__)

     with app.app_context():
         app.raptor_pack = RaptorPack(documents, ...)  # same arguments as in createDepRaptors above
     ```

  2. Access `raptor_pack` in your route or view function:

     ```python
     from flask import current_app

     @app.route('/use_raptor')
     def use_raptor():
         raptor_pack = current_app.raptor_pack
         # Now you can use raptor_pack here
         return "Raptor Pack Accessed"
     ```

This approach leverages Flask's app_context() to store the raptor_pack instance, making it available throughout the application lifecycle without the need for global variables or passing the instance around.
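If you'd rather not hang the instance off the app object, the same initialize-once behavior can be sketched with `functools.lru_cache` from the standard library. In this sketch, FakeRaptorPack is a hypothetical stand-in for the real, expensive-to-build RaptorPack:

```python
from functools import lru_cache


class FakeRaptorPack:
    """Hypothetical stand-in for RaptorPack, which is expensive to construct."""

    def __init__(self):
        self.retriever = object()  # stands in for raptor_pack.retriever


@lru_cache(maxsize=1)
def get_raptor_pack():
    # Constructed on the first call only; every later call (e.g. from any
    # Flask view) returns the same cached instance -- no globals needed.
    return FakeRaptorPack()


first = get_raptor_pack()
second = get_raptor_pack()
print(first is second)  # True: one shared instance across all callers
```

Any view function can then call `get_raptor_pack()` and receive the already-built instance, which achieves the same effect as stashing it on the app.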

For more detailed guidance, refer to the Flask documentation on application contexts.

If you're considering using RaptorRetriever directly for more control over the indexing and retrieval process, remember that RaptorPack provides a simplified, high-level interface for most use cases. However, direct use of RaptorRetriever might be beneficial if you require specific customizations beyond what RaptorPack offers.

#### Details

**Sources**
- [llama-index-packs/llama-index-packs-raptor/llama_index/packs/raptor/__init__.py](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-raptor/llama_index/packs/raptor/__init__.py)
- [llama-index-packs/llama-index-packs-raptor/tests/test_packs_raptor.py](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-raptor/tests/test_packs_raptor.py)
- [llama-index-packs/llama-index-packs-raptor/llama_index/packs/raptor/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-raptor/llama_index/packs/raptor/base.py)
- [llama-index-core/llama_index/core/node_parser/text/sentence.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/node_parser/text/sentence.py)
**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.