parthsarthi03 / raptor

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
https://arxiv.org/abs/2401.18059
MIT License

num_layers acts like max_num_layers #29

Open manuoliveira6 opened 3 months ago

manuoliveira6 commented 3 months ago

I am trying to build a tree with num_layers=3. When I read the value back with RA.tree.num_layers, I get 1, so the tree has only one layer.

It seems this parameter behaves as a max_num_layers: it is an upper bound, and if some stopping condition is met during construction, the tree ends up with fewer layers than requested.
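To illustrate the behavior described above, here is a minimal toy sketch of a layer-building loop in which the requested layer count is only an upper bound. It is not RAPTOR's actual code: the function name, the `min_nodes_to_cluster` threshold, and the fixed `group_size` are all hypothetical, chosen only to show how an early-stop check can leave a tree with fewer layers than requested.

```python
def build_layers(num_leaves, max_layers, min_nodes_to_cluster=12, group_size=5):
    """Toy model of a layered tree build (hypothetical, not RAPTOR's code).

    Returns the number of layers actually built above the leaves.
    """
    current_nodes = num_leaves
    layers_built = 0
    for _ in range(max_layers):
        # Early stop: too few nodes left to form another layer of clusters.
        if current_nodes < min_nodes_to_cluster:
            break
        # Each group of `group_size` nodes is summarized into one parent node
        # (ceiling division, so a partial group still gets a parent).
        current_nodes = -(-current_nodes // group_size)
        layers_built += 1
    return layers_built


# A small input stops early even though three layers were requested.
print(build_layers(40, max_layers=3))    # 1
# A larger input has enough nodes to reach the requested depth.
print(build_layers(1000, max_layers=3))  # 3
```

Under these assumptions, the requested depth is reached only when the input produces enough leaf nodes to keep clustering at every level.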

The code used is the following:

from openai import OpenAI
from raptor import BaseEmbeddingModel, RetrievalAugmentationConfig
from tenacity import retry, stop_after_attempt, wait_random_exponential

# Custom embedding model backed by the OpenAI embeddings endpoint.
class OpenAIEmbeddingModel(BaseEmbeddingModel):
    def __init__(self, model='text-embedding-3-small'):
        self.client = OpenAI()
        self.model = model

    # Retry with random exponential backoff, giving up after 6 attempts.
    @retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
    def create_embedding(self, text):
        text = text.replace("\n", " ")
        return (
            self.client.embeddings.create(input=[text], model=self.model)
            .data[0]
            .embedding
        )

# Request 3 layers from the tree builder.
RAC = RetrievalAugmentationConfig(tb_num_layers=3,
                                  tb_selection_mode='threshold',
                                  tb_threshold=0.3,
                                  tb_summarization_length=250,
                                  embedding_model=OpenAIEmbeddingModel())

parthsarthi03 commented 3 months ago

Just so I understand correctly: the issue is that you are setting the number of layers to 3, but the tree that gets built has only one layer? How many tokens is the text you are building the tree with?
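A rough way to answer the token question is to estimate how many leaf chunks a text would produce. The sketch below is only an approximation under stated assumptions: it counts whitespace-separated words rather than real model tokens, and the 100-token chunk size is an assumed value, not one confirmed in this thread. The function name and parameters are hypothetical; a real tokenizer can be passed in through `tokenize`.

```python
import math


def estimate_leaf_chunks(text, max_tokens_per_chunk=100, tokenize=str.split):
    """Return (token_count, estimated_leaf_chunks) for a text.

    `tokenize` defaults to whitespace splitting, which only approximates
    a real tokenizer; `max_tokens_per_chunk` is an assumed chunk size.
    """
    n_tokens = len(tokenize(text))
    return n_tokens, math.ceil(n_tokens / max_tokens_per_chunk)


# Example with a synthetic 250-word text.
n_tokens, n_chunks = estimate_leaf_chunks("word " * 250)
print(n_tokens, n_chunks)  # 250 3
```

A short text yields only a few leaf chunks, which limits how many layers of clustering and summarization can be stacked on top of them.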

manuoliveira6 commented 3 months ago

The text is sample.txt, the sample file provided in your repository.