parthsarthi03 / raptor

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
https://arxiv.org/abs/2401.18059
MIT License

Constructing Layer Issue #23

medxiaorudan closed this issue 3 months ago

medxiaorudan commented 3 months ago

Hello, I have samples of different lengths, with num_layers ranging from 17 to 51, but every one of them stops with "Stopping Layer construction: Cannot Create More Layers. Total Layers in tree: 2". It seems I can't build a tree with more than 2 layers.

parthsarthi03 commented 3 months ago

Hey! How long are your texts? The algorithm stops the layer construction if it cannot cluster anymore.
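To see why a short text tops out at a shallow tree, here is a toy model of a "stop when you can no longer cluster" rule. It is not RAPTOR's code. It assumes that construction stops once a layer has no more than about 11 nodes (our reading of the tree builder; treat the number as an assumption) and that each new layer has roughly 1/5 as many nodes as the one below it. Real RAPTOR clusters embeddings, so the actual shrink per layer varies. The function name `estimate_layers` and all its parameters are hypothetical.

```python
import math


def estimate_layers(
    num_leaves: int,
    stop_threshold: int = 11,   # assumed: stop once a layer has <= this many nodes
    shrink_factor: float = 5.0,  # hypothetical: average nodes merged per summary node
    max_layers: int = 5,         # hypothetical cap on the number of layers
) -> int:
    """Toy estimate of how many summary layers get built on top of the leaves."""
    nodes = num_leaves
    layers = 0
    # Keep adding layers while there are enough nodes left to cluster.
    while layers < max_layers and nodes > stop_threshold:
        nodes = max(1, math.ceil(nodes / shrink_factor))
        layers += 1
    return layers
```

Under these assumptions, `estimate_layers(10)` gives 0, `estimate_layers(60)` gives 2, and `estimate_layers(600)` gives 3: the depth grows only logarithmically with the number of leaf chunks, so a text of a few thousand tokens rarely goes beyond a couple of layers.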

medxiaorudan commented 3 months ago

> Hey! How long are your texts? The algorithm stops the layer construction if it cannot cluster anymore.

Thanks for your reply. My texts were retrieved from a Pinecone vector database and range in length from 766 to 6047 tokens. All of them stop layer construction at layer 2.

parthsarthi03 commented 3 months ago

At those lengths, two layers are typical. We're releasing new clustering methods soon that will give you more explicit control over the number of layers. In the meantime, if you want more layers, you can reduce the maximum chunk size in tokens, which splits the same text into more leaf chunks to cluster. You can do that as follows:

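from raptor import RetrievalAugmentation, RetrievalAugmentationConfig

# Cap each leaf chunk at 50 tokens so the same text yields more chunks to cluster
RAC = RetrievalAugmentationConfig(tb_max_tokens=50)
RA = RetrievalAugmentation(config=RAC)
RA.add_documents(text)  # text: the document string to index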

Here the chunk size is capped at 50 tokens (the default is 100).
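As a rough illustration of why lowering the token cap gives the tree builder more to work with, here is a minimal greedy chunker that packs whole sentences into chunks of at most `max_tokens`. It is a simplified stand-in, not RAPTOR's splitter: it assumes sentence boundaries at `.`, `!` or `?` followed by whitespace and counts whitespace-separated words instead of tokenizer tokens. The function name `chunk_text` is hypothetical.

```python
import re


def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Greedily pack sentences into chunks of at most max_tokens words.

    Simplifications (assumptions, not RAPTOR's exact behavior):
    - whitespace-separated words stand in for tokenizer tokens
    - a single sentence longer than max_tokens becomes its own chunk
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the cap.
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For a 300-word text made of thirty 10-word sentences, a cap of 100 yields 3 chunks and a cap of 50 yields 6: halving the cap roughly doubles the number of leaf nodes available for clustering.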