Closed younes-io closed 3 months ago
Hey! I think this might be due to an issue with the text splitting while creating the leaf nodes. Can you look at the leaf nodes by running the following and check whether you have a lot of chunks with just dots in them?
for key, node in RA.tree.leaf_nodes.items():
    print(key, node.text[:50])
@parthsarthi03 Alright, thanks. I will try that when it finishes. I have a lot of chunks.
BTW, I got this prompt... I left the script running, but it halted on this prompt and is waiting for an answer. It's not very practical. Should I open an issue for this?
It's a precautionary check since we don't want people to accidentally overwrite their previously built trees. You should enter 'n' and it'll continue building the tree.
Also, to quickly build the leaf layer and a single dummy layer on top without building the entire actual RAPTOR tree, you can do the following.
from raptor import RetrievalAugmentation, RetrievalAugmentationConfig, BaseSummarizationModel, BaseEmbeddingModel
import numpy as np

# test summarization model
class TestSummarizationModel(BaseSummarizationModel):
    def __init__(self):
        pass

    def summarize(self, context, max_tokens=150):
        return "This is a summary"

# test embedding model
class TestEmbeddingModel(BaseEmbeddingModel):
    def __init__(self):
        pass

    def create_embedding(self, text):
        return np.asarray([3, 1, 4, 1, 5, 9])

RAC = RetrievalAugmentationConfig(tb_num_layers=1, summarization_model=TestSummarizationModel(), embedding_model=TestEmbeddingModel())
RA = RetrievalAugmentation(config=RAC)
RA.add_documents(text)
I ran this now as it finished:
for key, node in RA.tree.leaf_nodes.items():
    if node.text == ".":
        print(node.text)
and the output is as below (there are more lines like this, but I don't want to copy/paste everything here):
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
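Rather than printing every matching chunk, it may be easier to count them. The following is only a minimal sketch: `punctuation_only_report` is a hypothetical helper name, not part of RAPTOR, and it works on any iterable of strings so it can be tried without building a tree. It also catches chunks such as " . " or "...", which an exact comparison with "." would miss.

```python
import string
from collections import Counter


def punctuation_only_report(texts):
    """Return (total chunks, Counter of chunks that hold only punctuation/whitespace)."""
    punct = set(string.punctuation)
    total = 0
    bad = Counter()
    for text in texts:
        total += 1
        stripped = text.strip()
        # Flag the chunk if it is empty or every character is punctuation or whitespace.
        if not stripped or all(ch in punct or ch.isspace() for ch in stripped):
            bad[stripped] += 1
    return total, bad


# Stand-in data for illustration. With a built tree, one would pass
# (node.text for node in RA.tree.leaf_nodes.values()) instead.
sample = ["Intro paragraph.", ".", ".", " . ", "...", "Real content here"]
total, bad = punctuation_only_report(sample)
print(f"{sum(bad.values())} of {total} chunks are punctuation-only: {dict(bad)}")
```

A high ratio of punctuation-only chunks to total chunks would point at the splitting step rather than at the clustering.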
Yes, there might be an issue with the text splitting then. Can you please provide me with a sample of the text document you are using? I'll try to reproduce the issue.
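For context, here is one way a splitter could end up emitting bare "." chunks. This is a hypothetical illustration only and is not RAPTOR's actual splitting code: `naive_split` and `merged_split` are made-up names, and the sample string is invented. It relies on the standard behavior of Python's `re.split`, which returns the delimiter as a separate item when the pattern contains a capturing group.

```python
import re


def naive_split(text):
    # Hypothetical splitter, NOT RAPTOR's code: the capturing group makes
    # re.split return every "." as its own list item.
    return [piece for piece in re.split(r"(\.)", text) if piece.strip()]


def merged_split(text):
    # Alternative sketch: split on whitespace that follows a period, so each
    # period stays attached to the text before it.
    return [piece for piece in re.split(r"(?<=\.)\s+", text) if piece.strip()]


sample = "Contents .... 3. Intro."
print(naive_split(sample))
print(merged_split(sample))
```

With the first function, every period in the input becomes a chunk of its own, so text containing runs of dots would produce many of them. The second keeps delimiters attached and yields no bare "." items for the same input.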
This should now be solved by https://github.com/parthsarthi03/raptor/commit/c70344659d8b87ff719a4fc88195ee9d6b8e43e3.
Thank you @parthsarthi03, I'll be testing this.
Closing this issue for now. If you have any further questions or encounter additional issues, please feel free to reopen it.
I wanted to try RAPTOR on some .txt files that I have.
Is RAPTOR creating a cluster of punctuation (dots) in this case? Because my docs don't have dots like this.