parthsarthi03 / raptor

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
https://arxiv.org/abs/2401.18059
MIT License
687 stars · 98 forks

Sorry! We've encountered an issue with repetitive patterns in your prompt. Please try again with a different prompt #8

Closed · younes-io closed this issue 3 months ago

younes-io commented 4 months ago

I wanted to try RAPTOR on some .txt files that I have. Here is the log output I got:


Context to summarize:
.
.
.
[... a very large number of lines containing only a dot follows; I reduced the number here]

2024-03-08 18:43:52,574 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 400 Bad Request"
Error code: 400 - {'error': {'message': "Sorry! We've encountered an issue with repetitive patterns in your prompt. Please try again with a different prompt.", 'type': 'invalid_request_error', 'param': 'prompt', 'code': 'invalid_prompt'}}

Is RAPTOR creating a cluster of punctuation (dots) in this case? My docs don't contain runs of dots like this.
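
One quick way to rule out the source file itself is to scan it for dot-only lines before the tree is built; a minimal sketch, with the filename as a placeholder:

# If this prints 0, the dots are being introduced during chunking, not by the document
with open("my_document.txt", encoding="utf-8") as f:
    dot_lines = sum(1 for line in f if line.strip() and set(line.strip()) == {"."})
print(f"dot-only lines in source: {dot_lines}")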

parthsarthi03 commented 4 months ago

Hey! I think this might be due to an issue with the text splitting while creating the leaf nodes. Can you inspect the leaf nodes with the following and check whether you have a lot of chunks containing just dots?

for key, node in RA.tree.leaf_nodes.items():
    print(key, node.text[:50])
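
If there are too many chunks to eyeball, a quick tally along these lines (a sketch reusing the same RA object) can help:

# Count leaf chunks that are empty or consist only of dots
bad = sum(
    1 for node in RA.tree.leaf_nodes.values()
    if not node.text.strip() or set(node.text.strip()) <= {"."}
)
print(f"{bad} of {len(RA.tree.leaf_nodes)} leaf chunks are empty or dot-only")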
younes-io commented 4 months ago

@parthsarthi03 Alright, thanks. I will try that when it finishes. I have a lot of chunks.

BTW, I got this prompt... I left the script running, but it halted on this prompt waiting for an answer, which isn't very practical. Should I open an issue for this?

[screenshot: interactive prompt asking whether to overwrite the previously built tree]

parthsarthi03 commented 4 months ago

It's a precautionary check since we don't want people to accidentally overwrite their previously built trees. You should enter 'n' and it'll continue building the tree.

parthsarthi03 commented 4 months ago

Also, to quickly build the leaf layer and a single dummy layer on top, without building the entire RAPTOR tree, you can do the following:

from raptor import RetrievalAugmentation, RetrievalAugmentationConfig, BaseSummarizationModel, BaseEmbeddingModel
import numpy as np

# Dummy summarization model: returns a constant string instead of calling an LLM
class TestSummarizationModel(BaseSummarizationModel):
    def __init__(self):
        pass

    def summarize(self, context, max_tokens=150):
        return "This is a summary"

# Dummy embedding model: returns a fixed vector for every input
class TestEmbeddingModel(BaseEmbeddingModel):
    def __init__(self):
        pass

    def create_embedding(self, text):
        return np.asarray([3, 1, 4, 1, 5, 9])

# tb_num_layers=1 builds just the leaf layer and a single layer on top
RAC = RetrievalAugmentationConfig(tb_num_layers=1, summarization_model=TestSummarizationModel(), embedding_model=TestEmbeddingModel())
RA = RetrievalAugmentation(config=RAC)

RA.add_documents(text)  # text: the raw document string to index
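
For instance, text can come straight from one of the .txt files; a quick end-to-end usage sketch (the path is a placeholder):

with open("my_document.txt", encoding="utf-8") as f:
    text = f.read()

RA.add_documents(text)
for key, node in RA.tree.leaf_nodes.items():
    print(key, repr(node.text[:50]))  # repr makes dot-only or whitespace chunks easy to spot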
younes-io commented 4 months ago

> Hey! I think this might be due to an issue with the text splitting while creating the leaf nodes. Can you inspect the leaf nodes with the following and check whether you have a lot of chunks containing just dots?
>
> for key, node in RA.tree.leaf_nodes.items():
>     print(key, node.text[:50])

I ran this now that it finished:

for key, node in RA.tree.leaf_nodes.items():
    if node.text == ".":
        print(node.text)

and the output is as below (there are more lines like this, but I don't want to copy/paste everything here):

.
.
.
.
.

parthsarthi03 commented 4 months ago

Yes, there might be an issue with the text splitting then. Can you please provide me with a sample of the text document you are using? I'll try to reproduce the issue.

younes-io commented 4 months ago

@parthsarthi03 here you go https://gist.githubusercontent.com/younes-io/d471f38313c10a3f766787d87e3b3f85/raw/50ff6d90ba8ca0ba9b1d8a359529f74332e51a13/text_raptor.txt

parthsarthi03 commented 4 months ago

This should now be solved by https://github.com/parthsarthi03/raptor/commit/c70344659d8b87ff719a4fc88195ee9d6b8e43e3.
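
For anyone pinned to an older checkout, the general idea of guarding the splitter against punctuation-only chunks can be sketched like this (a hypothetical helper, not the code from the commit):

def drop_punct_only_chunks(chunks):
    # Keep only chunks containing at least one alphanumeric character,
    # dropping leftovers such as "." produced by over-eager splitting
    return [c for c in chunks if any(ch.isalnum() for ch in c)]

print(drop_punct_only_chunks([".", "A real sentence.", ". . .", ""]))
# -> ['A real sentence.']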

younes-io commented 3 months ago

Thank you @parthsarthi03, I'll be testing this.

parthsarthi03 commented 3 months ago

Closing this issue for now. If you have any further questions or encounter additional issues, please feel free to reopen it.