parthsarthi03 / raptor

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
https://arxiv.org/abs/2401.18059
MIT License
687 stars, 98 forks

Bug in chunk splitting #38

Open Giustino98 opened 1 month ago

Giustino98 commented 1 month ago

I have been struggling with the RA.add_documents() method when using Azure OpenAI, because it returned: Error: 400 '$.input' is invalid. I traced the problem to the split_text() function in utils.py, which in some situations can return an empty chunk; sending that empty chunk to OpenAI triggers the error. To fix this, I added the following line of code to the build_from_text method of tree_builder.py: chunks = [chunk for chunk in chunks if chunk.strip()]
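
For illustration, the filter described above can be sketched in isolation. The helper name drop_empty_chunks and the sample list are hypothetical and are not part of the RAPTOR code base; only the list comprehension itself comes from this comment. It assumes the splitter returns a list of strings.

```python
def drop_empty_chunks(chunks):
    """Return only the chunks that contain non-whitespace text.

    Hypothetical helper for illustration; in the fix described above,
    the same list comprehension is applied inline to the chunk list.
    """
    # str.strip() returns "" for empty or whitespace-only strings,
    # and "" is falsy, so such chunks are filtered out.
    return [chunk for chunk in chunks if chunk.strip()]


# Made-up sample data imitating a splitter that emits empty chunks.
chunks = ["First sentence.", "", "   \n", "Second sentence."]
cleaned = drop_empty_chunks(chunks)
print(cleaned)
```

Filtering just before the chunks are embedded keeps the splitter itself untouched, at the cost of leaving the empty output in split_text() unaddressed.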

parthsarthi03 commented 1 month ago

Thank you for reporting the bug! If possible, could you file a PR to fix it?