parthsarthi03 / raptor

The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
https://arxiv.org/abs/2401.18059
MIT License
688 stars 98 forks

Adding new Document to the existing RAPTOR setup #26

Closed akesh1235 closed 3 months ago

akesh1235 commented 3 months ago

RAPTOR looks interesting, but I see a big limitation when one wants to incrementally add information to a vectorstore (quite common in production scenarios). RAPTOR only works by looking globally at the entire pool of documents, since summaries are iteratively computed on clusters. This produces a sort of "immutable" vectorstore. In other words, if a user simply wants to add a document to an existing vectorstore, the full RAPTOR pipeline has to run again so that existing summaries take the new information into account, which may become quite expensive with many documents (both in cost and in latency). Maybe one could simply replace the most similar summary at each level? I'd love to hear how people will address this.
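
To make the "replace the most similar summary at each level" idea concrete, here is a minimal sketch of what such an incremental insert might look like. It does not use the RAPTOR codebase or its API: `Node`, `insert_document`, `toy_embed`, and `toy_summarize` are all hypothetical names invented for illustration, and the sketch assumes a strict tree in which every node has exactly one parent and every level is uniform (all summaries or all leaves). In a real setup, `embed` and `summarize` would be calls to an embedding model and an LLM.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    """Hypothetical tree node: leaves hold chunks, inner nodes hold summaries."""
    text: str
    embedding: Vector
    children: List["Node"] = field(default_factory=list)


def insert_document(
    top_level: List[Node],
    text: str,
    embed: Callable[[str], Vector],
    summarize: Callable[[List[str]], str],
) -> List[Node]:
    """Attach a new leaf under the most similar summary at each level,
    then refresh only the summaries on that single root-to-leaf path."""
    leaf = Node(text, embed(text))
    path: List[Node] = []
    candidates = top_level
    # Descend while the current level consists of summary nodes.
    while candidates and candidates[0].children:
        best = max(candidates, key=lambda n: cosine(n.embedding, leaf.embedding))
        path.append(best)
        candidates = best.children
    if not path:
        raise ValueError("tree has no summary level to attach the document to")
    path[-1].children.append(leaf)
    # Re-summarize bottom-up so each parent sees its child's updated text.
    for node in reversed(path):
        node.text = summarize([child.text for child in node.children])
        node.embedding = embed(node.text)
    return path


# Toy stand-ins (assumptions, not real models): keyword counts and string joins.
def toy_embed(text: str) -> Vector:
    return [float(text.count("cat")), float(text.count("dog"))]


def toy_summarize(texts: List[str]) -> str:
    return " | ".join(texts)
```

With this approach the cost per insert is one summarization call per tree level rather than a full rebuild. The trade-off is that cluster assignments are never revisited, so the tree may drift from what a global re-clustering would produce as more documents are added; a periodic full rebuild might still be needed.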