RAPTOR looks interesting, but I see a big limitation when one wants to incrementally add information to a vectorstore (quite common in production scenarios). RAPTOR only works by looking globally at the entire pool of documents, since summaries are computed iteratively on clusters. This produces a sort of "immutable" vectorstore. In other words, if a user simply wants to add a document to an existing vectorstore, the full RAPTOR pipeline would have to run again to fold the new information into the existing summaries, which may become quite expensive with many documents (in both cost and latency). Maybe one could simply replace the most similar summary at each level? I'd love to hear how people will address this.
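To make the "replace the most similar summary at each level" idea concrete, here is a rough sketch of what I have in mind. It is not RAPTOR's actual API: `Node`, `add_document`, `embed` and `summarize` are hypothetical names, and the embedding model and the LLM summarizer are assumed to be passed in as plain callables. The new document is attached to the closest first-level summary, and only that summary and its ancestors are regenerated.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Node:
    """Hypothetical tree node: a leaf chunk or a cluster summary."""
    text: str
    embedding: Vector
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def add_document(
    first_level: List[Node],
    text: str,
    embed: Callable[[str], Vector],
    summarize: Callable[[List[str]], str],
) -> List[Node]:
    """Attach a new leaf to its most similar first-level summary, then
    regenerate only the summaries on the path from there to the root.

    Assumption: clusters are never re-formed, so this is an approximation
    of a full rebuild, not an equivalent of it.
    """
    leaf = Node(text, embed(text))
    target = max(first_level, key=lambda n: cosine(n.embedding, leaf.embedding))
    target.children.append(leaf)
    leaf.parent = target

    refreshed: List[Node] = []
    node: Optional[Node] = target
    while node is not None:
        # One summarizer call per level instead of one per cluster.
        node.text = summarize([child.text for child in node.children])
        node.embedding = embed(node.text)
        refreshed.append(node)
        node = node.parent
    return refreshed
```

Each insert then costs one summarization call per tree level rather than a full re-clustering. The obvious downside is drift: since cluster membership is frozen, the tree slowly diverges from what a global rebuild would produce, so a periodic full rebuild would probably still be needed.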