Closed jon-chuang closed 9 months ago
I would like to work on this. To clarify: for the before=1, after=1 case, should TopAndContext iterate through all the nodes and return three consecutive sentences, with the most relevant one in the middle?
Yes, that is correct.
The more general form is TopKAndContext(before, after, top_k).
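The selection described above could be sketched roughly as follows. This is only an illustration, not the code from the library or the pull request: the function name `top_k_and_context`, the `score` callable, and the toy `keyword_overlap` scorer are all assumptions made for the example, and it works on a plain list of sentences rather than on nodes.

```python
from typing import Callable, List


def top_k_and_context(
    sentences: List[str],
    score: Callable[[str], float],
    before: int = 1,
    after: int = 1,
    top_k: int = 1,
) -> List[str]:
    """Keep the top_k highest-scoring sentences plus their neighbours.

    Hypothetical helper: `score` stands in for whatever relevance
    measure is used (embedding similarity, a reranker, ...).
    """
    if not sentences:
        return []
    # Rank sentence indices by relevance, highest score first.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i]),
        reverse=True,
    )
    keep = set()
    for i in ranked[:top_k]:
        # Clamp the context window to the bounds of the document.
        lo = max(0, i - before)
        hi = min(len(sentences), i + after + 1)
        keep.update(range(lo, hi))
    # Return the kept sentences in their original order, without duplicates.
    return [sentences[i] for i in sorted(keep)]


def keyword_overlap(query: str) -> Callable[[str], float]:
    """Toy scorer (an assumption): counts words shared with the query."""
    terms = set(query.lower().split())
    return lambda s: float(len(terms & set(s.lower().split())))
```

With `top_k=1`, `before=1`, and `after=1`, this reduces to the TopAndContext case: three consecutive sentences with the most relevant one in the middle, or fewer when the top sentence sits at the start or end of the text. Overlapping windows are merged rather than repeated.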
Btw @logan-markewich, don't you think that the name is a little bit strange? Is it not more NodeLengthOptimizer or NodeContextOptimizer?
The name is definitely a little weird haha. We could change it (assuming we keep some reference to the old name so we don't break people's code).
Yeah, we can keep backward compatibility but use the new name, similar to GPTVectorIndex -> VectorIndex
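One common way to do such a rename is to keep the old name as a thin subclass that emits a deprecation warning. The sketch below is a generic pattern, not the library's actual code: `NodeContextOptimizer` is just one of the names floated in this thread, `LegacyOptimizer` is a hypothetical stand-in for the old class name, and the constructor arguments are placeholders.

```python
import warnings


class NodeContextOptimizer:
    """Placeholder for the renamed class (name proposed in this thread)."""

    def __init__(self, before: int = 1, after: int = 1) -> None:
        self.before = before
        self.after = after


class LegacyOptimizer(NodeContextOptimizer):
    """Hypothetical old name, kept so existing imports keep working."""

    def __init__(self, *args, **kwargs) -> None:
        # Warn at the caller's line (stacklevel=2), then behave like the new class.
        warnings.warn(
            "LegacyOptimizer is deprecated; use NodeContextOptimizer instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Existing code that constructs the old class still works and still passes `isinstance` checks against the new one, while users get a nudge to migrate.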
My preference is to use cheaper (and hopefully faster) reranking methods, and also to create a notebook showing how much faster downstream tasks run with the reduced context window size, without sacrificing much accuracy (we need a RAG benchmark for this, not just BEIR).
(Quite excited to do this. Can also help with factoring out the ranking part of ColBERT from the compression/indexing)
Added pull request #7730 to add before/after context. I was not able to obtain the separator (' ' / '.') that was applied when splitting the nodes, so the returned text is joined with a period and a space (". ") by default.
Feature Description
PruningMode:
TopAndContext(before=1, after=1): returns the most relevant sentence plus the number of sentences given by before and after that precede and follow it.
RelevanceMode: