run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Feature Request]: Add more modes to `SentenceEmbeddingOptimizer` #6842

Closed: jon-chuang closed this issue 9 months ago

jon-chuang commented 12 months ago

Feature Description

PruningMode:

RelevanceMode:

JainVidit12 commented 12 months ago

I would like to work on this. To clarify, TopAndContext should iterate through all the nodes and return three consecutive sentences where the middle one is the most relevant (for the before=1, after=1 case)?

jon-chuang commented 12 months ago

Yes, that is correct.

The more general form is TopKAndContext(before,after,top_k).
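For concreteness, here is a minimal sketch of that TopKAndContext selection logic as plain Python (the function name and signature are illustrative, not an existing llama_index API):

```python
# Hypothetical sketch of TopKAndContext-style selection: keep the top_k most
# query-relevant sentences plus `before`/`after` neighboring sentences.
from typing import List
import numpy as np


def top_k_and_context(
    sentences: List[str],
    sentence_embeddings: np.ndarray,  # shape (num_sentences, dim)
    query_embedding: np.ndarray,      # shape (dim,)
    top_k: int = 1,
    before: int = 1,
    after: int = 1,
) -> List[str]:
    # Cosine similarity between the query and each sentence.
    sims = sentence_embeddings @ query_embedding
    sims = sims / (
        np.linalg.norm(sentence_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )

    # Indices of the top_k most relevant sentences.
    top_indices = np.argsort(sims)[::-1][:top_k]

    # Expand each hit with its surrounding context, deduplicating overlaps.
    keep = set()
    for idx in top_indices:
        start = max(0, idx - before)
        end = min(len(sentences), idx + after + 1)
        keep.update(range(start, end))

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(keep)]
```

With top_k=1, before=1, after=1 this yields three consecutive sentences with the most relevant one in the middle (unless the hit falls at a node boundary), matching the clarification above.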

jon-chuang commented 12 months ago

Btw @logan-markewich, don't you think the name is a little strange? Wouldn't something like NodeLengthOptimizer or NodeContextOptimizer fit better?

logan-markewich commented 12 months ago

The name is definitely a little weird haha. We could change it (assuming we keep some reference to the old name so we don't break people's code).

jon-chuang commented 12 months ago

Yeah, we can keep backwards compatibility but use the new name, similar to GPTVectorIndex -> VectorIndex.

jon-chuang commented 12 months ago

My preference is to use cheaper (and hopefully faster) reranking methods, and also to create a notebook showing how much faster downstream tasks become with the reduced context window size, without sacrificing much accuracy (we need a RAG benchmark for this, not just BEIR).

(Quite excited to do this. Can also help with factoring out the ranking part of ColBERT from the compression/indexing)
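For context, a lightweight cross-encoder reranker along these lines could look like the sketch below (the model choice, example data, and cutoff are illustrative, not anything proposed in this issue):

```python
# Sketch of a cheap reranking pass with a small cross-encoder from
# sentence-transformers, instead of a heavier embedding- or LLM-based ranker.
from sentence_transformers import CrossEncoder

query = "What did the author do growing up?"
candidate_sentences = [
    "The author grew up writing short stories.",
    "The weather that year was unusually warm.",
    "They also programmed on an IBM 1401 in high school.",
]

# A small cross-encoder is far cheaper than re-embedding or calling an LLM.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, s) for s in candidate_sentences])

# Keep only the top-scoring sentences to shrink the context window.
ranked = sorted(zip(scores, candidate_sentences), reverse=True)
top_sentences = [s for _, s in ranked[:2]]
print(top_sentences)
```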

JainVidit12 commented 9 months ago

Added pull request #7730 to add before/after context. I was not able to obtain the separator (' ' / '.') that was used to split the nodes, so the returned text is by default joined with a period (". ").
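A usage sketch, assuming the merged PR exposes the before/after context as `context_before`/`context_after` parameters (the import path and exact parameter names may differ by llama_index version):

```python
# Hedged usage sketch for the before/after context from #7730.
from llama_index.core.postprocessor import SentenceEmbeddingOptimizer

optimizer = SentenceEmbeddingOptimizer(
    percentile_cutoff=0.5,   # keep the top 50% of sentences by relevance
    context_before=1,        # also keep the sentence before each kept sentence
    context_after=1,         # and the sentence after it
)

# Applied as a node postprocessor at query time, e.g.:
# query_engine = index.as_query_engine(node_postprocessors=[optimizer])
```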