
[Feature Request]: Superposition Prompting #15644

Open Jeevi10 opened 3 months ago

Jeevi10 commented 3 months ago

Feature Description

Despite the successes of large language models (LLMs), they exhibit significant drawbacks, particularly when processing long contexts. Their inference cost scales quadratically with respect to sequence length, making them expensive to deploy in some real-world text processing applications, such as retrieval-augmented generation (RAG). Additionally, LLMs exhibit the "distraction phenomenon", where irrelevant context in the prompt degrades output quality. To address these drawbacks, we propose a novel RAG prompting methodology, superposition prompting, which can be directly applied to pre-trained transformer-based LLMs without the need for fine-tuning. At a high level, superposition prompting allows the LLM to process input documents in parallel prompt paths, discarding paths once they are deemed irrelevant. We demonstrate the capability of our method to simultaneously enhance time efficiency across a variety of question-answering benchmarks using multiple pre-trained LLMs. Furthermore, our technique significantly improves accuracy when the retrieved context is large relative to the context the model was trained on. For example, our approach facilitates a 93x reduction in compute time while improving accuracy by 43% on the NaturalQuestions-Open dataset with the MPT-7B instruction-tuned model over naive RAG.

Paper: https://arxiv.org/abs/2404.06910
Code: https://github.com/apple/ml-superposition-prompting?tab=readme-ov-file
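Below is a minimal, self-contained Python sketch of the control flow the paper describes: fork one prompt path per retrieved document, score each path, prune the irrelevant ones, and answer from the survivors. This is only an illustration at the prompt level; the actual method operates inside the transformer, running paths in parallel over a shared KV cache. All names here (`superposition_prompt`, `score_path`, `complete`, `keep_top_k`) are hypothetical, not LlamaIndex or ml-superposition-prompting APIs.

```python
# Sketch of the superposition-prompting control flow (arXiv:2404.06910).
# NOT the paper's implementation: the real method shares compute via the
# KV cache inside the model; here each path is an independent prompt.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PromptPath:
    document: str       # one retrieved document, processed in isolation
    score: float = 0.0  # relevance estimate used for pruning

def superposition_prompt(
    preamble: str,                            # shared prefix (system prompt)
    documents: List[str],                     # retrieved context, one per path
    query: str,
    score_path: Callable[[str, str], float],  # relevance(document, query)
    complete: Callable[[str], str],           # any text-completion function
    keep_top_k: int = 2,
) -> str:
    # 1. Fork one path per document instead of concatenating everything.
    paths = [PromptPath(doc, score_path(doc, query)) for doc in documents]

    # 2. Discard paths judged irrelevant ("path pruning"), which also
    #    mitigates the distraction phenomenon.
    survivors = sorted(paths, key=lambda p: p.score, reverse=True)[:keep_top_k]

    # 3. Answer using only the surviving context.
    context = "\n\n".join(p.document for p in survivors)
    return complete(f"{preamble}\n\n{context}\n\nQuestion: {query}\nAnswer:")
```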

Reason

Quadratic scaling of inference cost: The inference cost of LLMs scales quadratically with respect to sequence length. This makes deployment expensive for real-world text processing applications, especially those involving long contexts.

The "distraction phenomenon": LLMs suffer from a problem where irrelevant context in the prompt degrades the output quality. This suggests that LLMs can be sensitive to noise or irrelevant information in the input, potentially leading to lower quality outputs.

Value of Feature

Advantages:

- Improved Efficiency: demonstrates a significant reduction in compute time across various question-answering benchmarks.
- Enhanced Accuracy: particularly effective when the retrieved context is large relative to the model's training context.
- Versatility: applicable to multiple pre-trained LLMs.

Case Study: NaturalQuestions-Open Dataset

Using the MPT-7B instruction-tuned model:

- 93x reduction in compute time
- 43% improvement in accuracy

Implications

This methodology addresses two critical challenges in LLM deployment:

- The computational cost of processing long contexts
- The negative impact of irrelevant information on output quality
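For completeness, here is how the sketch above might be exercised end to end with stand-in callables; the lexical-overlap scorer and canned completion are toys, standing in for the paper's saliency-based path pruning and a real LLM call.

```python
# Toy usage of the superposition_prompt sketch above (stand-ins throughout).
docs = [
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
    "France borders Spain and Italy.",
]

def stub_score(doc: str, query: str) -> float:
    # Crude lexical overlap in place of the paper's path-saliency scoring.
    return len(set(doc.lower().split()) & set(query.lower().split()))

def stub_complete(prompt: str) -> str:
    return "Paris"  # stand-in for a real LLM completion call

answer = superposition_prompt(
    "You are a helpful assistant.",
    docs,
    "What is the capital of France?",
    stub_score,
    stub_complete,
    keep_top_k=2,
)
print(answer)  # "Paris"
```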

dosubot[bot] commented 11 hours ago

Hi, @Jeevi10. I'm Dosu, and I'm helping the LlamaIndex team manage their backlog. I'm marking this issue as stale.

Issue Summary:

Next Steps:

Thank you for your understanding and contribution!