oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Context Shifting #4588

Closed: tmsingson closed this issue 9 months ago

tmsingson commented 10 months ago

Description

About 10 days ago, KoboldCpp added a feature called Context Shifting which is supposed to greatly reduce reprocessing. Here is their official description of the feature:

NEW FEATURE: Context Shifting (A.K.A. EvenSmarterContext) - This feature utilizes KV cache shifting to automatically remove old tokens from context and add new ones without requiring any reprocessing. So long as you use no memory/fixed memory and don't use world info, you should be able to avoid almost all reprocessing between consecutive generations even at max context. This does not consume any additional context space, making it superior to SmartContext.
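The idea described above can be sketched in a few lines. This is a conceptual illustration only, not the llama.cpp or KoboldCpp implementation: the `RollingKVCache` class and its methods are hypothetical stand-ins showing how evicting the oldest tokens from a KV cache lets consecutive generations reuse surviving entries instead of re-encoding the whole prompt.

```python
from collections import deque

class RollingKVCache:
    """Toy model of context shifting: a bounded KV cache that evicts
    the oldest tokens instead of reprocessing the full context."""

    def __init__(self, max_context: int):
        self.max_context = max_context
        self.cache = deque()          # one (token, kv_entry) pair per token
        self.tokens_processed = 0     # counts expensive "forward passes"

    def _process(self, token):
        # Stand-in for a real forward pass that produces a KV entry.
        self.tokens_processed += 1
        return f"kv({token})"

    def append(self, tokens):
        for tok in tokens:
            if len(self.cache) >= self.max_context:
                # Evict the oldest token; surviving cache entries are
                # kept (shifted), so they are never reprocessed.
                self.cache.popleft()
            self.cache.append((tok, self._process(tok)))

cache = RollingKVCache(max_context=4)
cache.append([1, 2, 3, 4])   # fills the context window
cache.append([5, 6])         # evicts tokens 1 and 2, processes only 5 and 6
print(cache.tokens_processed)        # 6 forward passes total, not 4 + 6
print([t for t, _ in cache.cache])   # [3, 4, 5, 6]
```

Without shifting, overflowing the window would force re-encoding the remaining prompt from scratch; here only the genuinely new tokens cost a forward pass. The real feature additionally rewrites the positional information of the shifted KV entries, which this sketch omits.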

Any chance this gets added to Ooba as well?

Additional Context

Reddit thread: https://www.reddit.com/r/LocalLLaMA/comments/17ni4hm/koboldcpp_v148_context_shifting_massively_reduced/
llama.cpp pull: https://github.com/ggerganov/llama.cpp/pull/3228
koboldcpp v1.48.1 release: https://github.com/LostRuins/koboldcpp/releases/tag/v1.48.1

github-actions[bot] commented 9 months ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

kisenera commented 8 months ago

Is there any way to get this for Exl2?

aarongerber commented 8 months ago

This was closed as stale. Did it ever get implemented, @oobabooga? This is literally driving me to use KoboldCpp. As soon as you hit the context limit in Oobabooga, it becomes obnoxious in comparison. :/

RichardFevrier commented 5 months ago

Is there any way to get this for Exl2?

Wish to know too.

aarongerber commented 5 months ago

Thanks @oobabooga you rock