This PR introduces a fix that should address overloading of the context window. This is done by limiting the immediate history accessible by Workers AI to 10 messages.
In the future, we may be able to simulate an "unlimited" context window by use of function calling, which should allow us to pull back context from a specific message ID if SpongeChat needs it. (this would be freaking OP)
Why is this useful?
With passive activation, SpongeChat's memories (context window) become increasingly polluted over time as more memories are entered. Over time, the model slows down and response quality decreases until it eventually stops producing any sort of response.
Addressing long-term context window overloads will keep the model's speed relatively fast and a full slowdown should be avoided. While short-term overloads (i.e. deliberate prompts that destroy the context window and generally-dense queries) are not completely avoided, "amnesia" recovery (introduced in #60) addresses this issue by automatically resetting the context window after two empty responses.
ref: #62
Description
This PR introduces a fix that should address overloading of the context window. This is done by limiting the immediate history accessible by Workers AI to 10 messages.
In the future, we may be able to simulate an "unlimited" context window by use of function calling, which should allow us to pull back context from a specific message ID if SpongeChat needs it. (this would be freaking OP)
Why is this useful?
With passive activation, SpongeChat's memories (context window) become increasingly polluted over time as more memories are entered. Over time, the model slows down and response quality decreases until it eventually stops producing any sort of response.
Addressing long-term context window overloads will keep the model's speed relatively fast and a full slowdown should be avoided. While short-term overloads (i.e. deliberate prompts that destroy the context window and generally-dense queries) are not completely avoided, "amnesia" recovery (introduced in #60) addresses this issue by automatically resetting the context window after two empty responses.