turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

[REQUEST] "Antislop" sampler #640

Closed Downtown-Case closed 1 month ago

Downtown-Case commented 1 month ago

Problem

Taken from the project's readme:

Language models are trained on vast corpora of text, which often leads them to overproduce certain words or phrases that are statistically more frequent in the training data. This can result in outputs that are:

Solution

https://github.com/sam-paech/antislop-sampler

The AntiSlop sampler tackles this problem by implementing a dynamic token adjustment system that:

Alternatives

There's a lot of fuss around trying to "randomize" token generation in just the right way (temperature, XTC, skew, quadratic and such).

Explanation

...But this approach seems to actually attack the root of the problem: the LLM repeating phrases from its own context and generating "slop".

Examples

https://old.reddit.com/r/LocalLLaMA/comments/1fqqez5/i_made_a_configurable_antislop_sampler_which/

https://colab.research.google.com/drive/11TjqWQCZ8OJBV6Yi2XI1CsOLx0OTtu0t?usp=sharing

Additional context

Of course, this is a huge ask, as the sampler interrupts generation and "backtracks" when necessary. It might be totally impractical for exllama, or only practical for a subset of it (e.g. non-batched generation).


turboderp commented 1 month ago

This has been in ExLlama for over four months. Provide a list of strings of arbitrary length, and whenever the start of such a string is encountered, the output is suppressed until the match is resolved. If the string doesn't end up matching, the held output is emitted as normal. But if there is a (case-insensitive) match to a banned string, the generator rolls back and resamples at that point, suppressing the token that started generation down the path that led to the match. Strings don't have to align to token boundaries.
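The stream-side half of that mechanism (hold output while it could still grow into a banned string, release it once no match can complete, signal a rollback on a full match) can be sketched in isolation. This is an illustrative standalone filter, not ExLlama's actual implementation; the class and method names are invented:

```python
# Sketch of a banned-strings stream filter: text is withheld while it
# is a live prefix of any banned string, released once the match can
# no longer complete, and a full (case-insensitive) match tells the
# caller to roll back and resample. Matching ignores token boundaries
# because it operates on decoded text, not token IDs.

class BannedStringFilter:
    def __init__(self, banned_strings):
        self.banned = [s.lower() for s in banned_strings]
        self.held = ""  # decoded text withheld from the output stream

    def feed(self, new_text):
        """Feed newly decoded text. Returns (released_text, matched).

        `matched` is None unless a banned string fully matched, in which
        case the caller should roll back the generator and resample.
        """
        self.held += new_text
        low = self.held.lower()

        # Full match: suppress everything held and signal a rollback.
        for b in self.banned:
            if b in low:
                return "", b

        # Otherwise hold back only the shortest suffix that is still a
        # prefix of some banned string; everything before it is safe.
        hold_from = len(self.held)
        for b in self.banned:
            for i in range(len(self.held)):
                if b.startswith(low[i:]):
                    hold_from = min(hold_from, i)
                    break

        released, self.held = self.held[:hold_from], self.held[hold_from:]
        return released, None
```

For example, with `["shivers down"]` banned, feeding `"sends Sh"` releases `"sends "` and holds `"Sh"`; if later text completes `"shivers down"`, `feed` returns the match so the caller can roll back, and if the held text resolves differently it is released unchanged.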

The example script here shows how it can be used to decensor Llama-3 (but you can give it examples of cliché phrases just as easily):


The feature is also exposed in TabbyAPI and ExUI.

Downtown-Case commented 1 month ago

Oh, perfect. I always use exui's notebook mode, so I didn't even see it there, but I can hack it in.

I also misread the implementation, I thought it added over-represented words to the .json list as it went on, but this is not the case.

Thanks!