turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

[REQUEST] "Antislop" sampler #640

Closed Downtown-Case closed 1 month ago

Downtown-Case commented 1 month ago

Problem

Taken from the project's readme:

Language models are trained on vast corpora of text, which often leads them to overproduce certain words or phrases that are statistically more frequent in the training data. This can result in outputs that are:

Solution

https://github.com/sam-paech/antislop-sampler

The AntiSlop sampler tackles this problem by implementing a dynamic token adjustment system that:

Alternatives

There's a lot of fuss around trying to "randomize" token generation in just the right way (temperature, XTC, skew, quadratic and such).

Explanation

...But this approach seems to actually attack the root of the problem: the LLM repeating phrases from its own context and generating "slop".

Examples

https://old.reddit.com/r/LocalLLaMA/comments/1fqqez5/i_made_a_configurable_antislop_sampler_which/

https://colab.research.google.com/drive/11TjqWQCZ8OJBV6Yi2XI1CsOLx0OTtu0t?usp=sharing

Additional context

Of course, this is a huge ask, as the sampler interrupts generation and "backtracks" when necessary. It might be totally impractical for exllama, or only practical for a subset of it (e.g. non-batched generation).


turboderp commented 1 month ago

This has been in ExLlama for over four months. Provide a list of strings of arbitrary length, and whenever the start of such a string is encountered, the output is suppressed until the match is resolved. If the string doesn't end up matching, the held output is emitted as normal. But if there is a (case-insensitive) match to a banned string, the generator rolls back and resamples at that point, suppressing the token that started generation down the path that led to the match. Strings don't have to align to token boundaries.
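The stream-side half of that mechanism (hold output while it could still grow into a banned string, release it once no match can complete, signal a rollback on a full match) can be sketched in isolation. This is an illustrative standalone filter, not ExLlama's actual implementation; the class and method names are invented:

```python
# Sketch of a banned-strings stream filter: text is withheld while it
# is a live prefix of any banned string, released once the match can
# no longer complete, and a full (case-insensitive) match tells the
# caller to roll back and resample. Matching ignores token boundaries
# because it operates on decoded text, not token IDs.

class BannedStringFilter:
    def __init__(self, banned_strings):
        self.banned = [s.lower() for s in banned_strings]
        self.held = ""  # decoded text withheld from the output stream

    def feed(self, new_text):
        """Feed newly decoded text. Returns (released_text, matched).

        `matched` is None unless a banned string fully matched, in which
        case the caller should roll back the generator and resample.
        """
        self.held += new_text
        low = self.held.lower()

        # Full match: suppress everything held and signal a rollback.
        for b in self.banned:
            if b in low:
                return "", b

        # Otherwise hold back only the shortest suffix that is still a
        # prefix of some banned string; everything before it is safe.
        hold_from = len(self.held)
        for b in self.banned:
            for i in range(len(self.held)):
                if b.startswith(low[i:]):
                    hold_from = min(hold_from, i)
                    break

        released, self.held = self.held[:hold_from], self.held[hold_from:]
        return released, None
```

For example, with `["shivers down"]` banned, feeding `"sends Sh"` releases `"sends "` and holds `"Sh"`; if later text completes `"shivers down"`, `feed` returns the match so the caller can roll back, and if the held text resolves differently it is released unchanged.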

The example script here shows how it can be used to decensor Llama-3 (but you can give it examples of cliché phrases just as easily):


The feature is also exposed in TabbyAPI and ExUI.

Downtown-Case commented 1 month ago

Oh, perfect. I always use exui's notebook mode, so I didn't even see it there, but I can hack it in.

I also misread the implementation, I thought it added over-represented words to the .json list as it went on, but this is not the case.

Thanks!