Closed: Downtown-Case closed this issue 1 month ago
This has been in ExLlama for over four months. You provide a list of strings of arbitrary length, and whenever the start of one of those strings is encountered, output is suppressed until the match is resolved. If the string doesn't end up matching, the held output is emitted as normal. But if there is a (case-insensitive) match to a banned string, the generator rolls back and resamples at that point, suppressing the token that sent generation down the path that led to the match. Strings don't have to align to token boundaries.
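To make the hold/flush/match behavior concrete, here is a minimal, self-contained sketch of the matching logic described above (this is my own illustration, not ExLlama's actual code): decoded text is withheld while it could still grow into a banned string, released once it no longer can, and a full match signals the caller to roll back and resample.

```python
def scan(buf: str, banned: list[str]) -> str:
    """Classify the currently held output buffer against banned strings.

    Returns one of:
      "matched" - buf contains a banned string; caller should backtrack
      "hold"    - some suffix of buf is a prefix of a banned string, so
                  the match could still complete; keep suppressing output
      "flush"   - no banned string can complete here; safe to emit buf
    All comparisons are case-insensitive, and nothing here depends on
    token boundaries, matching the behavior described above.
    """
    low = buf.lower()
    if any(b.lower() in low for b in banned):
        return "matched"
    # A banned string may begin anywhere in the buffer, so every suffix
    # must be checked as a potential prefix of a banned string.
    for i in range(len(low)):
        tail = low[i:]
        if any(b.lower().startswith(tail) for b in banned):
            return "hold"
    return "flush"
```

For example, with `banned = ["As an AI language model"]`, the buffer `"As an"` is held, `"The weather is nice"` is flushed, and a buffer containing the full phrase reports a match.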
The example script here shows how it can be used to decensor Llama-3 (but you could just as easily give it a list of cliché phrases):
The feature is also exposed in TabbyAPI and ExUI.
Oh perfect. I always use exui's notebook mode, so I didn't even see it there, but I can hack it in.
I also misread the implementation, I thought it added over-represented words to the .json list as it went on, but this is not the case.
Thanks!
Problem
Taken from the project's readme:
Language models are trained on vast corpora of text, which often leads them to overproduce certain words or phrases that are statistically more frequent in the training data. This can result in repetitive, clichéd outputs ("slop").
Solution
https://github.com/sam-paech/antislop-sampler
The AntiSlop sampler tackles this problem with a dynamic token-adjustment system: when a disallowed phrase shows up in the output, it backtracks to where the phrase began, down-weights the token that started it, and resamples from there.
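The backtrack-and-resample idea can be sketched as follows. This is a hypothetical toy model (the function and variable names are invented here, not taken from the antislop-sampler code): each step has a best-first candidate list standing in for a real sampler's ranked tokens, and when the generated text completes a banned phrase, generation rewinds to the step where the phrase began, suppresses the token chosen there, and takes the next-best candidate instead.

```python
def generate(step_candidates, banned_phrases):
    """Toy backtracking generator.

    step_candidates[i] is a list of candidate strings for step i,
    ordered best-first (a stand-in for a real model's token ranking).
    """
    out = []                                      # tokens emitted so far
    suppressed = [set() for _ in step_candidates] # per-step banned choices
    i = 0
    while i < len(step_candidates):
        # Greedy pick: best candidate not yet suppressed at this step.
        choice = next(t for t in step_candidates[i] if t not in suppressed[i])
        out.append(choice)
        text = "".join(out).lower()
        hit = next((p for p in banned_phrases if p.lower() in text), None)
        if hit:
            # Locate the step whose token began the banned phrase.
            start = text.index(hit.lower())
            pos, j = 0, 0
            while pos + len(out[j]) <= start:
                pos += len(out[j])
                j += 1
            # Suppress that token, forget later suppressions, and rewind.
            suppressed[j].add(out[j])
            for k in range(j + 1, len(step_candidates)):
                suppressed[k].clear()
            out = out[:j]
            i = j
        else:
            i += 1
    return "".join(out)
```

With candidates `[["A ", "The "], ["testament ", "sign "], ["to ", "of "], ["progress", "change"]]` and the banned phrase `"testament to"`, the greedy path "A testament to ..." triggers a backtrack at step 1, and generation proceeds with "sign " instead.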
Alternatives
There's a lot of fuss around trying to "randomize" token generation in just the right way (temperature, XTC, skew, quadratic sampling, and so on).
Explanation
...But this seems to attack the root of the problem directly: the LLM repeating itself from its own context and generating "slop".
Examples
https://old.reddit.com/r/LocalLLaMA/comments/1fqqez5/i_made_a_configurable_antislop_sampler_which/
https://colab.research.google.com/drive/11TjqWQCZ8OJBV6Yi2XI1CsOLx0OTtu0t?usp=sharing
Additional context
Of course this is a huge ask, as the sampler interrupts generation and "backtracks" when necessary. It might be totally impractical for ExLlama, or only practical for a subset of it (e.g. non-batched generation).
Acknowledgements