turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Addition of DRY: A modern repetition penalty that reliably prevents looping #447

awtrisk opened this issue 1 month ago

awtrisk commented 1 month ago

Would it be worth it to add DRY as an alternative to the traditional repetition penalty? Users have reported that it actually works, and the PR on the ooba repo itself seems to be solid. It also has a llama.cpp PR. There seem to be barely any downsides to it, either.

If it seems good, I can make the PR and implement it here.

turboderp commented 1 month ago

As far as I can tell it's basically just an n-gram penalty, but without combining it with a beam search it doesn't really offer a way to discourage repetitions before they occur. I.e., the model is allowed to start down the path of a repetition, and it's only somewhere along that path that the penalty kicks in, at which point it's impossible to turn back.

So I'm not too sure about it. Are there any thorough comparisons to other methods like increased temperature, skew, frequency penalty etc.?

awtrisk commented 1 month ago

AFAIK this wasn't meant to discourage repetition before it starts; rather, once a pattern of repetition occurs, it can quickly cull it by biasing against the repeated tokens (see the sketch below). Imo this is better than the methods we currently have for preventing repetition.

@p-e-w may be able to shed more insight on things like comparisons, although I will be testing it with other samplers.
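For concreteness, here is a minimal sketch of the mechanism described above, written against plain Python lists rather than exllamav2's tensors. It is not the PR's actual implementation: the parameter names (multiplier, base, allowed_length) are borrowed from the DRY proposal's description but are treated as illustrative here, and details such as sequence breakers are omitted.

```python
# Illustrative sketch of a DRY-style sequence penalty (not exllamav2 code).
# context: previously generated token ids; logits: {token_id: logit}.
# multiplier, base, allowed_length follow the names used in the DRY proposal,
# but the exact semantics here are a simplification.

def dry_penalty(context, logits, multiplier=0.8, base=1.75, allowed_length=2):
    n = len(context)
    best = {}  # token that would extend a repetition -> longest matching suffix length

    # For each earlier position i, measure how long the suffix ending at i-1
    # matches the suffix ending at the current position n-1.
    for i in range(1, n):
        match_len = 0
        while (match_len < i
               and context[i - 1 - match_len] == context[n - 1 - match_len]):
            match_len += 1
        if match_len >= allowed_length:
            tok = context[i]  # the token that continued the earlier occurrence
            best[tok] = max(best.get(tok, 0), match_len)

    # Penalize those continuations, with the penalty growing exponentially in
    # the match length, so a budding loop is culled quickly once it starts.
    for tok, match_len in best.items():
        if tok in logits:
            logits[tok] -= multiplier * base ** (match_len - allowed_length)
    return logits

# Example: "7 1 2" has occurred before and the context ends in "7 1 2" again,
# so token 3 (which followed the earlier occurrence) is penalized.
logits = {1: 0.0, 2: 0.0, 3: 0.0}
dry_penalty([7, 1, 2, 3, 7, 1, 2], logits)  # logits[3] is pushed down
```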

p-e-w commented 1 month ago

DRY is indeed an n-gram/sequence penalty, but it works a little differently from no_repeat_ngram_size and other proposals I've seen. The differences can be summarized as follows:

Simply put, it works. I and others have been running DRY for over two months now, and it's such a massive improvement over traditional repetition penalties that I can't imagine going back. Looping is a scourge, and the existing penalties are a cure that's almost worse than the disease, being noticeably detrimental to output quality. DRY is far better than the three flavors of RepPen at actually preventing repetition, while leaving standard sentence structure completely unaffected.

All samplers are hacks by definition (we should be able to just use the distribution from the model as-is). DRY was developed not primarily from theoretical considerations, but guided by constant real-world experimentation. Having generated and examined probably in excess of 200k tokens in well over 100 contexts by now using DRY, I can confidently say that it works, and enables results that cannot be replicated using any combination of the widely available samplers of today.

yamosin commented 1 month ago

Really looking forward to seeing it implemented on TabbyAPI

AgeOfAlgorithms commented 3 weeks ago

bump

Vhallo commented 2 weeks ago

The performance issues have since been solved thanks to belladoreai, so it might be worthwhile to integrate this now.

AgeOfAlgorithms commented 2 weeks ago

I just wanted to bring this comment by @belladoreai here for everyone's convenience. It gives another good reason why no_repeat_ngram_size is unsuitable for stopping repetition (a short sketch of the exact-n-gram ban it describes follows the quote). This is from their discussion with @p-e-w:

For what it's worth, I've done a lot of experimentation with no_repeat_ngram_size in the past and I can confirm it's fairly useless in a chat context. It might be useful in other contexts, especially in contexts where the input is relatively small. But when a chat message history grows, using no_repeat_ngram_size typically leads to situations where the model is intentionally writing broken English (like writing "engglish" instead of "english"), where the brokenness of the language just grows more and more absurd over time. This seems to happen because in many cases (especially with smaller models) the model perceives repetitive output to be extremely likely - so likely that even broken versions of the repetitive output appear more likely than some other alternative continuation of the text. So when we prevent the model from generating the exact same repetitive continuation to the text, it chooses to use a broken alternative version of the same repetitive text instead of choosing some more natural text.

I do not recommend using no_repeat_ngram_size except at very high values, if no other "circuit breaker" for repetition exists.
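To illustrate the point belladoreai makes above, here is a rough sketch of an exact-n-gram ban in the spirit of no_repeat_ngram_size. The function name and variables are illustrative, not the Transformers or exllamav2 internals: the ban only fires when the last n-1 tokens exactly match an earlier n-1-token window, so a near-duplicate continuation (such as a misspelled variant) slips through untouched.

```python
# Rough sketch of an exact n-gram ban (illustrative only; not the actual
# no_repeat_ngram_size implementation in Transformers or exllamav2).

def banned_next_tokens(context, ngram_size):
    """Tokens that would complete an n-gram already present in `context`."""
    banned = set()
    if len(context) < ngram_size:
        return banned
    prefix = tuple(context[-(ngram_size - 1):])  # last n-1 generated tokens
    for i in range(len(context) - ngram_size + 1):
        if tuple(context[i:i + ngram_size - 1]) == prefix:
            banned.add(context[i + ngram_size - 1])
    return banned

context = [5, 9, 2, 7, 5, 9, 2]        # "... 5 9 2 7 ... 5 9 2"
print(banned_next_tokens(context, 4))  # {7}: only the exact continuation is banned
# A slightly different token (e.g. a misspelled variant) is not in the banned set,
# which is how a model determined to repeat itself ends up producing broken text.
```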