If this is for modifying the frequencies of individual words, why not use token bias?
Not that you couldn't attach a probability and simply ignore matching strings some percentage of the time. It would add complexity, though, especially if you want an individual probability for each string, since currently the implementation doesn't really track which string it's reacting to at any given time.
Overall, though, this kind of second-guessing kinda misses the point of LLMs. Whatever happened to finetuning and DPO?
> If this is for modifying the frequencies of individual words, why not use token bias?
I may have misunderstood how token bias works, but as I understand it, changing the logit_bias is a per-token operation, whereas this proposal and banned_strings operate on a series of tokens and backtrack. For example, "barely above a whisper" is greatly over-represented, but "barely" and "whisper" (and their tokens) are not over-represented and should not be penalised. There's no way to apply that penalty on an individual token basis.
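To illustrate the mismatch (everything here is illustrative: the token id is made up, and `avoided_strings` is the hypothetical knob being proposed, not an existing option):

```python
# Per-token bias: a map from single token ids to logit offsets. It can
# only make "barely" rarer in every context, including harmless ones
# like "barely enough time". (Token id is made up for illustration.)
logit_bias = {3649: -2.0}

# String-level avoidance: targets the full multi-token surface form and
# can only fire once the whole phrase has actually been generated.
avoided_strings = {"barely above a whisper": 0.2}  # hypothetical: keep 20% of hits
```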
> Overall, though, this kind of second-guessing kinda misses the point of LLMs. Whatever happened to finetuning and DPO?
I think this is a fair point and it's reasonable if you'd rather not try to fix problems originating upstream. I am currently generating a slop-free dataset to that end. However, given the speed and popularity of synthetic datasets (especially those generated from Anthropic or OpenAI models), it's inevitable that most datasets are going to be sloppy. String-level biasing would be useful both for using such models and for generating better synthetic data.
I feel like the need for this has already been accepted somewhat with the implementation of banned_strings, unless that was intended for outright offensive words.
There's a couple of motivations for banned_strings:

- blocking strings you never want to see in the output (outright offensive words, for instance)
- discouraging overused phrases and slop
For the latter purpose I guess it could make sense to attach a probability to allow the banned phrases to pass the filter sometimes. But the main issue there is that you're going to end up with a long list of things you think the model should say less often, and then you're left trying to gauge how likely the model is to say each one to begin with and adjusting that likelihood accordingly.
To say precisely what the probability should be you'd need a lot of testing. Likely something you'd want to automate, but if you follow that idea long enough, I do believe what you'll arrive at is just a worse version of finetuning.
There's also the issue that, although there's very low overhead for this, it does add up the more phrases you include. Every time the generator encounters a string that might be the beginning of a banned string, output is held until the phrase is resolved as either matching or not matching something from the ban list. So if you add the string "barely above a whisper", any occurrence of the word "barely" is going to cause a stutter, giving you no output for one iteration and then two tokens at once if it turns out the model was generating "barely even" or some other phrase that passes the test. This is barely noticeable when the list is short, but if the list grows large enough it could start to feel very choppy in streaming applications, especially if a lot of phrases are added with a low probability because they just need to be slightly less likely.
And of course, whenever you do end up rewinding, you've wasted some iterations decoding down a dead end path. So a very long list of phrases could slow everything down a lot.
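To make the stutter concrete, here is a minimal sketch of hold-and-release streaming (my own toy version, not exllamav2's actual matching logic, which also has to handle matches starting mid-buffer):

```python
banned = ["barely above a whisper"]

def stream_filter(chunks):
    held = ""
    for chunk in chunks:
        held += chunk
        if any(b.startswith(held) for b in banned):
            continue          # could still become a banned string: hold everything
        if any(held.startswith(b) for b in banned):
            held = ""         # full match: drop it (the real generator rewinds instead)
            continue
        yield held            # resolved as harmless: release the buffer in one burst
        held = ""
    if held:
        yield held            # stream ended while still holding an unresolved prefix

print(list(stream_filter(["barely", " above", " the", " din"])))
# -> ['barely above the', ' din']: two silent iterations, then a burst
```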
If I try to imagine what the "ideal" version of this would look like, for eliminating slop at least, what I picture is training a separate language model to identify slop, then using that to monitor the output of the primary model to rewind it whenever it becomes too sloppy. But then the question becomes, why not just merge those two models into one? And how would that be different from finetuning/reinforcement learning/DPO?
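That monitoring loop might look something like this (all interfaces here are hypothetical; `step`, `slop_score` and `rewind` stand in for whatever the generator and classifier would actually expose):

```python
def generate_with_critic(step, slop_score, rewind, max_chunks=512, threshold=0.9, window=16):
    """step() -> next text chunk; slop_score(text) -> float in [0, 1];
    rewind(n) makes the generator resample its last n chunks."""
    chunks = []
    while len(chunks) < max_chunks:
        chunks.append(step())
        if slop_score("".join(chunks[-window:])) > threshold:
            n = min(window, len(chunks))
            del chunks[-n:]   # discard the sloppy span...
            rewind(n)         # ...and have the primary model try again
            # (a real implementation would cap retries to avoid looping forever)
    return "".join(chunks)
```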
That said, if you want to experiment, you're of course welcome. On line 2084 of generator.py:
```python
if match >= 0:
    set_checkpoint()
    offending_tokens, offending_text = rewind_checkpoint()
    return emit(results, emit_held = True, suppressed_text = offending_text, suppressed_tokens = offending_tokens)
```
That's where a match is realized. The generator won't be aware of which banned string was encountered, but you could add a constant probability, like:
```python
import random  # would need to be imported at the top of generator.py, if it isn't already

if match >= 0 and random.random() < 0.75:  # enforce the ban 75% of the time, i.e. randomly ignore 25% of hits
    ...
```
To do a little more with it, `match`, if non-negative, is the index into `self.held_text` where the match was found, so you could add some logic there to check against a predefined list and adjust the dropout rate accordingly.
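For example (a sketch only; it assumes, per the above, that `match` marks where the offending text begins in `self.held_text`, and the phrases and probabilities are illustrative):

```python
import random

# Hypothetical per-string ban probabilities; 1.0 means always ban.
BAN_PROBABILITY = {
    "barely above a whisper": 1.0,
    "concoctions": 0.3,
}

def should_rewind(match: int, held_text: str) -> bool:
    if match < 0:
        return False
    for phrase, p in BAN_PROBABILITY.items():
        if held_text[match:].startswith(phrase):
            return random.random() < p
    return True  # unrecognised phrase: keep the existing always-ban behaviour
```

The bare `match >= 0` test would then become something like `should_rewind(match, self.held_text)`.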
Thank you for your thoughtful reply. I think you make a good argument. I will continue down the route of filtering the finetuning dataset for now. The code change you describe is straightforward, so it's good to have it written down here.
Problem
`banned_strings` is fantastic, but a bit blunt when it comes to outright banning single words. A good list, for example, is antislop. Banning single over-represented words can be problematic, for example completely banning "punctuated" when discussing grammar.

Solution
What I propose is that it should be possible to adjust the probability of particular strings to a non-zero level (instead of banning them outright).
Alternatives
One possible naive implementation, which need not change the sampling approach, would be to keep the current banning implementation but remove words from a dynamic banlist with a certain probability.
For example, if you wish to reduce the frequency of "concoctions" by 30%, you could specify `["concoctions", 0.7]` in an `avoided_words` section of the sampler overrides yaml file, and tabby would then include "concoctions" in `banned_strings` 30% of the time.
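As a sketch of that naive version (`avoided_words` is the hypothetical config key proposed above; the second phrase is just for illustration):

```python
import random

# Hypothetical avoided_words entries: (phrase, probability of allowing it).
avoided_words = [("concoctions", 0.7), ("barely above a whisper", 0.2)]

def sample_banned_strings(avoided):
    # A phrase goes into banned_strings with probability 1 - keep_p,
    # so "concoctions" is banned on 30% of requests.
    return [phrase for phrase, keep_p in avoided if random.random() >= keep_p]

banned_strings = sample_banned_strings(avoided_words)  # resampled per request
```

Note that this decides once per request, so a phrase is either banned or allowed for the whole generation rather than per occurrence.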
A better implementation might be to apply the probability at the sampler backfilling stage.
Explanation
A more delicate `banned_strings` or `avoided_strings` would allow string-level adjustment of probabilities. This would permit statistical analysis of model output vs reference output at the string (word or phrase) level. It would then be possible to automate adjustment of the output to target a corpus style, by adjusting the probabilities of the most mis-represented strings. Minimising a slop-norm, as it were.

Examples
I think this is the approach that antislop uses.
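For instance, keep probabilities could be derived from phrase frequencies along these lines (an illustrative sketch of the idea, not antislop's actual algorithm; all numbers are made up):

```python
def keep_probabilities(model_counts, ref_counts, total_model, total_ref):
    # For each over-represented phrase, keeping it with probability
    # ref_freq / model_freq makes its expected post-filter frequency
    # match the reference corpus.
    probs = {}
    for phrase, n in model_counts.items():
        model_freq = n / total_model
        ref_freq = ref_counts.get(phrase, 0) / total_ref
        if 0 < ref_freq < model_freq:
            probs[phrase] = ref_freq / model_freq
    return probs

# "barely above a whisper": 40 hits per 1M tokens of model output vs
# 2 per 1M in the reference corpus -> keep with probability 0.05.
print(keep_probabilities({"barely above a whisper": 40},
                         {"barely above a whisper": 2},
                         1_000_000, 1_000_000))
```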
Additional context
Moved from tabbyAPI.
I know antislop has been discussed, but probability-reducing at the string level has not been suggested here, I think.