huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

bad_words_ids not working #17504

Closed Jack000 closed 1 year ago

Jack000 commented 2 years ago

Feature request

I'm using gpt2 for text generation with a word blacklist and noticed that some words on the blacklist were still being generated.

I found that even though the word ["badword"] would not be generated, it would still generate ["bad", "word"] in two tokens.

An example of this is [11908] vs. [7286, 1754].

This seems to be a different issue from the leading-space and padding issues. I think I could work around it by adding the split tokens to the blacklist, but I can't get the tokenizer to split the string into [7286, 1754]. Is there a way to enumerate all possible tokenizations of a string to add to the blacklist?
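A toy illustration of the problem (the vocabulary and ids below are made up, not GPT-2's): the same string can be segmented into vocabulary tokens in several ways, and each segmentation is a distinct id sequence the model could emit.

```python
# Toy example: one banned string maps to several token sequences.
# The vocab and ids here are invented for illustration only.
def segmentations(word, vocab):
    """Return every way to split `word` into tokens from `vocab`."""
    if word == "":
        return [[]]
    results = []
    for token in vocab:
        if word.startswith(token):
            for rest in segmentations(word[len(token):], vocab):
                results.append([token] + rest)
    return results

vocab = {"bad": 0, "word": 1, "badword": 2, "b": 3, "adword": 4}
splits = segmentations("badword", vocab)
# Each split is a distinct id sequence: [[0, 1], [2], [3, 4]]
ids = [[vocab[t] for t in s] for s in splits]
```

Banning only the single-token form (id 2) leaves the multi-token spellings free to be generated.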

Motivation

Without this feature, bad_words_ids is frequently ineffective: banned words can still appear via alternative tokenizations.

Your contribution

Unfortunately I'm not familiar with the tokenizer code.

Jack000 commented 2 years ago

I wrote a function to enumerate all possible permutations of " Badword", but it quickly blows up into hundreds of permutations like [" B","a","d","w","o","r","d"]. Limiting the minimum token length works OK, but still doesn't prevent generation of variations like [" Bad","words"].

I think this overall approach just doesn't work for preventing the generation of bad words. I don't know if there's a better solution than generate-then-filter.

```python
def get_bad_words_ids(tokenizer, bad_words, min_strlen=2):
    # Map each token's decoded string form back to the raw token.
    vocab_tokens = tokenizer.get_vocab()
    vocab = {}
    for token in vocab_tokens:
        vocab[tokenizer.convert_tokens_to_string([token])] = token

    results = []
    for bad_word in bad_words:
        confirmed_tokens = []   # complete tokenizations of bad_word
        possible_tokens = []    # partial prefixes still being extended
        for token in vocab:
            if bad_word == token:
                confirmed_tokens.append([token])
            elif bad_word.startswith(token):
                possible_tokens.append([token])
        # Breadth-first search: extend each prefix one token at a time
        # until every way of spelling out bad_word has been found.
        while len(possible_tokens) > 0:
            new_possible_tokens = []
            for prefixes in possible_tokens:
                prefix = ''.join(prefixes)
                for token in vocab:
                    if len(token) < min_strlen:
                        continue  # skip very short tokens to limit blow-up
                    if bad_word == prefix + token:
                        confirmed_tokens.append(prefixes + [token])
                    elif bad_word.startswith(prefix + token):
                        new_possible_tokens.append(prefixes + [token])
            possible_tokens = new_possible_tokens
        results += confirmed_tokens

    # Convert the string forms back to token ids.
    ids = []
    for tokens in results:
        gtokens = [vocab[token] for token in tokens]
        ids.append(tokenizer.convert_tokens_to_ids(gtokens))
    return ids
```

gante commented 2 years ago

Hey @Jack000 👋 It is not clear from your description -- have you tried using the tokenizer with the instructions given in the NoBadWordsLogitsProcessor docs?

> "...in order to get the token ids of the words that should not appear in the generated text, use `tokenizer(bad_words, add_prefix_space=True, add_special_tokens=False).input_ids`."

Jack000 commented 2 years ago

That's what I did. It consistently tokenizes [" Badword"] as [11908], but during inference the model will generate [7286, 1754], which is [" Bad", "word"].

As I mentioned above, I wrote a function to enumerate all possible ways of combining tokens to form "Badword", but it doesn't cover variations like "Badwords" and "Badwordo". Extending the enumeration to include these variations results in thousands of permutations per bad word and doesn't scale.

gante commented 2 years ago

Okay, I think I understand your issue :) When you add a word to bad_words_ids, you would like its sub-words and/or related words banned as well, correct?

There are a few things worth mentioning here:

  1. It is intentional that sub-words do NOT get banned. Think about the word "doctorate", which is very different from two of its subwords ("doctor" and "ate"). Banning a word doesn't imply banning its subwords in most scenarios, and our implementation has to be flexible in that regard.
  2. When a long word gets broken into more than one token, the first token has a prefix space and will be different from the corresponding token without the space. This avoids banning valid sequences that contain the same characters. Example: if you ban "doctorate", "doctor ate" is still a valid sequence, because the banned tokens are " doctor" and "ate", not " doctor" and " ate" (notice the spaces).
  3. Banned tokens resulting from a long word are never considered in isolation. Example: if you ban "doctorate", you can still generate " doctor" and "ate" on their own; "the doctor wants to dictate" is a valid sequence.
  4. I've tried running the "Badword" example you mentioned, and I do get two tokens (one for " Bad", the other for "word").

You can see an example for a few cases mentioned above here.
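The matching behavior described in points 2 and 3 can be sketched in a few lines. This is not the actual transformers implementation, just an illustrative simplification with made-up names: the last id of a banned sequence is only masked when the already-generated ids end with the rest of that sequence.

```python
# Sketch (not the transformers implementation) of how a banned id
# sequence is enforced: the last token of the sequence is only masked
# when the generated ids already end with the earlier part of it.
def banned_next_tokens(generated, bad_words_ids):
    banned = set()
    for seq in bad_words_ids:
        prefix, last = seq[:-1], seq[-1]
        if len(prefix) == 0:
            banned.add(last)  # single-token ban: always masked
        elif generated[-len(prefix):] == prefix:
            banned.add(last)  # multi-token ban: context matches
    return banned

bad = [[5], [7, 8]]
print(banned_next_tokens([1, 7], bad))  # {5, 8}: "7" was just generated
print(banned_next_tokens([1, 2], bad))  # {5}: token 8 stays allowed
```

This is why generating token 8 on its own (or after anything other than 7) is never blocked, matching point 3 above.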

The solution for banning subwords is to explicitly add them to the list of bad_words_ids. @patrickvonplaten have you seen tools to generate sub-words and/or derived words from a list of candidate words?

Jack000 commented 2 years ago

Ah, the actual bad word I was trying to ban was [" Hitler"].

I do understand how the bad_words_ids feature works, but my issue is that I don't want the word "Hitler" generated under any circumstances, subwords or otherwise. As you can see, I did implement a function to enumerate all possible ways tokens can be combined to form "Hitler" to add to bad_words_ids, but if I include "Hitlers" and other such variations, the possible permutations number in the thousands.

Anyway, I don't see a simple solution to this, but the function I wrote, combined with filtering afterwards, works OK for now.

gante commented 2 years ago

I do understand how the bad_words_ids feature works

My apologies :D Better safe than sorry, in case there was some confusion about the intended behavior.

patrickvonplaten commented 2 years ago

@patil-suraj could you maybe also take a look here? Otherwise happy to dive deeper if necessary

patrickvonplaten commented 2 years ago

Sorry could I ping @ArthurZucker or @gante on this one maybe? :-)

ArthurZucker commented 2 years ago

Hey! I looked at the problem a bit, and as you mentioned, the number of permutations would be too problematic.

We could probably work around this by banning a normalized string instead. Rather than checking whether [Bad_id, Word_id] was generated, we would decode the candidate sequence to a string, normalize it, and check for the bad word. This is more efficient, but it might not have a place in the generate function itself, as the tokenizer is not available there. It probably makes sense to have a custom logits processor that is initialized with the tokenizer. Let me ask around 🤗
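A minimal sketch of this normalize-then-match idea (illustrative only; the function name and normalization rule are made up, and a real version would live in a logits processor or stopping criterion holding the tokenizer):

```python
# Sketch of the normalize-then-match idea: decode the candidate
# sequence to text, normalize it, and look for the banned word,
# instead of enumerating token permutations.
import re

def contains_bad_word(text, bad_words):
    # Lowercase and strip punctuation so "Bad-Word" matches "badword".
    normalized = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return any(bad in normalized for bad in bad_words)

print(contains_bad_word("He said Bad-Word loudly", ["badword"]))  # True
print(contains_bad_word("bad word", ["badword"]))                 # False
```

Spelling variants like "Bad-Word" or "BADWORD" collapse to the same normalized form, while "bad word" with a space is still allowed through, consistent with the subword discussion above.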

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.