huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Implementing the features of the TextStreamer into the pipeline #29464

Open kevin-guimard-ext opened 7 months ago

kevin-guimard-ext commented 7 months ago

Feature request

It should be possible to format the output of a transformers.pipeline, in particular to strip the prompt and the special tokens from the returned text.

from transformers import TextStreamer, pipeline

# model, tokenizer, and generation_config are assumed to be loaded beforehand
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

llm = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    generation_config=generation_config,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    streamer=streamer
)

prompt = "[INST] ... [/INST]"

result = llm(prompt)

The problem is that the result variable contains the prompt and the special tokens.

Motivation

When you use a transformers.pipeline, you can use a TextStreamer object to skip the prompt and the special tokens. The problem is that the result is printed to standard output. I haven't found a way to get this result as the output of the pipeline. I tried to apply batch_decode from the tokenizer on the model output, but the skip_special_tokens parameter didn't work.
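As a side note, transformers also provides TextIteratorStreamer, which yields the decoded chunks instead of printing them, so the cleaned text can be collected into a variable. A minimal sketch of that workaround (not discussed in this issue; the model, tokenizer, and prompt are assumed to be the ones from the snippet above):

from threading import Thread

from transformers import TextIteratorStreamer, pipeline

# TextIteratorStreamer yields decoded chunks instead of printing them to stdout.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
llm = pipeline(task="text-generation", model=model, tokenizer=tokenizer, streamer=streamer)

# Run generation in a background thread and collect the streamed text.
thread = Thread(target=llm, args=(prompt,))
thread.start()
clean_text = "".join(chunk for chunk in streamer)  # prompt and special tokens already skipped
thread.join()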

Your contribution

N/A

ArthurZucker commented 6 months ago

cc @gante

gante commented 6 months ago

Hi @kevin-guimard-ext 👋

On some models, sequences like [INST] are not tagged as special tokens. Since you're passing skip_special_tokens=True, I'm assuming that is the case here (to confirm, check whether the output without streaming still contains these tokens).

We may be able to do it through the chat templates -- @Rocketknight1, is there a way to decode text in a way that we filter these role tokens when they are not a special token?

Rocketknight1 commented 6 months ago

Hm, no unfortunately - if they're not special tokens, then there's no way for the tokenizer to know that they're formatting tokens. The only way to tell that is to inspect the Jinja script and see how they're used, which isn't really something we can do automatically!

kevin-guimard-ext commented 6 months ago

Mmm, I think you missed the point. The TextStreamer is able to flag the special tokens and to skip them. The problem I'm raising here is that I'd like to have this output in the result variable (in order to process it afterwards like any pipeline working with LLMs).

gante commented 6 months ago

@kevin-guimard-ext result.generated_text should include the output you want, no? Would you be able to provide a concrete example, as well as the output you'd want?

kevin-guimard-ext commented 6 months ago

@gante Ok, here is an example:

result = llm("[INST]How many primary colors are there?[/INST]")
print(result[0]["generated_text"])

The result is the following:

[INST]How many primary colors are there?[/INST] There are three primary colors...

I want to automatically get rid of the "INST" tags, as explained in my request above. Tested with Mistral 7B.

gante commented 6 months ago

@kevin-guimard-ext Ah, I now understand the problem! When the prompt is passed as a string, we don't touch it -- as such, you'll see the exact same text in the output.

Have you tried using chat templates?

kevin-guimard-ext commented 6 months ago

@gante Yeah, I know that you return the output as is, but I was asking whether you could equip the pipeline with a feature to remove the initial prompt and the special tags from the output (the functionality you already implemented in the TextStreamer!). My request is as simple as that. And it would be more than useful for developers, because currently I need a dirty (and non-portable) piece of code to handle this problem:

result = result[0]["generated_text"]
# Skip past the closing "[/INST]" tag (7 characters) plus the space that follows it
return result[result.find("[/INST]") + 8:]

And in answer to your question, no, I haven't tried the chat templates.

gante commented 6 months ago

@kevin-guimard-ext have a go with the chat templates; they are designed precisely to handle strings and chat models :)

A solution could be added to filter out [INST] in your particular case, but it wouldn't solve the issue for all chat models. In some chat models, the delimiters are not tagged as special tokens, so we can't remove them through the tokenizer. However, they are added as part of the chat template, and thus the chat template should be the one solution to rule them all!
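For illustration, a minimal sketch of the chat-template route (assuming a Mistral-style instruct model and the tokenizer/pipeline from earlier in the thread; passing return_full_text=False at call time is an extra assumption here, to ask the pipeline to drop the echoed prompt):

# Build the prompt through the model's chat template instead of hard-coding [INST] tags.
messages = [{"role": "user", "content": "How many primary colors are there?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# return_full_text=False asks the pipeline to return only the newly generated text,
# so the prompt (with its [INST] markers) should not appear in the result.
result = llm(prompt, return_full_text=False)
print(result[0]["generated_text"])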

kevin-guimard-ext commented 6 months ago

@gante From the chat template documentation, it seems that the component that deals with tokens is the tokenizer. Since I instantiated a tokenizer based on the same model, I had a try with the batch_decode method. There is a way to skip special tokens, but not the prompt... Anyway, it would be interesting to put these functionalities in the pipeline.

gante commented 6 months ago

One last suggestion :D You can see I'm keen on not adding the change into the pipeline -- I see its value, but it would break backwards compatibility in other cases.

Can you try tokenizing and then decoding (with skip_special_tokens=True) the output of the pipeline?

kevin-guimard-ext commented 6 months ago

Yeah, that's precisely what I've done; it works for the special tokens, but it's not possible to skip the prompt.

Anyway, I don't see how adding an option to the pipeline would break backwards compatibility. If the option is not present, the behavior remains unchanged. You know, my need is not exotic; the intent is just to reproduce the behavior of the text streamer in the pipeline's output. The objective is to have a proper output, free of special tokens and of the initial prompt, ready to be processed.

gante commented 6 months ago

If we add a flag for every additional feature that can easily be done as pre-/post-processing, then our interface would be clogged with flags and hard to navigate in the docs. If we add the behavior directly, it will change the output in some cases, and thus break backward compatibility.

but it's not possible to skip the prompt.

Then you can tokenize the output, drop the first tokens using the tokenized prompt length, and decode (with skip_special_tokens=True) to get exactly the filtering behavior you want :)
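A rough sketch of that post-processing, assuming the prompt and result from the example above (variable names are illustrative, not part of the thread):

# Re-tokenize the full output, drop the prompt tokens, and decode the rest
# while skipping special tokens.
full_text = result[0]["generated_text"]
prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
output_ids = tokenizer(full_text, add_special_tokens=False)["input_ids"]
answer = tokenizer.decode(output_ids[len(prompt_ids):], skip_special_tokens=True)
print(answer.strip())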

kevin-guimard-ext commented 6 months ago

I'm not necessarily pushing for the creation of a specific flag; I think a component is missing in the pipeline to automatically process the output. It could take the form of a new parameter like this:

llm = pipeline(
    ...
    output_processor=OutputProcessor(skip_prompt=True, skip_special_tokens=True)
)

A pipeline is expected to process both the input data and the output data, and processing things by hand as you suggest defeats the purpose of using a pipeline.
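For concreteness, a purely hypothetical sketch of what such a component could look like today, implemented outside the pipeline (OutputProcessor is not an existing transformers class; it simply wraps the tokenize/slice/decode approach discussed above):

class OutputProcessor:
    """Hypothetical post-processor mirroring the interface proposed above."""

    def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
        self.tokenizer = tokenizer
        self.skip_prompt = skip_prompt
        self.skip_special_tokens = skip_special_tokens

    def __call__(self, prompt, generated_text):
        ids = self.tokenizer(generated_text, add_special_tokens=False)["input_ids"]
        if self.skip_prompt:
            prompt_ids = self.tokenizer(prompt, add_special_tokens=False)["input_ids"]
            ids = ids[len(prompt_ids):]
        return self.tokenizer.decode(ids, skip_special_tokens=self.skip_special_tokens).strip()

# Usage with the earlier example:
process = OutputProcessor(tokenizer)
answer = process(prompt, result[0]["generated_text"])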