outlines-dev / outlines

Structured Text Generation
https://outlines-dev.github.io/outlines/
Apache License 2.0

Format `prompts` Using Chat Templates in `SequenceGeneratorAdapter` #987

Open lapp0 opened 1 week ago

lapp0 commented 1 week ago

Related: https://github.com/outlines-dev/outlines/issues/756

What behavior of the library made you think about the improvement?

Currently, when using `outlines.generate`, chat templates aren't applied by default, so you have to structure your prompts as chat-templated strings yourself, which is awkward and unintuitive. For example, a well-structured input for a llama-3 model might look like:

```python
generator = outlines.generate.json(...)

my_prompt = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nProvide me JSON Data<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"""
generator(my_prompt)
```

I'd prefer

generator("Provide me JSON Data")

Why We Should Apply Chat Templates by Default

Without a chat template, the model continues the prompt as if it were completing a monologue, whereas the chat-template format establishes a query-response structure.

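The REPL transcripts below assume a model and tokenizer loaded roughly as follows; the `<|user|>`/`<|end|>`/`<|assistant|>` markers suggest a Phi-3-style template, and the exact checkpoint is an assumption.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed setup for the transcripts below; the checkpoint is illustrative.
model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
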
No Chat Template

```python
>>> output = model.generate(**tokenizer("What is 1 + 1?", return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
"<s> What is 1 + 1?\n\nThis question has been asked by many people, but I don't understand the answer.\n\nCould"
>>> output = model.generate(**tokenizer("Give me a random color:", return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
'<s> Give me a random color:\n\n- Response: A random color can be represented in hexadecimal format as #RRGGBB,'
```
With Chat Template

```python
>>> output = model.generate(**tokenizer('<s><|user|> What is 1 + 1?<|end|><|assistant|>', return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
'<s><s><|user|> What is 1 + 1?<|end|><|assistant|> 1 + 1 equals 2. This is a basic arithmetic addition problem. When you'
>>> output = model.generate(**tokenizer('<s><|user|> Give me a random color:<|end|><|assistant|>', return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
"<s><s><|user|> Give me a random color:<|end|><|assistant|> The random color I'll describe for you is a vibrant shade of teal, with"
```

How would you like it to behave?

By default, `generator(prompt)` applies the chat template.

The current behavior should remain available via `generator(prompt, raw=True)`.

Alternatively, it might make sense to put the `raw` argument in the generator-constructing function, e.g. `outlines.generate.text(model, raw=True)`.
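
A minimal sketch of the proposed call-time behavior, assuming the adapter holds a Hugging Face tokenizer; the attribute and helper names here are illustrative, not the existing Outlines internals:

```python
class SequenceGeneratorAdapter:
    ...

    def __call__(self, prompt, raw=False, **kwargs):
        if not raw:
            # Wrap the bare prompt in the model's chat template before generating.
            prompt = self.tokenizer.apply_chat_template(
                [{"role": "user", "content": prompt}],
                tokenize=False,
                add_generation_prompt=True,
            )
        return self._generate(prompt, **kwargs)
```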