What behavior of the library made you think about the improvement?
Currently, when using outlines.generate, chat templates aren't applied by default. Structuring prompts as chat templates by hand is awkward and unintuitive. For example, a well-structured input for a Llama 3 model might look like:
generator = outlines.generate.json(...)
my_prompt = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nProvide me JSON Data<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"""
generator(my_prompt)
I'd prefer:
generator("Provide me JSON Data")
Why We Should Apply Chat Templates by Default
Without a chat template, the model treats the prompt as a monologue to continue, whereas a chat-templated prompt generally follows a query-response structure.
No Chat Template
>>> output = model.generate(**tokenizer("What is 1 + 1?", return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
"<s> What is 1 + 1?\n\nThis question has been asked by many people, but I don't understand the answer.\n\nCould"
>>> output = model.generate(**tokenizer("Give me a random color:", return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
'<s> Give me a random color:\n\n- Response: A random color can be represented in hexadecimal format as #RRGGBB,'
With Chat Template
>>> output = model.generate(**tokenizer('<s><|user|> What is 1 + 1?<|end|><|assistant|>', return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
'<s><s><|user|> What is 1 + 1?<|end|><|assistant|> 1 + 1 equals 2. This is a basic arithmetic addition problem. When you'
>>> output = model.generate(**tokenizer('<s><|user|> Give me a random color:<|end|><|assistant|>', return_tensors="pt"), max_length=32)
>>> tokenizer.decode(output[0])
"<s><s><|user|> Give me a random color:<|end|><|assistant|> The random color I'll describe for you is a vibrant shade of teal, with"
How would you like it to behave?
By default, generator(prompt) applies the chat template.
Current behavior should remain available via generator(prompt, raw=True).
Alternatively, it might make sense to put the raw argument in the generator-constructing function (e.g. outlines.generate.text(model, raw=True)).
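To make the proposal concrete, here is a rough sketch of the prompt-formatting step, not actual Outlines code. It assumes the underlying tokenizer exposes Hugging Face's apply_chat_template method. The helper name format_prompt and the StubLlama3Tokenizer class are hypothetical, and the stub only stands in for a real tokenizer so the snippet runs without downloading a model.

```python
def format_prompt(tokenizer, prompt: str, raw: bool = False) -> str:
    """Hypothetical helper: wrap the prompt as a single user turn unless raw=True."""
    if raw:
        # Current behavior: pass the prompt through untouched.
        return prompt
    messages = [{"role": "user", "content": prompt}]
    # apply_chat_template is the Hugging Face tokenizer method; tokenize=False
    # returns a string, add_generation_prompt=True appends the assistant header.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


class StubLlama3Tokenizer:
    """Stand-in tokenizer (assumption) that mimics the Llama 3 prompt shown above."""

    def apply_chat_template(self, conversation, tokenize=False, add_generation_prompt=False):
        text = "<|begin_of_text|>"
        for message in conversation:
            text += (
                f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
                f"{message['content']}<|eot_id|>"
            )
        if add_generation_prompt:
            text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
        return text


tokenizer = StubLlama3Tokenizer()
templated = format_prompt(tokenizer, "Provide me JSON Data")
untouched = format_prompt(tokenizer, "Provide me JSON Data", raw=True)
print(repr(templated))
print(repr(untouched))
```

With a real tokenizer in place of the stub, the generator could call something like this on the prompt before generation, with raw=True preserving today's behavior.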
Related: https://github.com/outlines-dev/outlines/issues/756