outlines-dev / outlines

Structured Text Generation
https://outlines-dev.github.io/outlines/
Apache License 2.0

Progressive execution of choice over large number of options #722

Open 7flash opened 5 months ago

7flash commented 5 months ago

Hi! I am not sure I understand how outlines.choice currently works, or whether this feature is technically feasible to implement, but here is my use case:

I have a constrained dataset of 1000 exact answers that I want my LLM to choose from in response to any user prompt.

If it's possible, I want the LLM to generate each next token with a different positive logit_bias list, corresponding to the shrinking list of remaining options after each token choice.

Example: let's say I have a dataset of only 4 answers: Good, Bad, Very Good, Very Bad. The user prompt is: How are you?

In the first sampling step, I want the sampler to choose one of 3 tokens corresponding to: good, bad, very. Let's say the first token was "very"; then the next token can only be either good or bad, and then generation is complete. A prefix trie over the answers captures this shrinking option set, as in the sketch below.
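A minimal sketch of the idea, using a whitespace "tokenizer" for illustration; the helper names (build_trie, allowed_next_tokens) are hypothetical, not outlines API:

answers = ["Good", "Bad", "Very Good", "Very Bad"]

def build_trie(token_sequences):
    # Nested dicts: each key is a token, each value the sub-trie of continuations.
    trie = {}
    for seq in token_sequences:
        node = trie
        for tok in seq:
            node = node.setdefault(tok, {})
    return trie

trie = build_trie([a.lower().split() for a in answers])

def allowed_next_tokens(trie, generated):
    # Walk down the trie along the tokens generated so far; the keys of the
    # node we land on are the only valid next tokens.
    node = trie
    for tok in generated:
        node = node[tok]
    return list(node.keys())

print(allowed_next_tokens(trie, []))        # ['good', 'bad', 'very']
print(allowed_next_tokens(trie, ["very"]))  # ['good', 'bad']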

rlouf commented 5 months ago

That's an interesting use case. You could build the string-based FSM really easily by hand, and you could probably write a regex for it, but that could end up being quite complicated.

We could always write a function that takes a string-based FSM, turns it into a character-based FSM that is then compiled into the token-based FSM that is used when generating text.
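A rough sketch of the first step, writing the string-based FSM as a plain transition dict and expanding each multi-character label into a chain of character transitions (illustrative only, not outlines internals; labels leaving a state are assumed not to share prefixes):

def to_char_fsm(string_fsm):
    # Collect every state id so freshly created chain states don't collide.
    all_states = set(string_fsm)
    for transitions in string_fsm.values():
        all_states.update(transitions.values())
    fresh = max(all_states) + 1

    char_fsm = {}
    for state, transitions in string_fsm.items():
        for label, target in transitions.items():
            # Replace each multi-character label with a chain of fresh states.
            src = state
            for ch in label[:-1]:
                char_fsm.setdefault(src, {})[ch] = fresh
                src, fresh = fresh, fresh + 1
            char_fsm.setdefault(src, {})[label[-1]] = target
    return char_fsm

# "Good | Bad | Very (Good|Bad)" as a string-based FSM; state 2 is final.
string_fsm = {0: {"Good": 2, "Bad": 2, "Very ": 1}, 1: {"Good": 2, "Bad": 2}}
char_fsm = to_char_fsm(string_fsm)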

7flash commented 5 months ago

The following test script demonstrates that a dynamically generated regex actually works:

import outlines
import time

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

prompt = """
Respond to the user question.

Q: How are you?
"""

n = 10
moods = ["very", "slightly"]
states = ["good", "bad"]
# Enumerate every (mood, state, points) combination as one literal
# alternative; note that range(1, n) only covers 1..n-1 points.
regex_pattern = "|".join(
    f"My Mood is {mood} {state} ({i} points of {n})"
    for mood in moods
    for state in states
    for i in range(1, n)
)

# Build a generator constrained by the regex pattern.
generator = outlines.generate.regex(
    model,
    regex_pattern,
)

start_time = time.time()  # Start time

answer = generator(prompt)

end_time = time.time()  # End time

print(answer)

print(f"Time taken for generation: {end_time - start_time} seconds")

This outputs:

My Mood is very good 1 points of 10
Time taken for generation: 49.05198383331299 seconds

But if I increase n to 500, generation takes much longer.

Now let's say I want to optimize it by leveraging my knowledge that the "very bad" mood can only belong to the range of 1-5 points,

so once the sampler has generated the "very bad" tokens, I want to adjust my regex by removing all options other than those mentioning 1-5 points, something like the sketch below.
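This hypothetical helper (not an existing outlines API) is what I mean: given the text generated so far, it returns a narrowed pattern for the remainder.

def narrowed_pattern(generated_text, n=500):
    # Prune the remaining alternatives based on what was already generated.
    if generated_text.endswith("very bad"):
        points = range(1, 6)        # "very bad" can only score 1-5 points
    else:
        points = range(1, n + 1)
    return "|".join(f" ({i} points of {n})" for i in points)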

You say it can be done by constructing a custom FSM? Can it have a handler which accepts the current generated text and returns a new list of options for the next token? That would be ideal, but again I'm not sure if it's possible? @rlouf
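To illustrate what I have in mind, here is a sketch; the method names (allowed_token_ids, next_state, is_final_state) are my guess at the shape such an FSM interface would need, so treat them as assumptions rather than outlines' actual API:

class HandlerFSM:
    """Hypothetical FSM whose allowed tokens come from a user handler."""

    def __init__(self, tokenizer, handler):
        self.tokenizer = tokenizer
        self.handler = handler       # current text -> list of allowed next strings
        self.generated = {0: []}     # state id -> token ids generated so far
        self._next_id = 1

    def allowed_token_ids(self, state):
        text = self.tokenizer.decode(self.generated[state])
        options = self.handler(text)      # handler prunes the option list here
        # The first token of each remaining option is a legal next token.
        return [self.tokenizer.encode(option)[0] for option in options]

    def next_state(self, state, token_id):
        new_state = self._next_id
        self._next_id += 1
        self.generated[new_state] = self.generated[state] + [token_id]
        return new_state

    def is_final_state(self, state):
        # Generation stops once the handler has no options left.
        return not self.handler(self.tokenizer.decode(self.generated[state]))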