sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0

[Feature] Repeated generation expression #1175

Open laurens-gs opened 1 month ago

laurens-gs commented 1 month ago

Motivation

The documentation shows a nice example of how to split off two paths with a fork, reason about each point separately, and then gather the reasoning and combine it into a summary:

@sgl.function
def tip_suggestion(s):
    s += (
        "Here are two tips for staying healthy: "
        "1. Balanced Diet. 2. Regular Exercise.\n\n"
    )

    forks = s.fork(2)
    for i, f in enumerate(forks):
        f += f"Now, expand tip {i+1} into a paragraph:\n"
        f += sgl.gen(f"detailed_tip", max_tokens=256, stop="\n\n")

    s += "Tip 1:" + forks[0]["detailed_tip"] + "\n"
    s += "Tip 2:" + forks[1]["detailed_tip"] + "\n"
    s += "In summary" + sgl.gen("summary")

Using this scenario, I think it would be beneficial to let the LLM generate the tips too using some repeater of sorts. So I imagine a hypothetical scenario like this:

@sgl.function
def tip_suggestion(s):
    s += "Here is a list of tips to stay healthy: "
    s += sgl.repeat("tips", max_len=10, sep="\n", stop="\n\n")

    forks = s.fork(len(s['tips']))
    for i, f in enumerate(forks):
        f += f"Let's zoom in on this tip:\n\t"
        f += f['tips'][i] + "\n\n"
        f += "Here is a more detailed description on how to follow the tip:\n\n"
        f += sgl.gen(f"detailed_tip", max_tokens=256, stop="\n\n")
        f += "In hindsight, do you think this is a good tip? Answer either yes or no:\n"
        f += sgl.gen("good", choices=["yes", "no"])

    s += "Here is a detailed explanation for each of the good tips:"
    for i in range(len(s['tips'])):
        if forks[i]["good"] == "yes":
            s += s['tips'][i] + ":\n"
            s += forks[i]['detailed_tip'] + "\n\n"

    s += "In summary"+ sgl.gen("summary")

Do you think a language feature like this would belong in the sglang project? I personally think it is quite a natural extension of what is already provided. Right now we can already expand and reason in a static scenario, but real-world tasks are rarely that static. With this added flexibility we could parameterize the topic for which we want tips: today we ask for tips to stay healthy, but next time we might ask for tips to get rich quick. Since we don't know in advance how many tips the LLM has in store for us, we need this kind of dynamism. With this language extension, it would be possible to apply proven prompting techniques such as self-critique and tree-of-thoughts in a wider range of scenarios.

laurens-gs commented 1 month ago

After giving this some thought, an sgl.repeat() as envisioned above is conceptually the same as issuing an sgl.gen() and then calling str.split() on the result. It would still need to be implemented at the language level, because the variable returned from sgl.gen() during tracing is not an actual string but a promise of a string.
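To make the intended semantics concrete, here is a minimal sketch in plain Python of the split step, assuming the generated text is already available as an ordinary string (which, as noted, is not the case during tracing). The helper name `split_repeated` is hypothetical and not part of sglang; its `sep` and `max_len` parameters simply mirror the ones imagined for `sgl.repeat()` above.

```python
def split_repeated(text, sep="\n", max_len=10):
    """Hypothetical helper: split generated text into at most max_len items.

    Mirrors the `sep` and `max_len` arguments of the imagined sgl.repeat();
    this is an illustration only, not an sglang API.
    """
    # Split on the separator and trim surrounding whitespace from each piece.
    parts = [part.strip() for part in text.split(sep)]
    # Drop empty pieces, e.g. those produced by a trailing separator.
    items = [part for part in parts if part]
    # Keep no more than max_len items.
    return items[:max_len]


raw = "Eat a balanced diet\nExercise regularly\n\nSleep eight hours\n"
tips = split_repeated(raw, max_len=2)
```

The design question raised in this thread is where such a step would run: it only works on a concrete string, so inside a traced function it would have to be deferred until the generation has actually completed.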

A nice-to-have would be the ability to detect and capture numbered or unnumbered lists when the LLM produces them. This would save the trouble of stripping whitespace, bullet points, or numbers at the start of each line.
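As a rough sketch of what that list capture could do, the following plain-Python snippet strips common list markers with a regular expression. The names `strip_list_marker` and `parse_list` are hypothetical, and the set of recognized markers (`-`, `*`, `+`, `•`, and numbers followed by `.` or `)`) is an assumption rather than anything sglang defines.

```python
import re

# Assumed marker set: a bullet character, or digits followed by "." or ")",
# optionally indented, and followed by at least one whitespace character.
_LIST_MARKER = re.compile(r"^\s*(?:[-*+•]|\d+[.)])\s+")


def strip_list_marker(line):
    """Remove one leading bullet or number marker, then trim whitespace."""
    return _LIST_MARKER.sub("", line).strip()


def parse_list(text):
    """Return the non-empty lines of text with their list markers removed."""
    return [strip_list_marker(line) for line in text.splitlines() if line.strip()]


generated = "1. Balanced Diet\n2) Regular Exercise\n- Sleep well\n  * Drink water"
items = parse_list(generated)
```

Because the marker must be followed by whitespace, a line such as "3.5 liters of water" is left untouched rather than being mistaken for list item 3.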