Open laurens-gs opened 1 month ago
After giving this some thought, a `sgl.repeat()` as envisioned above is conceptually the same as just sending out a `sgl.gen()` and then calling `str.split()` on the result. This still needs to be implemented at the language level because the variable returned from `sgl.gen()` during tracing is not an actual string but a promise of a string.
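To illustrate why a plain `str.split()` cannot simply be called at tracing time, here is a toy sketch in plain Python. It is not how SGLang is implemented internally; `DeferredStr`, `DeferredList` and `resolve` are hypothetical names invented for this illustration. The point is only that the split has to be recorded first and carried out later, once the generated text exists.

```python
class DeferredStr:
    """Hypothetical placeholder for text that exists only after the model has run."""

    def __init__(self, name: str) -> None:
        self.name = name

    def split(self, sep: str = "\n") -> "DeferredList":
        # There is no text to split yet, so record the operation instead.
        return DeferredList(self, sep)


class DeferredList:
    """A recorded split() that is carried out once the real text is available."""

    def __init__(self, source: DeferredStr, sep: str) -> None:
        self.source = source
        self.sep = sep

    def resolve(self, results: dict) -> list:
        # Look up the generated text by name, split it, and drop empty pieces.
        text = results[self.source.name]
        return [part.strip() for part in text.split(self.sep) if part.strip()]


# Tracing time: the program is described, but no text has been generated.
tips = DeferredStr("tips").split("\n")

# Run time: the model has answered, so the recorded split can be applied.
generated = {"tips": "Eat well\nSleep enough\n"}
print(tips.resolve(generated))
```

A real implementation would also have to decide how many forks to create from the resolved list, which is exactly the part that cannot be known while tracing.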
A nice-to-have for such a repeater would be the ability to detect and capture numbered or unnumbered lists when the LLM produces them. This would save the trouble of stripping whitespace, bullet points, or numbers from the start of each line.
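As a rough sketch of what that list detection could look like on the already generated text, the following plain-Python helper strips common list markers. The function name `parse_list_items` and the set of recognized markers are assumptions made for this example, not part of sglang.

```python
import re
from typing import List

# A leading bullet ("-", "*", "+", "•") or number ("1.", "2)", "(3)"),
# followed by at least one whitespace character.
_MARKER = re.compile(r"^\s*(?:[-*+•]|\(?\d+[.)])\s+")


def parse_list_items(text: str) -> List[str]:
    """Return list items with their markers removed.

    If any line carries a list marker, only marked lines are returned, so an
    introductory sentence is skipped. Otherwise every non-empty line counts
    as an item.
    """
    marked: List[str] = []
    plain: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        match = _MARKER.match(line)
        if match:
            item = line[match.end():].strip()
            if item:
                marked.append(item)
        else:
            plain.append(stripped)
    return marked if marked else plain


sample = "Here are some tips:\n1. Balanced diet\n2) Regular exercise\n\n- Enough sleep\n"
print(parse_list_items(sample))
```

Because the marker must be followed by whitespace, lines such as "1.5 liters of water" or "-5 degrees" are not mistaken for list items.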
Motivation
The documentation shows a nice example of how to split off two paths with a fork, reason about each point separately, then gather the reasoning and combine it into a summary:
Building on this scenario, I think it would be beneficial to let the LLM generate the tips as well, using some kind of repeater. I imagine a hypothetical scenario like this:
Do you think a language feature like this would belong in the sglang project? I personally think it is quite a natural extension of what is already provided. Right now, we can already expand and reason in a static scenario, but real-world tasks are rarely static like that. With this increased flexibility, we could parameterize the topic for which we want tips. Now we ask for tips to stay healthy, but next time we might ask for tips to get rich quick. Since we don't know how many tips the LLM has in store for us, we need this kind of dynamism. With this language extension, it would be possible to apply proven prompting techniques such as self-critique and tree-of-thoughts in a wider range of scenarios.
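To make the dynamic case concrete, here is a minimal sketch of the control flow in plain Python rather than sglang. The names `expand_each` and `fake_llm` and the prompt wording are all hypothetical; `fake_llm` is a canned stand-in for a model call so the example runs on its own. It shows the number of branches being decided by the generated list instead of being fixed in the program.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def expand_each(topic: str, generate: Callable[[str], str], max_items: int = 5) -> Dict[str, object]:
    """Ask for a list, expand every item in parallel, then summarize."""
    listing = generate(f"List tips for {topic}, one per line.")
    items = [ln.strip() for ln in listing.splitlines() if ln.strip()][:max_items]

    def expand(tip: str) -> str:
        return generate(f"Expand this tip into a paragraph: {tip}")

    # One branch per generated item; the count is only known at run time.
    with ThreadPoolExecutor(max_workers=max(1, len(items))) as pool:
        details = list(pool.map(expand, items))

    summary = generate("Summarize these paragraphs:\n" + "\n".join(details))
    return {"items": items, "details": details, "summary": summary}


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a model call (an assumption for this sketch)."""
    if prompt.startswith("List"):
        return "Eat well\nSleep enough\nMove daily\n"
    if prompt.startswith("Expand"):
        return "More about: " + prompt.split(": ", 1)[1]
    return "In short: " + str(len(prompt.splitlines()) - 1) + " tips."


result = expand_each("staying healthy", fake_llm)
print(result["items"])
print(result["summary"])
```

The `max_items` cap is a design choice for the sketch: since the model decides how many items come back, some upper bound keeps the fan-out from growing without limit.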