guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

How to ensure that the output is a string formatted as a list representation #604

Open tungngthanh opened 9 months ago

tungngthanh commented 9 months ago

I'm trying to create a string in the format of a list, like "[element1, element2, ...]". But when I use gen(regex=r'\[.+\]', name='format_list'), it doesn't generate the list as expected and keeps running without producing the desired output. Could you explain how to correctly generate a string in this list format?

slundberg commented 9 months ago

I'd need a full example to be sure what you mean. But one issue with the regex you have written is that it can match anything inside the brackets, even things that look nothing like a list. So you get things like this:

image
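
For readers without the screenshot, the over-matching can be reproduced with Python's standard re module. This is only a rough sketch: re is a stand-in here, and guidance's constrained decoding does not behave identically to re matching. The names loose, junk, and text are made up for illustration, and the pattern assumes the brackets in the question were meant as literals.

```python
import re

# The pattern from the question, with the brackets treated as literals.
loose = re.compile(r"\[.+\]")

# Anything at all may sit between the brackets, list-like or not.
junk = "[!!! not a list ???]"
print(loose.fullmatch(junk) is not None)

# ".+" also accepts "]", so nothing forces a stop at the first close bracket.
text = "[a, b] and later [c]"
matched = loose.match(text).group(0)
print(matched)
```

Because ".+" accepts "]" as well, a model constrained by this pattern is free to keep generating after what looks like the end of the list.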

If you want to actually stop when you see a close bracket, you need to give a more specific grammar (probably more than a regex), or just set "]" as the stop token:

image

Here is an example of using a more explicit grammar:

image
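
The screenshot is not reproduced here, so the following is not the grammar from the comment. It is a minimal sketch of what "more specific" could mean, written with Python's re module: items may not contain brackets or commas, and items are separated by a comma and a space. ITEM, LIST_PATTERN, and parse_list are hypothetical names, and whether gen(regex=...) accepts this exact pattern is an untested assumption.

```python
import re

# An item is one or more characters that are not a bracket or a comma.
ITEM = r"[^\[\],]+"

# A list is "[", one item, then zero or more ", item" groups, then "]".
LIST_PATTERN = re.compile(rf"\[{ITEM}(?:, {ITEM})*\]")


def parse_list(text: str) -> list[str]:
    """Return the items of a bracketed list string, or raise ValueError."""
    if LIST_PATTERN.fullmatch(text) is None:
        raise ValueError(f"not a well-formed list: {text!r}")
    # Drop the surrounding brackets, then split on the separators.
    return [part.strip() for part in text[1:-1].split(",")]


print(parse_list("[element1, element2, element3]"))
```

Since an item cannot contain "]", the first close bracket necessarily ends the match, which is the behavior the stop-token approach gets by other means.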

tungngthanh commented 9 months ago

Hi @slundberg, I appreciate your guidance. The solution using regex r'\[.+', stop=']' worked well for my scenario. I have an additional question and would be grateful for your assistance. My project involves multi-label classification, so I need to assign multiple applicable labels to each element of a generated list. How can I achieve this? Currently, I'm only aware of the select function, which seems to return just a single class. Could you advise on how to handle multiple labels for each element?

doomgrave commented 9 months ago

Just make it generate 5 elements in a for loop with gen + list_append=True. Then parse the list the way you want.

doomgrave commented 9 months ago

For this, I've found the best results come from evaluating each tag one by one and deciding whether it's appropriate or not.

Otherwise, you can use one_or_more instead of select.
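
The tag-by-tag approach can be sketched in plain Python. Everything named here is hypothetical: classify_multilabel is an illustrative helper, and keyword_judge is a toy stand-in for the real decision, which in practice would presumably be a model call constrained to a yes/no answer (for example with select).

```python
from typing import Callable, Dict, List


def classify_multilabel(
    items: List[str],
    labels: List[str],
    is_applicable: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Ask one yes/no question per (item, label) pair and keep the yeses."""
    return {
        item: [label for label in labels if is_applicable(item, label)]
        for item in items
    }


def keyword_judge(item: str, label: str) -> bool:
    # Toy stand-in for a model call: the label applies if it occurs in the item.
    return label.lower() in item.lower()


result = classify_multilabel(
    ["red sports car", "blue bike"],
    ["red", "car", "bike"],
    keyword_judge,
)
print(result)
```

Each label is decided independently, so an item can end up with zero, one, or several labels, which a single select over all labels cannot express.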