noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

Enforce attribute order #38

Closed tom-doerr closed 2 months ago

tom-doerr commented 7 months ago

Is there a way to force the attributes to be generated in a certain order? In my case, later attributes depend on the previous output. Example:

from pydantic import BaseModel

class TweetGenFormat(BaseModel):
    tweet: str
    is_great_tweet: bool  # should be generated after `tweet`

The rating is often generated first.
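For what it's worth, the desired order is already recoverable from the model class itself, since Python preserves field declaration order; a minimal stdlib-only sketch (no Pydantic or lm-format-enforcer involved):

```python
# Field declaration order is preserved in __annotations__ (Python 3.7+),
# so an order-enforcing parser has the information it needs on the class.
class TweetGenFormat:
    tweet: str
    is_great_tweet: bool

field_order = list(TweetGenFormat.__annotations__)
print(field_order)  # ['tweet', 'is_great_tweet']
```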

jpeig commented 6 months ago

+1

noamgat commented 6 months ago

It is possible to extend JsonSchemaParser to achieve this. The way to do it would be to add a boolean parameter to the configuration object, and change the way possible_keys is resolved in the PARSING_KEY_OR_END state.

PRs welcome. I will leave this open for discussion / voting (@jpeig - please vote with an emoji response to the main message) and will implement it if there is enough demand.
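To illustrate the idea in isolation (a standalone sketch, not the library's actual `JsonSchemaParser` internals): with order enforcement on, the set of keys the parser may accept next collapses from "all remaining keys" to "the single next key in declaration order".

```python
from typing import List

def possible_keys(remaining: List[str], enforce_order: bool) -> List[str]:
    """Keys the parser may accept next. `remaining` holds the keys
    not yet generated, in schema declaration order."""
    if enforce_order:
        # Only the next key in declaration order is allowed.
        return remaining[:1]
    # Default behaviour: any remaining key may come next.
    return list(remaining)

print(possible_keys(["tweet", "is_great_tweet"], enforce_order=True))
# ['tweet']
print(possible_keys(["tweet", "is_great_tweet"], enforce_order=False))
# ['tweet', 'is_great_tweet']
```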

jpeig commented 6 months ago

Hope the rocket suffices! @noamgat

elonen commented 6 months ago

+1 Here's another example where the order matters but for a different reason.

Generating thought_process and answer_confidence first forces the model to "think step by step" in a structured manner instead of jumping to a conclusion (and then inventing justifications for a sub-par answer), and encourages it to admit doubt in final_answer if thought_process didn't produce a high answer_confidence.

import pydantic

# AnswerFormatThoughts and AnswerConfidence are application-specific
# models defined elsewhere.
class FinalAnswer(pydantic.BaseModel):
    thought_process: AnswerFormatThoughts
    answer_confidence: AnswerConfidence
    final_answer: str

Is it possible to work around this current limitation by chaining several parsers together?

meditans commented 2 months ago

I found this because my preferred inference server (tabbyAPI) was exposing a json_schema field, and it turned out it was calling this library under the hood. As others have mentioned @noamgat, this feature would be very useful for implementing chain-of-thought-style queries. I voted with a rocket, but was curious how much interest is enough interest :joy:

meditans commented 2 months ago

In the meantime, @elonen, you could either chain several parsers together (although in my case this changes the program I want to write), or try to steer the LLM itself to generate the correct order via prompt engineering/examples (this has been working well enough for me on llama3). Still, a proper solution would be very nice imho.
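The chaining workaround could look roughly like this. It is a hypothetical sketch: `generate_with_schema` stands in for whatever schema-constrained generation call your inference server exposes (e.g. tabbyAPI's json_schema field) and is stubbed out here with canned output.

```python
import json

def generate_with_schema(prompt: str, schema: dict) -> dict:
    """Placeholder for a schema-constrained LLM call. Real code would
    send `prompt` plus `schema` to the inference server; this stub just
    returns a marker value for each requested property."""
    return {k: f"<generated {k}>" for k in schema["properties"]}

def answer_in_two_steps(question: str) -> dict:
    # Step 1: constrain generation to the reasoning fields only.
    thoughts = generate_with_schema(
        question,
        {"type": "object",
         "properties": {"thought_process": {"type": "string"},
                        "answer_confidence": {"type": "number"}}},
    )
    # Step 2: feed the reasoning back into the prompt and constrain
    # generation to the final field, so it is produced last.
    final = generate_with_schema(
        question + "\n" + json.dumps(thoughts),
        {"type": "object",
         "properties": {"final_answer": {"type": "string"}}},
    )
    return {**thoughts, **final}

result = answer_in_two_steps("Is the sky blue?")
print(list(result))  # ['thought_process', 'answer_confidence', 'final_answer']
```

The cost is two round trips to the model, and as noted above it changes the shape of the program compared to a single constrained call.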

noamgat commented 2 months ago

Released in v0.10.1. Can you check if you can now solve the problem via Configuration Options?

noamgat commented 2 months ago

Should be supported now, please reopen if the issue is not solved.