guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

Default behavior of `json` generation is likely more verbose than users expect #1050

Open hudson-ai opened 2 hours ago

hudson-ai commented 2 hours ago

We currently respect JSON Schema's semantics for the `additionalProperties` keyword: leaving it unset means that any property not declared under the `properties` keyword (1) is allowed and (2) may take any value, as long as it is valid JSON.
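To make the default concrete, here is a toy check of only the `additionalProperties` rule for a flat object schema. `passes_additional_properties` is a hypothetical helper written for illustration. It is not a full validator and ignores `patternProperties`, nested schemas, and the types of declared properties. Real validators, such as the `jsonschema` package, apply the same rule as part of full validation.

```python
from typing import Any


def passes_additional_properties(schema: dict[str, Any], instance: dict[str, Any]) -> bool:
    """Toy check of ONLY the additionalProperties rule (hypothetical helper).

    Ignores patternProperties, nested schemas, and declared property types.
    """
    declared = set(schema.get("properties", {}))
    extras = [key for key in instance if key not in declared]
    # Leaving additionalProperties unset behaves like `true`.
    additional = schema.get("additionalProperties", True)
    if additional is True:
        return True  # undeclared keys allowed, with any JSON value
    if additional is False:
        return not extras  # any undeclared key is rejected
    raise NotImplementedError("subschema-valued additionalProperties is not covered here")


open_schema = {"type": "object", "properties": {"name": {"type": "string"}}}
closed_schema = {**open_schema, "additionalProperties": False}
instance = {"name": "Ada", "notes": ["anything", 1, None]}

print(passes_additional_properties(open_schema, instance))    # True
print(passes_additional_properties(closed_schema, instance))  # False
```

Under the open schema, a generator that respects these semantics is free to emit `notes` (or any other key), which is exactly the extra output discussed next.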

These semantics are useful (see https://github.com/guidance-ai/guidance/issues/887), and I think we should continue respecting them.

That said, I think most of our users will expect the LLM to produce only the properties they explicitly requested. Anything extra costs them time and money (every extra property is more generated tokens), and it will most likely be thrown away.

I recommend the following solution:

  1. Recommend that all users use the pydantic interface (passing a `BaseModel` subclass to the `json` generation function) UNLESS they need the extra fine-grained control that passing a JSON schema directly provides.
  2. Write a custom `BaseModel.model_json_schema` implementation that sets `additionalProperties` to `false`, and use it to convert the `BaseModel` into the JSON schema we generate against.
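Point 2 could be prototyped as a post-processing step over the dict that `BaseModel.model_json_schema()` returns. This is a minimal sketch under a few assumptions. The schema layout follows pydantic v2 (nested models under `$defs`). `forbid_extra_properties` is a hypothetical helper, not an existing guidance or pydantic API. The example schema is hand-written to resemble pydantic's output rather than generated from a model.

```python
import copy
from typing import Any


def forbid_extra_properties(schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return a copy of `schema` in which every object
    schema with declared `properties` and no explicit additionalProperties
    gets additionalProperties: false.

    Sketch only: it does not distinguish schema keywords from literal data
    (e.g. dict-valued `default`s that happen to look like object schemas).
    """
    result = copy.deepcopy(schema)  # leave the caller's schema untouched

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            # Only "closed" models (declared properties) are tightened;
            # setdefault keeps any explicit choice such as extra="allow".
            if node.get("type") == "object" and "properties" in node:
                node.setdefault("additionalProperties", False)
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(result)
    return result


# Hand-written schema shaped like pydantic v2 output for a nested model.
schema = {
    "$defs": {
        "Address": {
            "properties": {"city": {"title": "City", "type": "string"}},
            "required": ["city"],
            "title": "Address",
            "type": "object",
        }
    },
    "properties": {
        "name": {"title": "Name", "type": "string"},
        "address": {"$ref": "#/$defs/Address"},
        "tags": {"title": "Tags", "type": "object"},  # free-form dict field
    },
    "required": ["name", "address", "tags"],
    "title": "Person",
    "type": "object",
}

strict = forbid_extra_properties(schema)
print(strict["additionalProperties"])                      # False
print(strict["$defs"]["Address"]["additionalProperties"])  # False
print("additionalProperties" in strict["properties"]["tags"])  # False
```

Objects without declared `properties` (such as a free-form `dict` field) are left open on purpose, since there is nothing explicit to restrict them to. For comparison, pydantic v2 already lets a single model opt in via `model_config = ConfigDict(extra="forbid")`, which emits `"additionalProperties": false`; the helper above instead applies that default to every model without requiring users to configure it.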

I feel far more comfortable being "opinionated" when users go through the high-level pydantic interface than when they use the low-level JSON schema interface.

Additional notes:

hudson-ai commented 2 hours ago

@JC1DA @Harsha-Nori ping for visibility