[Breaking] change the behavior of ChatVertexAI.with_structured_output when specifying a dict

langchain-ai / langchain-google

MIT License

[Breaking] change the behavior of ChatVertexAI.with_structured_output when specifying a dict #333

Open kiarina opened 3 days ago

kiarina commented 3 days ago

The behavior of ChatVertexAI.with_structured_output when specifying a dict was different from ChatOpenAI and ChatAnthropicVertex, so it has been corrected.

The dict schema can now be given in either the ["name", "description", "parameters"] format or the ["title", "description", "properties"] format.
When a dict is specified as the schema, the properties (the tool call arguments) are now extracted from the output, matching ChatOpenAI and ChatAnthropicVertex.
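To make the two dict formats concrete, here is a minimal sketch of how one could be mapped onto the other. `to_function_format` is a hypothetical helper written for illustration only; it is not the library's `convert_to_openai_function`, and the `Person` schema is made up.

```python
from typing import Any, Dict


def to_function_format(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: normalize a dict schema to name/description/parameters."""
    # Already in the function format: return a shallow copy unchanged.
    if all(k in schema for k in ("name", "description", "parameters")):
        return dict(schema)
    # JSON Schema style: title becomes the name, the rest becomes the parameters.
    if all(k in schema for k in ("title", "description", "properties")):
        parameters = {
            k: v for k, v in schema.items() if k not in ("title", "description")
        }
        parameters.setdefault("type", "object")
        return {
            "name": schema["title"],
            "description": schema["description"],
            "parameters": parameters,
        }
    raise ValueError("unsupported dict schema format")


# Made-up example schema in the JSON Schema style.
json_schema = {
    "title": "Person",
    "description": "A person.",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}
function_format = to_function_format(json_schema)
```

With this sketch, `function_format["name"]` is `"Person"` and the properties end up under `function_format["parameters"]`.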

This is considered breaking as it changes existing functionality. If this change is unacceptable, please feel free to close it.

lkuligin commented 3 days ago

asking @efriis for a second opinion :), but the change looks reasonable to me. thanks for the PR!

baskaryan commented 3 days ago

I think this will also break cases where a protobuf is passed in, which might be a useful feature (to match what vertex api provides). Will take a closer look when I'm back from vacation next week, and @baskaryan might have some takes before then!

The common abstraction currently includes pydantic BaseModel

The question seems to be whether we want to have the common abstraction also include:

1. Whatever the provider API supports by default
2. JSON Schema (OpenAI/Anthropic)

In general I'm more supportive of 1 (and closing this), and @baskaryan may have different input!

re: breaking support for protobuf schemas, agree we should avoid this. could we either explicitly check the type of the schema before calling convert_to_openai_function, or else try/except that conversion. and when the schema cannot be converted to openai format we keep current behavior
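A rough sketch of the try/except option, in plain Python so it runs without the library. `strict_convert` is a hypothetical stand-in for the real conversion, `FakeProtoSchema` is a placeholder for a provider-native schema object, and the returned strings only label which branch would be taken.

```python
from typing import Any, Callable, Dict, Optional


def strict_convert(schema: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for the conversion; rejects what it cannot handle."""
    if not isinstance(schema, dict):
        raise TypeError(f"unsupported schema type: {type(schema).__name__}")
    return {
        "name": schema["title"],  # raises KeyError when the key is missing
        "description": schema.get("description", ""),
        "parameters": {"type": "object", "properties": schema["properties"]},
    }


def convert_or_none(
    schema: Any, convert: Callable[[Any], Dict[str, Any]] = strict_convert
) -> Optional[Dict[str, Any]]:
    """Try the conversion; return None when the schema cannot be converted."""
    try:
        return convert(schema)
    except (TypeError, KeyError, ValueError):
        return None


class FakeProtoSchema:
    """Placeholder for a provider-native (e.g. protobuf) schema object."""


def pick_parser(schema: Any) -> str:
    """Label the branch that would be taken for a given schema."""
    converted = convert_or_none(schema)
    if converted is None:
        return "current behavior"  # schema passed through untouched
    return f"keyed parser for {converted['name']}"
```

The explicit type check is the other option; the try/except version has the advantage that any schema the conversion rejects falls back to the current behavior without listing types up front.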

kiarina commented 2 days ago

> re: breaking support for protobuf schemas, agree we should avoid this. could we either explicitly check the type of the schema before calling convert_to_openai_function, or else try/except that conversion. and when the schema cannot be converted to openai format we keep current behavior

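        # New path: only dicts that contain title, description, and properties.
        elif isinstance(schema, dict) and all(
            k in schema for k in ("title", "description", "properties")
        ):
            # Convert to the function format so that schema["name"] is available.
            schema = convert_to_openai_function(schema)
            parser = JsonOutputKeyToolsParser(
                key_name=schema["name"], first_tool_only=True
            )
        # Everything else, including other dict formats, keeps the existing parser.
        else:
            parser = JsonOutputToolsParser()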

I've made it so that the behavior changes only when a dict containing title, description, and properties is passed. How does this look?
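For reference, a small self-contained sketch of what that check means for callers. It does not use the real parsers: `parse` is a hypothetical stand-in, and the tool call shape (`type` plus `args`) is an assumption made for illustration.

```python
from typing import Any, Dict, List, Union

REQUIRED_KEYS = ("title", "description", "properties")


def takes_new_path(schema: Any) -> bool:
    """Same condition as the elif above: a dict with all three keys."""
    return isinstance(schema, dict) and all(k in schema for k in REQUIRED_KEYS)


def parse(
    tool_calls: List[Dict[str, Any]], schema: Any
) -> Union[Dict[str, Any], List[Dict[str, Any]], None]:
    """Hypothetical stand-in for the two parser branches."""
    if takes_new_path(schema):
        # New behavior: return only the arguments of the first matching tool call.
        # The title is assumed to become the function name after conversion.
        for call in tool_calls:
            if call["type"] == schema["title"]:
                return call["args"]
        return None
    # Existing behavior: return the full list of tool calls.
    return tool_calls


# Assumed tool call shape, for illustration only.
calls = [{"type": "Person", "args": {"name": "Ada", "age": 36}}]

json_schema_dict = {
    "title": "Person",
    "description": "A person.",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
function_dict = {
    "name": "Person",
    "description": "A person.",
    "parameters": {"type": "object", "properties": {}},
}

new_result = parse(calls, json_schema_dict)
old_result = parse(calls, function_dict)
```

Under these assumptions, `new_result` is the bare arguments dict, while `old_result` is still the list of tool calls, so existing callers that pass a name/description/parameters dict are not affected.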