langchain-ai / langchain-google

MIT License

vertexai: converting pydantic to vertex function breaks with allOf #95

Closed: baskaryan closed this issue 1 month ago

baskaryan commented 6 months ago

Pydantic's JSON schema wraps a nested model in an allOf when the field that holds it has a description. For example:

from langchain_core.pydantic_v1 import BaseModel, Field

class Node(BaseModel):
    id: str
    type: str

class Relationship(BaseModel):
    source: Node
    target: Node = Field(..., description="foo")

Relationship.schema()

generates

{
  'title': 'Relationship',
  'type': 'object',
  'properties': {
    'source': {'$ref': '#/definitions/Node'},
    'target': {
      'title': 'Target',
      'description': 'foo',
      'allOf': [{'$ref': '#/definitions/Node'}]
    }
  },
  'required': ['source', 'target'],
  'definitions': {
    'Node': {
      'title': 'Node',
      'type': 'object',
      'properties': {
        'id': {'title': 'Id', 'type': 'string'},
        'type': {'title': 'Type', 'type': 'string'}
      },
      'required': ['id', 'type']
    }
  }
}

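Notice the difference between 'source' and 'target': 'source' is a plain $ref, while 'target' wraps the same $ref in an allOf so that it can carry its own title and description.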
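To make the difference concrete, here is a small stdlib-only sketch that walks a schema dict and reports every path where a given key appears. The helper name find_key_paths is hypothetical and not part of pydantic or LangChain. Run on the schema above, only 'target' carries an allOf:

```python
from typing import Any, Iterator, Tuple


def find_key_paths(node: Any, key: str, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Hypothetical helper: yield the path to every dict in `node` that contains `key`."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from find_key_paths(v, key, path + (str(k),))
    elif isinstance(node, list):
        # List positions become string path segments ('0', '1', ...).
        for i, item in enumerate(node):
            yield from find_key_paths(item, key, path + (str(i),))


# The schema generated by Relationship.schema() above, as a Python dict.
schema = {
    "title": "Relationship",
    "type": "object",
    "properties": {
        "source": {"$ref": "#/definitions/Node"},
        "target": {
            "title": "Target",
            "description": "foo",
            "allOf": [{"$ref": "#/definitions/Node"}],
        },
    },
    "required": ["source", "target"],
    "definitions": {
        "Node": {
            "title": "Node",
            "type": "object",
            "properties": {
                "id": {"title": "Id", "type": "string"},
                "type": {"title": "Type", "type": "string"},
            },
            "required": ["id", "type"],
        }
    },
}

print(list(find_key_paths(schema, "allOf")))  # → [('properties', 'target')]
print(list(find_key_paths(schema, "$ref")))   # → [('properties', 'source'), ('properties', 'target', 'allOf', '0')]
```

Both fields point at the same Node definition; only the one with a description gets the extra allOf layer.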

When using function calling, the allOf leads to a protobuf error:

File ~/anaconda3/lib/python3.11/site-packages/proto/marshal/rules/message.py:36, in MessageRule.to_proto(self, value)
     31 if isinstance(value, dict) and not self.is_map:
     32     # We need to use the wrapper's marshaling to handle
     33     # potentially problematic nested messages.
     34     try:
     35         # Try the fast path first.
---> 36         return self._descriptor(**value)
     37     except TypeError as ex:
     38         # If we have a type error,
     39         # try the slow path in case the error
     40         # was an int64/string issue
     41         return self._wrapper(value)._pb

ValueError: Protocol message Schema has no "allOf" field.

Using:

protobuf==4.25.3
google-cloud-aiplatform==1.44.0
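One possible workaround, not proposed in this thread and not part of langchain-google, is to post-process the schema before it reaches Vertex AI: inline each local $ref from definitions and fold a single-element allOf into its parent so that the field's description survives. The sketch below assumes pydantic v1-style '#/definitions/...' refs and no recursive models (a self-referencing model would recurse forever); inline_refs is a hypothetical name. It only removes allOf, $ref and definitions and does not check the remaining keys against the protobuf Schema message.

```python
import json
from typing import Any, Dict


def inline_refs(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: inline '#/definitions/<Name>' refs and fold
    single-item allOf wrappers into their parent. Not recursion-safe."""
    definitions = schema.get("definitions", {})

    def resolve(node: Any) -> Any:
        if isinstance(node, list):
            return [resolve(item) for item in node]
        if not isinstance(node, dict):
            return node
        if "$ref" in node:
            # Assumes local refs of the form '#/definitions/Node'.
            name = node["$ref"].split("/")[-1]
            return resolve(definitions[name])
        if "allOf" in node and len(node["allOf"]) == 1:
            # Start from the referenced schema, then let the wrapper's own keys
            # (title, description, ...) override it, so the description is kept.
            merged = resolve(node["allOf"][0])
            merged.update({k: resolve(v) for k, v in node.items() if k != "allOf"})
            return merged
        # resolve() always builds new dicts/lists, so the input is never mutated.
        return {k: resolve(v) for k, v in node.items()}

    # Drop the top-level definitions block once everything is inlined.
    return resolve({k: v for k, v in schema.items() if k != "definitions"})


# The schema from the issue, as a Python dict.
schema = {
    "title": "Relationship",
    "type": "object",
    "properties": {
        "source": {"$ref": "#/definitions/Node"},
        "target": {"title": "Target", "description": "foo",
                   "allOf": [{"$ref": "#/definitions/Node"}]},
    },
    "required": ["source", "target"],
    "definitions": {
        "Node": {
            "title": "Node",
            "type": "object",
            "properties": {"id": {"title": "Id", "type": "string"},
                           "type": {"title": "Type", "type": "string"}},
            "required": ["id", "type"],
        }
    },
}

flat = inline_refs(schema)
print(json.dumps(flat["properties"]["target"], indent=2))
```

The wrapper's keys are applied last on purpose: the Node definition's title would otherwise replace the field-level "Target" title, and the "foo" description would be lost.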
lkuligin commented 6 months ago

@alx13

alx13 commented 6 months ago

Yep. Unfortunately, it's an issue with the underlying library:

https://github.com/googleapis/python-aiplatform/blob/cdb8e6afc3791ca5b3c86e516dde2c3f111401f0/google/cloud/aiplatform_v1beta1/types/openapi.py#L63

Pydantic models need to be constructed in a very specific way to avoid generating properties that Schema does not support.

Let me check how the model can be constructed.

alx13 commented 6 months ago

It's related to: https://github.com/pydantic/pydantic/issues/3896

And this generates a schema without allOf:

from typing import Annotated

from langchain_core.pydantic_v1 import BaseModel

class Node(BaseModel):
    id: str
    type: str

class Relationship(BaseModel):
    source: Node
    # Annotated goes in the type hint, not the default value.
    target: Annotated[Node, "foo"]

Relationship.schema()

baskaryan commented 6 months ago

What schema does that generate?

alx13 commented 6 months ago
{
  "title": "Relationship",
  "type": "object",
  "properties": {
    "source": {
      "title": "Node",
      "type": "object",
      "properties": {
        "id": {
          "title": "Id",
          "type": "string"
        },
        "type": {
          "title": "Type",
          "type": "string"
        }
      },
      "required": [
        "id",
        "type"
      ]
    },
    "target": {
      "title": "Node",
      "type": "object",
      "properties": {
        "id": {
          "title": "Id",
          "type": "string"
        },
        "type": {
          "title": "Type",
          "type": "string"
        }
      },
      "required": [
        "id",
        "type"
      ]
    }
  },
  "required": [
    "source",
    "target"
  ]
}
baskaryan commented 6 months ago

So the description is ignored altogether?

alx13 commented 6 months ago

Yep, you're right; I missed that.

And my example was incorrect.

So for now, you can use this approach instead:

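from langchain_core.pydantic_v1 import BaseModel

class Node(BaseModel):
    """foo"""  # the class docstring becomes the Node schema's description
    id: str
    type: str

class Relationship(BaseModel):
    source: Node
    target: Node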

Or, if the class is used for several fields:

from langchain_core.pydantic_v1 import BaseModel

class Node(BaseModel):
    id: str
    type: str

class NodeTarget(Node):
    """foo"""  # only this subclass carries the description

class Relationship(BaseModel):
    source: Node
    target: NodeTarget

which produces:

{
  "title": "Relationship",
  "type": "object",
  "properties": {
    "source": {
      "title": "Node",
      "type": "object",
      "properties": {
        "id": {
          "title": "Id",
          "type": "string"
        },
        "type": {
          "title": "Type",
          "type": "string"
        }
      },
      "required": [
        "id",
        "type"
      ]
    },
    "target": {
      "title": "NodeTarget",
      "description": "foo",
      "type": "object",
      "properties": {
        "id": {
          "title": "Id",
          "type": "string"
        },
        "type": {
          "title": "Type",
          "type": "string"
        }
      },
      "required": [
        "id",
        "type"
      ]
    }
  },
  "required": [
    "source",
    "target"
  ]
}