pydantic / pydantic

Data validation using Python type hints
https://docs.pydantic.dev
MIT License

Add support for `Protocol` type #10161

Open Ryang20718 opened 3 weeks ago

Ryang20718 commented 3 weeks ago


Description

gin/_schema_validator.py", line 50, in create_schema_validator
    return SchemaValidator(schema, config)
pydantic_core._pydantic_core.SchemaError: Error building "dataclass" validator:
  SchemaError: Error building "dataclass-args" validator:
  SchemaError: Field 'y':
  SchemaError: Error building "is-instance" validator:
  SchemaError: 'cls' must be valid as the first argument to 'isinstance'

Using a class that subclasses Protocol as a field type leads to errors in pydantic.

Example Code

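from pydantic.dataclasses import dataclass as dataclass_with_validation
from pydantic import ConfigDict
from typing import Protocol

class Proto_(Protocol):
    t: int

# The SchemaError is raised here, when the decorator builds the validator:
# an arbitrary type gets an isinstance() check, and a Protocol without
# @runtime_checkable cannot be passed to isinstance().
@dataclass_with_validation(config=ConfigDict(arbitrary_types_allowed=True))
class Repro:
    y: Proto_

# Protocols cannot be instantiated, so pass a concrete class instead.
class Impl:
    def __init__(self, t: int) -> None:
        self.t = t

print(Repro(y=Impl(t=3)))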

Python, Pydantic & OS Version

             pydantic version: 2.8.2
        pydantic-core version: 2.20.1
          pydantic-core build: profile=release pgo=true
                 install path: /home/ryang/.cache/bazel/_bazel_ryang/7071f3e24af240fa9a28f67dc16192b3/external/pip_pydantic/site-packages/pydantic
               python version: 3.10.7 (main, Sep  9 2022, 04:02:34) [GCC 9.4.0]
                     platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.31
             related packages: mypy-1.10.0 typing_extensions-4.12.2
                       commit: unknown

sydney-runkle commented 3 weeks ago

At first glance, I'm not sure it makes sense to add support for this: we build isinstance validators for unknown types, and isinstance doesn't apply to protocols (it raises a TypeError unless the protocol is decorated with @runtime_checkable).

Maybe this makes sense as a feature request, though I think that would involve supporting the Protocol type itself, rather than extending the arbitrary-type (isinstance) checks to cover protocols.

Ryang20718 commented 3 weeks ago

@sydney-runkle should we add the Protocol type to https://github.com/pydantic/pydantic-extra-types/blob/main/pydantic_extra_types?

sydney-runkle commented 3 weeks ago

@Ryang20718, we could start by adding it there, yes! I could also see an argument for eventually adding support in pydantic-core.

mpkocher commented 2 weeks ago

Provided you decorate the protocol with @runtime_checkable (and keep arbitrary_types_allowed=True), Pydantic works as expected:

from types import SimpleNamespace
from typing import Protocol, Sequence, runtime_checkable

from pydantic import ConfigDict, validate_call

@runtime_checkable
class NameAble(Protocol):
    name: str

# arbitrary_types_allowed lets pydantic fall back to an isinstance() check for NameAble
@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def example(xs: Sequence[NameAble]) -> str:
    return ",".join(x.name for x in xs)

# Any object with a `name` attribute satisfies the runtime check
nx = [SimpleNamespace(name=f"Name={x}") for x in range(5)]

print(example(nx)) # Name=0,Name=1,Name=2,Name=3,Name=4

Structural typing and nominal subtyping are different. @runtime_checkable adds isinstance support to a protocol; it's sort of "bending" structural typing to behave like an ABC. The runtime check only verifies that the protocol's members are present, not their types.
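To show how partial that "bending" is, here is a sketch that assumes a runtime_checkable NameAble protocol is used directly as a BaseModel field (the Holder model is hypothetical). Only the presence of name is checked, so a wrong attribute type slips through, while a missing attribute is rejected.

```python
from types import SimpleNamespace
from typing import Protocol, runtime_checkable

from pydantic import BaseModel, ConfigDict, ValidationError


@runtime_checkable
class NameAble(Protocol):
    name: str


class Holder(BaseModel):
    # NameAble is not a pydantic type, so it is validated with an isinstance() check.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    item: NameAble


good = Holder(item=SimpleNamespace(name="a"))

# Accepted: the runtime check only looks for a `name` attribute, not its type.
wrong_type = Holder(item=SimpleNamespace(name=123))

try:
    Holder(item=SimpleNamespace(title="a"))  # no `name` attribute at all
    missing_rejected = False
except ValidationError:
    missing_rejected = True
```

If annotation-level checks are needed, they would have to come from something beyond isinstance, which is part of what "supporting Protocol" would have to define.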

Note that you can't create instances of a Protocol; that's not really how they work:

n = NameAble("test")
....

TypeError: Protocols cannot be instantiated

@sydney-runkle From a project-management perspective, it would be useful to define explicitly what "supports Protocol" means in this ticket.

I believe this ticket may be more about improving the docs to explain how Pydantic and Protocol work together and how to avoid friction points with Protocol. It may also be worth making the error messages (such as the SchemaError in the original report) less cryptic.

While structural typing is closer to how I think in Python, I'm finding Protocol, and Python's take on structural typing in general, a bit thorny.


Viicos commented 2 weeks ago

If we were to fully support protocols according to the typing spec, here is what that would mean. Given the following:

from typing import Protocol

from pydantic import BaseModel

class MyProto(Protocol):
    def some_meth(self) -> int: ...

class Model(BaseModel):
    proto: MyProto

Instantiating Model would then only work if an implementation that matches MyProto (including its method signatures) is passed in:

class ProtoImpl:
    def some_meth(self) -> int:
        return 1

Model(proto=ProtoImpl())  # OK

class BadProtoImpl:
    def some_meth(self) -> str:
        return "a"

Model(proto=BadProtoImpl())  # ValidationError

This raises the question of whether we want to support non-data protocols (i.e. protocols whose members are only methods). If so, validating the model from JSON data would not be possible, since JSON can only describe data, not objects with methods.
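Checking method signatures as described for MyProto requires inspecting a live object, which is also why JSON input could not satisfy a non-data protocol. The sketch below uses hypothetical helpers (protocol_methods, conforms) and only compares annotated return types exactly; it is not how pydantic behaves today, and a spec-complete check would also compare parameters and allow subtypes.

```python
from typing import Protocol, get_type_hints


class MyProto(Protocol):
    def some_meth(self) -> int: ...


class ProtoImpl:
    def some_meth(self) -> int:
        return 1


class BadProtoImpl:
    def some_meth(self) -> str:
        return "a"


def protocol_methods(proto: type) -> list[str]:
    """Hypothetical helper: public callables declared in the protocol body."""
    return [
        name
        for name, value in vars(proto).items()
        if callable(value) and not name.startswith("_")
    ]


def conforms(obj: object, proto: type) -> bool:
    """Hypothetical check: every protocol method exists on type(obj) with the
    same annotated return type. Parameters and subtyping are ignored here."""
    for name in protocol_methods(proto):
        impl = getattr(type(obj), name, None)
        if not callable(impl):
            return False
        expected = get_type_hints(getattr(proto, name)).get("return")
        if get_type_hints(impl).get("return") != expected:
            return False
    return True
```

With these definitions, conforms(ProtoImpl(), MyProto) is True and conforms(BadProtoImpl(), MyProto) is False, mirroring the OK / ValidationError outcomes in the example above.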

On the other hand, if we want to support "data-only" protocols (the spec doesn't define this term; I mean protocols without any methods as members), then the behavior would probably be very similar to that of dataclass-like types.
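The "dataclass-like" behavior for data-only protocols could be approximated today by deriving a model from the protocol's annotations. This is a sketch under that assumption: model_for_protocol, Point, and Concrete are hypothetical names, while create_model, model_validate(..., from_attributes=True), and model_validate_json are existing pydantic v2 APIs. Unlike the method-only case, the same derived model can validate both attribute-bearing objects and JSON.

```python
from typing import Protocol, get_type_hints

from pydantic import BaseModel, ValidationError, create_model


class Point(Protocol):
    # A "data-only" protocol: annotated attributes, no methods.
    x: int
    y: int


def model_for_protocol(proto: type) -> type[BaseModel]:
    """Hypothetical helper: one required model field per protocol annotation."""
    fields = {name: (tp, ...) for name, tp in get_type_hints(proto).items()}
    return create_model(f"{proto.__name__}Model", **fields)


PointModel = model_for_protocol(Point)


class Concrete:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


# Validate an arbitrary object by reading its attributes...
from_object = PointModel.model_validate(Concrete(1, 2), from_attributes=True)
# ...or validate JSON data directly, which a method-only protocol could not allow.
from_json = PointModel.model_validate_json('{"x": 3, "y": 4}')

try:
    PointModel.model_validate_json('{"x": 3}')  # missing required field `y`
    missing_rejected = False
except ValidationError:
    missing_rejected = True
```

Note that this validates into a new model instance rather than passing the original object through, which is one of the design choices "supporting data-only protocols" would have to settle.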

I think supporting data-only protocols would be the way to go.