pydantic / pydantic

Data validation using Python type hints
https://docs.pydantic.dev
MIT License
20.8k stars 1.86k forks source link

`Url` "subclasses" don't enforce constraints #7353

Open sydney-runkle opened 1 year ago

sydney-runkle commented 1 year ago

Initial Checks

Description

Currently they don't since they're actually just using Annotated.

Solution is to switch:

what we currently have:

HttpUrl = Annotated[Url, UrlConstraints(max_length=2083, allowed_schemes=['http', 'https'])]

should be more like:

class HttpUrl(Url):
    constraints = UrlConstraints(max_length=2083, allowed_schemes=['http', 'https'])

Then we need a clever implementation of __init_subclass__ or similar, in this bit of Rust.

Same goes for MultiHostUrl.

from pydantic import HttpUrl, TypeAdapter, ValidationError

print(HttpUrl('ssh://www.google.com'))

try:
    HttpUrl('invalid')
except ValidationError as e:
    print(e)
else:
    raise RuntimeError('should have raised ValidationError')

ta = TypeAdapter(HttpUrl)
try:
    ta.validate_python('ssh://www.google.com')
except ValidationError as e:
    print(e)
else:
    raise RuntimeError('should have raised ValidationError')

Example Code

No response

Python, Pydantic & OS Version

pydantic version: 2.3.0
        pydantic-core version: 2.7.0
          pydantic-core build: profile=release pgo=false
               python version: 3.10.12 (main, Jul  5 2023, 15:34:07) [Clang 14.0.6 ]
                     platform: macOS-10.16-x86_64-i386-64bit
     optional deps. installed: ['devtools', 'email-validator', 'typing-extensions']
sydney-runkle commented 1 year ago

Looks like the ability to call HttpUrl('ssh://www.google.com') wasn't available in V1. Trying to do so produces the following error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pydantic/networks.py", line 187, in pydantic.networks.AnyUrl.__init__
TypeError: __init__() needs keyword-only argument scheme

I spoke with @adriangb, and it does seem as though this would be difficult to implement. If we implement some sort of clever __init_subclass__ on the rust end of things, we're left with a Python class in the rust code.

Is there any particular reason why we want to be able to avoid the use of the type annotation in favor of a direct constructor call @samuelcolvin?

If we want to approach a solution solely from the pydantic side of things, we could implement something like this:

from typing import Annotated
from pydantic_core import Url
from pydantic import TypeAdapter
from pydantic.networks import UrlConstraints

class MyUrl(Url):
    url_constraints = UrlConstraints(allowed_schemes=['https'])

    def __new__(cls, url: str) -> 'MyUrl':
        return TypeAdapter(Annotated[Url, getattr(cls, 'url_constraints')]).validate_python(url)

However, this is flawed in that the return type is actually Url. Not to mention, obviously inefficient.

Excited to hear other thoughts on this.

samuelcolvin commented 1 year ago

Is there any particular reason why we want to be able to avoid the use of the type annotation in favor of a direct constructor call @samuelcolvin?

Well the simple version of doing that doesn't work.

We could create our own type and trick typing into thinking it's Annotated, but I'm not sure that's worth the extra complexity.

adriangb commented 1 year ago

Well the simple version of doing that doesn't work.

The simple version of using TypeAdapter doesn't work?

We could create our own type and trick typing into thinking it's Annotated, but I'm not sure that's worth the extra complexity.

Right given that this didn't work in v1 and that no one (other than us, internally) has wanted it in v2 the key here is going to be avoiding a lot of complexity if the same result can be achieved with a couple extra LOC (I was thinking using TypeAdapter but maybe I'm missing why that doesn't work as per above)

sydney-runkle commented 1 year ago

@samuelcolvin, would you be ok with closing this issue for now given Adrian's comment ^^?

haoyun commented 1 year ago

I was trying to implement something that making the password part of an url a SecretStr (pydantic/pydantic#3200), then accidently I came to this issue too, when I tried AnyHttpUrl("hello://world"). The docstring (editors show it) of URL.__new__() from _pydantic_core.pyi suggests me to do so....

giacomorebecchi commented 10 months ago

Hi there, I'm commenting here to share my experience: I was trying to obtain an equivalent of pydantic.AnyUrl that can accept having no host, and I did:

from typing import Annotated

from pydantic import AfterValidator, AnyUrl, UrlConstraints

GenericUrl = Annotated[
    AnyUrl,
    UrlConstraints(host_required=False),
]

now, I expected that calling GenericUrl.build(**kwargs) could accept no host.

I understand that the annotation will not change the class methods, but this is just an example to prove that the current interface is not particularly "developer-friendly".

Also, the fact that you cannot set attributes (e.g. I was trying to set the password with a new value) can be perceived as a limit.

If you happen to have suggestions on how to work around the two cases I mentioned, I would be very grateful. Thanks for the great work!

psychedelicious commented 4 months ago

The fudged Url "subclasses" push a pydantic user towards improper usage of Annotated: image

According to the typing spec, an Annotated shouldn't be instantiated. See the last point in: https://typing.readthedocs.io/en/latest/spec/qualifiers.html#id4

Here's a pyright issue w/ comment from the maintainer: https://github.com/microsoft/pyright/issues/7370#issuecomment-1971493515

Making these proper classes would resolve this issue. Maybe #9147 has the same root cause.

FWIW, pyright does correctly infer the signature of AnyHttpUrl's constructor: image

adriangb commented 4 months ago

@psychedelicious yes that was a new breaking change introduced into the spec. It used to be allowed until like 2 months ago.

I still agree that making things real classes and dealing with the consequences is probably the best approach.

skeletorXVI commented 3 months ago

Another problem that could be solved by moving to dedicated classes is actually extending the Url class with additional functionality.

Currently regardless of what class defines the url_schema, Pydantic returns an pydantic_core.Url instance, as a result one has to move either Url validation to python by building a wrapper type or lose the UrlConstraints functionality for a reusable custom Url class.

An example where I ran into this issue is adding a __truediv__ function to the Url class, to implement a similar experience as pathlib.Path. My workaround is first apply the UrlConstraints and the run an AfterValidator that rebuilds the url with my url class.

My idea would allow something like this:

Note that I am not familiar with the Python-Rust translation, so I am not too sure how this could be implemented

from typing import Any
from pydantic import GetCoreSchemaHandler
from pydantic_core import Url

class ExampleUrl(Url):
    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        return UrlSchema(url_class=cls, allowed_schemes=["example"])  # Specify while python class to use

    def infer_magic_link() -> str:
        ...

# Usage

class Config(BaseSettings):
    example_endpoint: ExampleUrl
    example_staging_endpoint: Annotated[ExampleUrl, UrlConstraints(default_port=2345)]

config = Config()
print(config.example_staging_endpoint.infer_magic_link())