pydantic / pydantic

Data validation using Python type hints
https://docs.pydantic.dev
MIT License

Support efficient validation of ranges as sequences #9916

Open adriamontoto opened 2 months ago

adriamontoto commented 2 months ago

Description

When a large range is passed to a function that is decorated with Pydantic's validate_call and annotated with Sequence, the call runs into trouble during validation. Although a range is a lazy sequence that computes its elements on demand rather than storing them all in memory, validation appears to process it inefficiently, seemingly by iterating over every element, and extremely large ranges crash the program.

This happens with both collections.abc.Sequence and typing.Sequence.
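
To illustrate the laziness point using only the standard library: a range answers len(), indexing, and membership tests arithmetically from its start, stop, and step, and random.SystemRandom().choice only needs len() and indexing. So the same huge range is handled instantly when no validation is involved.

```python
import sys
from collections.abc import Sequence
from random import SystemRandom

r = range(1, 2**31)

# range implements the Sequence ABC without storing its elements.
assert isinstance(r, Sequence)

# len(), indexing and membership are computed from start/stop/step,
# so none of these touch the 2**31 - 1 individual values.
print(len(r))          # 2147483647
print(r[0], r[-1])     # 1 2147483647
print(2**30 in r)      # True

# The range object stays tiny no matter how many values it describes.
print(sys.getsizeof(r))

# random.choice only needs len() and indexing, so it works on the range directly.
value = SystemRandom().choice(r)
print(1 <= value < 2**31)  # True
```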

Example Code

```python
from collections.abc import Sequence
from random import SystemRandom
from typing import Annotated, TypeVar

from pydantic import Field, validate_call

T = TypeVar('T')

@validate_call
def random_choice(sequence: Annotated[Sequence[T], Field(min_length=1)]) -> T:
    return SystemRandom().choice(seq=sequence)

print(isinstance(range(1, 2**31), Sequence))  # >>> True
print(random_choice(range(1, 2**31)))  # crashes during validation instead of printing a value
```
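
For a sense of scale, here is a back-of-the-envelope estimate (standard library only) of the memory the range above would need if validation copied every element into a list. That copying is an assumption about the mechanism, not something confirmed in this thread. On a 64-bit CPython build the estimate comes out to tens of gigabytes.

```python
import struct
import sys

r = range(1, 2**31)

# Size of one pointer slot in a list (8 bytes on a 64-bit build).
pointer_size = struct.calcsize("P")

# A list holding every element needs one pointer per element...
pointer_bytes = len(r) * pointer_size

# ...plus one int object per element. The size of the largest element is
# used as an approximate per-object size (CPython's cached small ints are
# a negligible fraction of this range).
int_bytes = len(r) * sys.getsizeof(r[-1])

total_gib = (pointer_bytes + int_bytes) / 2**30
print(f"~{total_gib:.0f} GiB to materialize {len(r):,} ints")
```

This is only a rough figure: it ignores list over-allocation and treats every int as the size of the largest one.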

Python, Pydantic & OS Version

Windows Machine

pydantic version: 2.8.2
pydantic-core version: 2.20.1
pydantic-core build: profile=release pgo=true
install path:
python version: 3.11.7 (tags/v3.11.7:fa7a6f2, Dec  4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]
platform: Windows-10-10.0.22631-SP0
related packages: mypy-1.10.1 typing_extensions-4.12.2
commit: unknown

Linux Machine

pydantic version: 2.8.2
pydantic-core version: 2.20.1
pydantic-core build: profile=release pgo=true
install path:
python version: 3.11.9 (main, Apr  6 2024, 17:59:24) [GCC 11.4.0]
platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
related packages: typing_extensions-4.12.0
commit: unknown

sydney-runkle commented 2 months ago

Yeah, we could probably handle this case better, because a range is a valid sequence, and there's probably a clever way to validate the contents of a range without iterating through each item.
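
One possible shape for such a shortcut, as a minimal sketch: every element of a range is an int and the elements form an arithmetic progression, so length and bound constraints can be checked from len(), the first element, and the last element alone. validate_int_range and its parameters are hypothetical, loosely modeled on Field(min_length=..., max_length=..., ge=..., le=...); this is neither pydantic's API nor an agreed-upon design.

```python
from typing import Optional


def validate_int_range(
    r: range,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    ge: Optional[int] = None,
    le: Optional[int] = None,
) -> range:
    """Hypothetical O(1) check of a range against length and bound constraints.

    Every element of a range is an int, so no per-item type check is needed;
    bounds only have to be compared against the smallest and largest element.
    """
    n = len(r)
    if min_length is not None and n < min_length:
        raise ValueError(f"range has {n} items, expected at least {min_length}")
    if max_length is not None and n > max_length:
        raise ValueError(f"range has {n} items, expected at most {max_length}")
    if n == 0:
        return r  # no elements, so bound constraints hold vacuously
    # With a positive step the first element is the smallest; with a negative
    # step it is the largest. r[-1] is computed arithmetically, not by iterating.
    smallest, largest = (r[0], r[-1]) if r.step > 0 else (r[-1], r[0])
    if ge is not None and smallest < ge:
        raise ValueError(f"smallest element {smallest} is less than {ge}")
    if le is not None and largest > le:
        raise ValueError(f"largest element {largest} is greater than {le}")
    return r


# Returns immediately even though the range describes over two billion values.
big = validate_int_range(range(1, 2**31), min_length=1, ge=1)
print(len(big))  # 2147483647

try:
    validate_int_range(range(0), min_length=1)
except ValueError as exc:
    print(exc)  # range has 0 items, expected at least 1
```

Note that len() raises OverflowError for ranges longer than sys.maxsize, which this sketch does not handle. If the item type is effectively Any (as pydantic treats an unbound TypeVar like the one in the original example), there is nothing per-item to check at all.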

sydney-runkle commented 2 months ago

I think this is really more a task of "support efficient validation of ranges as sequences".

adriamontoto commented 2 months ago

@sydney-runkle Do you want me to change the issue title?