pydantic / pydantic-core

Core validation logic for pydantic, written in Rust
MIT License

Inconsistent Behavior in Pattern Matching String #1367

Closed AvlWx2014 closed 1 month ago

AvlWx2014 commented 1 month ago

Issue Description

On Pydantic 2.8.2 with pydantic-core 2.20.1, I've noticed inconsistent behavior when validating strings against a pattern. Passing the pattern as a string allows invalid inputs to pass validation, while passing it as a compiled Pattern object from re.compile behaves as expected and rejects the invalid input.

Example:

import re

from pydantic import BaseModel, Field

# Note: this regular expression is based on the lowercase RFC1123 subdomain name regular expression
# the Kubernetes project uses to validate the names of Secrets and other resources.
COMPILED = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")
NOT_COMPILED = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"

class NotCompiled(BaseModel):
    name: str = Field(pattern=NOT_COMPILED)

class Compiled(BaseModel):
    name: str = Field(pattern=COMPILED)

model = NotCompiled.model_validate({"name": "ShouldntPass"})
print(repr(model))
# NotCompiled(name='ShouldntPass')
model = Compiled.model_validate({"name": "ShouldntPass"})
print(repr(model))  # unreachable, previous line raises ValidationError as expected

This can be reproduced by adding the following two test cases to tests/validators/test_string.py:

        (
            {'pattern': r'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'},
            'ShouldntPass',
            'ShouldntPass',
        ),
        (
            {'pattern': re.compile(r'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')},
            'ShouldntPass',
            Err(
                "String should match pattern '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*' [type=string_pattern_mismatch"
            ),
        ),

Environment Info

Here is some information on the environment where I noticed the issue:

$ pdm info --env
{
  "implementation_name": "cpython",
  "implementation_version": "3.9.18",
  "os_name": "posix",
  "platform_machine": "x86_64",
  "platform_release": "6.9.7-100.fc39.x86_64",
  "platform_system": "Linux",
  "platform_version": "#1 SMP PREEMPT_DYNAMIC Thu Jun 27 18:06:32 UTC 2024",
  "python_full_version": "3.9.18",
  "platform_python_implementation": "CPython",
  "python_version": "3.9",
  "sys_platform": "linux"
}
$ pdm show pydantic
Name:                  pydantic
Latest version:        2.8.2
Latest stable version: 2.8.2
Installed version:     2.8.2
Summary:               Data validation using Python type hints
Requires Python:       >=3.8
... # Truncated for brevity
$ pdm show pydantic-core
Name:                  pydantic_core
Latest version:        2.20.1
Latest stable version: 2.20.1
Installed version:     2.20.1
Summary:               Core functionality for Pydantic validation and serialization
Requires Python:       >=3.8
... # Truncated for brevity

tinez commented 1 month ago

A minimal example to reproduce that would be:

import re
from pydantic import BaseModel, Field

class A(BaseModel):
    b: str = Field(pattern=r"[a-z]")
    c: str = Field(pattern=re.compile(r"[a-z]"))

x = A.model_validate({"b": "Abc", "c": "Abc"})

Validation of b succeeds, but validation of c fails. This seems to happen because in the first case pydantic_core uses Rust's regex crate as the regex engine, while in the second case it simply calls Python's re.Pattern.match method. Rust's Regex.is_match, however, behaves more like Python's re.search. From the docs (emphasis mine):

Returns true if and only if there is a match for the regex anywhere in the haystack given.
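
The difference between the two matching modes can be seen with Python's re module alone. This is only a sketch of the semantics, not of what pydantic_core does internally; the names PATTERN, anchored_at_start and found_anywhere are made up for illustration.

```python
import re

PATTERN = re.compile(r"[a-z]")

# re.Pattern.match only tries to match at the start of the string.
# "Abc" starts with "A", which is not in [a-z], so there is no match.
anchored_at_start = PATTERN.match("Abc")

# re.Pattern.search scans the whole string, much like the quoted
# description of is_match: it finds the "b" at index 1.
found_anywhere = PATTERN.search("Abc")

print(anchored_at_start)       # None
print(found_anywhere.group())  # b
print(found_anywhere.start())  # 1
```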

I've filed a PR for this.
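
In the meantime, a possible workaround is to anchor the pattern explicitly, so that "match at the start" and "match anywhere" agree. The sketch below uses only Python's re module with the pattern from the original report; SUBDOMAIN and ANCHORED are illustrative names, and whether an anchored string pattern behaves identically under the Rust engine is an assumption that should be verified.

```python
import re

# Pattern from the original report (lowercase RFC 1123 subdomain).
SUBDOMAIN = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"

# Wrap the pattern in a non-capturing group and anchor both ends.
ANCHORED = rf"^(?:{SUBDOMAIN})$"

# Unanchored: search finds a lowercase letter somewhere, match fails on "S".
unanchored_search = re.search(SUBDOMAIN, "ShouldntPass")
unanchored_match = re.match(SUBDOMAIN, "ShouldntPass")

# Anchored: both modes reject the invalid input and accept a valid name.
anchored_search = re.search(ANCHORED, "ShouldntPass")
anchored_match = re.match(ANCHORED, "ShouldntPass")
valid_search = re.search(ANCHORED, "my-secret.example")

# Note that match alone only anchors the start, so it still accepts a
# valid prefix followed by invalid characters unless the end is anchored.
prefix_only = re.match(SUBDOMAIN, "valid_INVALID")
prefix_anchored = re.match(ANCHORED, "valid_INVALID")
```

With the anchors in place, the string and compiled forms of the pattern should no longer disagree on inputs such as "ShouldntPass".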