python-validators / validators

Python Data Validation for Humans™.
MIT License

Domain validator allows invalid characters in some cases #366

Closed by hagenrd 4 months ago

hagenrd commented 5 months ago

It appears that, currently, any single character is accepted after the last letter of the TLD when rfc_1034 is True, for example:

>>> domain('example.com?', rfc_1034=True, rfc_2782=False)
True
>>> domain('example.com!', rfc_1034=True, rfc_2782=False)
True

I believe the '.' just needs to be escaped in the pattern string (link):

+ rf"[a-z]{r'.?$' if rfc_1034 else r'$'}",
             ^
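
To illustrate the difference, here is a minimal sketch using only the standard `re` module. It models just the tail of the pattern, not the library's full regex, and `tail_pattern` with its `escape_dot` flag is a hypothetical helper made up for this example:

```python
import re


def tail_pattern(rfc_1034: bool, escape_dot: bool) -> re.Pattern:
    """Hypothetical helper: builds only the last part of a domain pattern."""
    dot = r"\." if escape_dot else r"."
    # Build the suffix outside the f-string braces; before Python 3.12 a
    # backslash is not allowed inside an f-string replacement field.
    suffix = rf"{dot}?$" if rfc_1034 else r"$"
    return re.compile(rf"[a-z]{suffix}", re.IGNORECASE)


unescaped = tail_pattern(rfc_1034=True, escape_dot=False)
escaped = tail_pattern(rfc_1034=True, escape_dot=True)

for value in ("example.com", "example.com.", "example.com?", "example.com!"):
    # search() is enough here because only the end of the string is modelled
    print(value, bool(unescaped.search(value)), bool(escaped.search(value)))
```

With the unescaped '.', all four values match; with the escaped form, only 'example.com' and 'example.com.' do.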

Also, it appears question marks are allowed when rfc_2782 is True for domain validation:

>>> from validators import domain
>>> domain('example?.com', rfc_1034=False, rfc_2782=True)
True

This appears to be from the use of '?' after the '_' inside of a character class:

rf"^(?:[a-z0-9{r'_?'if rfc_2782 else ''}]"
                  ^

Presumably, this is to make the '_' optional, but since most metacharacters, including '?', aren't active inside character classes (link), it is interpreted as a literal '?' instead.
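
A small sketch of that behaviour with the standard `re` module. The two patterns are simplified stand-ins for a single label, not the library's actual regex:

```python
import re

# '?' inside [...] is just another literal member of the class
with_qmark = re.compile(r"^[a-z0-9_?]+$", re.IGNORECASE)
# Dropping the '?' still permits '_', since every class member is
# already one of several alternatives for a position
without_qmark = re.compile(r"^[a-z0-9_]+$", re.IGNORECASE)

for label in ("example", "_sip", "example?"):
    print(label, bool(with_qmark.match(label)), bool(without_qmark.match(label)))
```

Only the first pattern accepts 'example?'. Both accept '_sip', so removing the '?' from the class should not affect labels that use an underscore.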

yozachar commented 5 months ago

Hey @hagenrd, thanks! Do you mind reviewing #367 before it's merged?

hagenrd commented 4 months ago

Thanks for the quick turnaround! Any chance you have some time to provide a patch that includes those changes?

yozachar commented 4 months ago

https://patch-diff.githubusercontent.com/raw/python-validators/validators/pull/367.patch ?

hagenrd commented 4 months ago

Sorry, I meant patch in terms of a semantic version, i.e. a patch-release (0.28.1).