python-validators / validators

Python Data Validation for Humans™.
MIT License

Inconsistent behavior with rfc_1034=True for domain validation #345

Closed hwo411 closed 7 months ago

hwo411 commented 7 months ago

Hello! I've recently run into some strange behavior that I'd like to report.

Tested on the latest version at the time of writing, 0.24.0.

As per the specs, rfc_1034=True allows a trailing dot at the end of the domain name, but in most cases it seems to require one:

>>> from validators import domain
>>> domain('example.com')
True
>>> domain('example.ru')
True
>>> domain('example.com.')
ValidationError(func=domain, args={'value': 'example.com.'})
>>> domain('example.ru.')
ValidationError(func=domain, args={'value': 'example.ru.'})
>>> domain('example.com', rfc_1034=True)
True
>>> domain('example.ru', rfc_1034=True)
ValidationError(func=domain, args={'value': 'example.ru', 'rfc_1034': True})
>>> domain('example.com.', rfc_1034=True)
True
>>> domain('example.ru.', rfc_1034=True)
True

As you can see, without rfc_1034, domains without a trailing dot are valid and domains with a trailing dot are invalid, which is correct.

However, with rfc_1034 the behavior is strange:

.com is valid both with and without a trailing dot (which is correct as I understand it, because rfc_1034 allows a trailing dot but does not require it).

.ru (and other TLDs such as .fr), however, is valid only with a trailing dot, which is incorrect behavior in my opinion.

I checked the code, and it seems that here https://github.com/python-validators/validators/blob/master/src/validators/domain.py#L50 the dot should be optional. But that does not explain why the .com domain is unaffected by this problem.

Am I missing something and this is expected, or is this a bug? What should the correct behavior be? (My assumption is that the dot is optional with rfc_1034.)

yozachar commented 7 months ago

> rfc_1034 allows trailing dot, but not requires it.

True, but the current implementation

rf"[A-Za-z]{r'.$' if rfc_1034 else r'$'}",

requires a trailing dot when rfc_1034=True.

> .com is valid both with and without trailing dot.

That is indeed strange. Thanks for bringing this up!
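
One possible explanation for the .com case, offered as a hypothesis rather than a confirmed diagnosis: in the fragment quoted above, the "." in r'.$' is unescaped, so it matches any single character rather than a literal dot. A three-letter TLD such as "com" could then satisfy the pattern with its last letter standing in for the dot, while a two-letter TLD such as "ru" would have no character left over. The sketch below uses the quoted fragment verbatim, but PREFIX is a simplified, hypothetical stand-in for the rest of the library's pattern, and current_tail, optional_dot_tail, and matches are illustrative names, not part of validators.

```python
import re

# Hypothetical, simplified stand-in for the part of the pattern that precedes
# the quoted fragment: dot-separated labels, then at least one TLD character.
# The library's real prefix may differ.
PREFIX = r"^(?:[A-Za-z0-9-]+\.)+[A-Za-z0-9][A-Za-z0-9-]*"


def current_tail(rfc_1034: bool) -> str:
    # The fragment quoted in the issue, verbatim. The "." is unescaped,
    # so it matches any single character, not only a literal dot.
    return rf"[A-Za-z]{r'.$' if rfc_1034 else r'$'}"


def optional_dot_tail(rfc_1034: bool) -> str:
    # One possible fix (an assumption, not necessarily the maintainers'
    # choice): escape the dot and make it optional.
    suffix = r"\.?$" if rfc_1034 else r"$"
    return "[A-Za-z]" + suffix


def matches(value: str, tail: str) -> bool:
    # True if the simplified pattern (PREFIX + tail) accepts the value.
    return re.match(PREFIX + tail, value) is not None


if __name__ == "__main__":
    samples = ["example.com", "example.ru", "example.com.", "example.ru.", "example.com!"]
    for name, tail in [
        ("current", current_tail(rfc_1034=True)),
        ("optional dot", optional_dot_tail(rfc_1034=True)),
    ]:
        for value in samples:
            print(f"{name:13} {value:14} {matches(value, tail)}")
```

Under these assumptions, the current tail accepts "example.com" and even "example.com!" but rejects "example.ru", which lines up with the behavior reported above. The optional-dot variant accepts both TLDs with or without the trailing dot and rejects "example.com!".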