ypcrts / fqdn

RFC-compliant FQDN validation and manipulation for Python.
http://fqdn.readthedocs.io/
Mozilla Public License 2.0

TLD validation is not _quite_ correct #3

Closed Shados closed 4 years ago

Shados commented 6 years ago

https://github.com/guyhughes/fqdn/blob/1ad687d6cd1d74c5781f673194a744ff105e345c/fqdn/__init__.py#L28

The current regex precludes hyphens and digits in the TLD entirely, whereas the actual restriction in RFC 1035 is that every label must start with a letter and cannot end with a hyphen. RFC 1123 relaxed this to allow labels to start with a digit as well, but not for TLDs (RFC 3696 clarifies this somewhat).
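To make the two rules concrete, here is a minimal sketch. The pattern and function names are hypothetical, chosen for illustration; this is not the regex the library uses. It assumes ASCII labels of 1 to 63 characters.

```python
import re

# Hypothetical patterns for illustration only; not the library's regex.
# RFC 1123 host label: starts and ends with a letter or digit,
# hyphens allowed in between, 1 to 63 characters in total.
HOST_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# TLD label under the original RFC 1035 rule: same as above,
# except that the first character must be a letter.
TLD_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_host_label(label: str) -> bool:
    """True if the whole string is a valid RFC 1123 style label."""
    return HOST_LABEL.fullmatch(label) is not None


def is_tld_label(label: str) -> bool:
    """True if the whole string is a valid letter-first label."""
    return TLD_LABEL.fullmatch(label) is not None
```

With these definitions, `3com` is accepted as a host label but rejected as a TLD label, while `com-` is rejected by both.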

RFC 1123 does kind of imply that TLDs should be all-alphabetic with the following:

> However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.

But it does not actually state this outright. If you look at the way internationalized domain names, including TLDs, are encoded into ASCII, it becomes evident that the restriction on TLD labels must instead be the original one: they must start with a letter, may include digits and hyphens, and cannot end with a hyphen.
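The IDN point can be checked with Python's built-in `idna` codec. This is a rough sketch: both patterns are hypothetical stand-ins for the two readings of the rule, not the library's actual regex.

```python
import re

# Hypothetical stand-ins for the two readings discussed above.
# Strict reading: the TLD consists of letters only.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")
# Original rule: letter first; letters, digits and hyphens inside;
# no trailing hyphen; at most 63 characters.
LETTER_FIRST = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# Encode an internationalized TLD to its ASCII-compatible form.
ace = "рф".encode("idna").decode("ascii")
print(ace)  # xn--p1ai

# The encoded TLD contains hyphens and a digit, so the letters-only
# reading rejects it while the letter-first rule accepts it.
print(LETTERS_ONLY.fullmatch(ace) is not None)  # False
print(LETTER_FIRST.fullmatch(ace) is not None)  # True
```

Every encoded IDN label begins with the `xn--` prefix, so a TLD pattern that forbids hyphens cannot match any internationalized TLD.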

ypcrts commented 4 years ago

Yes, that was a regex fail. Dashes and numbers were only allowed in the first label.

Thank you.