whatwg / url

URL Standard
https://url.spec.whatwg.org/
Other
526 stars 137 forks source link

Hexadecimal IPv4 hosts aren't parsed correctly #632

Closed martinthomson closed 3 years ago

martinthomson commented 3 years ago

Before #619, LDH hostnames would be parsed as IPv4 addresses. At a high level, if an input parsed successfully as an IPv4 address, it was an IPv4 address; otherwise, it was a domain name.

619 makes identifying IPv4 addresses easier. It exploits the fact that ICANN has effectively promised never to register a purely numeric TLD, using the last label to determine which is IPv4 and which is not. It only attempts IPv4 parsing if the last part of the address (ignoring any empty part) is purely numeric.

The problem is that a class of names that would formerly have been identified as IPv4 addresses are no longer identified thus. "0x7f000001" does not parse as the v4 loopback address under the new algorithm.

annevk commented 3 years ago

Can you explain why?

As far as I can tell https://url.spec.whatwg.org/#concept-host-parser calls https://url.spec.whatwg.org/#ends-in-a-number-checker which calls https://url.spec.whatwg.org/#ipv4-number-parser which does not return failure for that input. That eventually results in the host parser invoking https://url.spec.whatwg.org/#concept-ipv4-parser for the same input and returning the result.

martinthomson commented 3 years ago

My mistake. I saw the digits check and misunderstood its purpose.