Closed elliotwutingfeng closed 1 year ago
Thank you for the thorough report.
It can be fixed by using socket.inet_pton() in looks_like_ip() instead of socket.inet_aton(). However, inet_pton() is only available on Unix, Unix-like, and Windows systems, and not all of those systems support it.
A more portable fix would be using ipaddress.IPv4Address, though it is much slower.
Maybe try socket.inet_pton, and if it's unavailable on the system, fall back to ipaddress.IPv4Address?
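The fallback suggested above could look roughly like the sketch below. The function name looks_like_ipv4_strict is hypothetical and is not tldextract's actual looks_like_ip(); the sketch also assumes that a platform without inet_pton simply lacks the attribute on the socket module.

```python
import ipaddress
import socket


def looks_like_ipv4_strict(maybe_ip: str) -> bool:
    """Hypothetical helper: strict IPv4 check with a portable fallback."""
    if hasattr(socket, "inet_pton"):
        # Fast path: inet_pton only accepts dotted-quad notation.
        try:
            socket.inet_pton(socket.AF_INET, maybe_ip)
        except (OSError, ValueError):
            # OSError: not a valid address; ValueError: e.g. embedded NUL.
            return False
        return True
    # Portable path: slower, but available wherever the stdlib is.
    try:
        ipaddress.IPv4Address(maybe_ip)
    except ipaddress.AddressValueError:
        return False
    return True
```

Checking for the attribute once at import time, rather than on every call, would likely be cheaper in a hot path; the per-call check is kept here only for brevity.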
The following inputs are recognized as IPv4 addresses due to the use of socket.inet_aton().
1.1.1 -> domain parsed as 1.1.1
1.1 -> domain parsed as 1.1
1 -> domain parsed as 1 (output is still correct nonetheless)

The above is legacy behavior from UNIX's inet_aton for classful networks, a network addressing architecture made obsolete in 1993.
01.01.01.01 -> domain parsed as 01.01.01.01
01.01.01 -> domain parsed as 01.01.01
01.01 -> domain parsed as 01.01
01 -> domain parsed as 01 (output is still correct nonetheless)

0x1.0x1.0x1.0x1 -> domain parsed as 0x1.0x1.0x1.0x1
0x1.0x1.0x1 -> domain parsed as 0x1.0x1.0x1
0x1.0x1 -> domain parsed as 0x1.0x1
0x1 -> domain parsed as 0x1 (output is still correct nonetheless)

Given that tldextract's regex-based ipv4() function only recognizes IPv4 addresses with 4 decimal octets without zero padding, this is probably a bug.
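The discrepancy can be reproduced outside tldextract with a short script such as the one below. The helper names are hypothetical. What socket.inet_aton() accepts depends on the platform's C library, and ipaddress.IPv4Address only rejects leading zeros on Python 3.9.5 and later, so the printed table may vary between systems.

```python
import ipaddress
import socket

# Inputs from the report, plus one ordinary address for contrast.
SAMPLES = [
    "1.1.1.1", "1.1.1", "1.1", "1",
    "01.01.01.01", "01",
    "0x1.0x1.0x1.0x1", "0x1",
]


def accepted_by_inet_aton(text: str) -> bool:
    """Return True if socket.inet_aton() parses text without raising."""
    try:
        socket.inet_aton(text)
    except (OSError, ValueError):
        return False
    return True


def accepted_by_ipaddress(text: str) -> bool:
    """Return True if ipaddress.IPv4Address() parses text without raising."""
    try:
        ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return False
    return True


if __name__ == "__main__":
    for sample in SAMPLES:
        print(
            f"{sample!r:20} "
            f"inet_aton={accepted_by_inet_aton(sample)} "
            f"IPv4Address={accepted_by_ipaddress(sample)}"
        )
```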
It can be fixed by using socket.inet_pton() in looks_like_ip() instead of socket.inet_aton(). However, inet_pton() is only available on Unix, Unix-like, and Windows systems, and not all of those systems support it.
A more portable fix would be using ipaddress.IPv4Address, though it is much slower.
If suffix_index == len(labels) == 4, are there any edge cases not covered by IP_RE?
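One way to probe for such edge cases is to compare against a deliberately strict pattern. The sketch below is a hypothetical regex, not tldextract's actual IP_RE; it accepts exactly four decimal octets in the range 0-255 with no zero padding.

```python
import re

# Hypothetical pattern, NOT tldextract's IP_RE.
# [0-9] is used instead of \d because \d also matches non-ASCII digits
# (for example fullwidth digits) in str patterns.
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
STRICT_IPV4_RE = re.compile(rf"{_OCTET}(?:\.{_OCTET}){{3}}")


def is_strict_ipv4(text: str) -> bool:
    """Return True only for four unpadded decimal octets, each 0-255."""
    # fullmatch() avoids the trailing-newline match that `$` allows.
    return STRICT_IPV4_RE.fullmatch(text) is not None
```

Inputs worth trying against any regex-based check include a trailing newline ("1.1.1.1\n"), a trailing dot ("1.1.1.1."), out-of-range octets ("256.1.1.1"), and non-ASCII digits.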