john-kurkowski / tldextract

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
BSD 3-Clause "New" or "Revised" License

IPv4 with non-ASCII dots #287

Closed: elliotwutingfeng closed this issue 1 year ago

elliotwutingfeng commented 1 year ago

Should IPv4 addresses with non-ASCII dots be accepted?

Example: http://127\u30020\uff610\u002e1/foo/bar
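To make the question concrete: the escapes in the example decode to three different full-stop characters. The Python sketch below uses only the standard library. It splits the example URL and prints the Unicode name of each separator in the host. It only inspects the string and is not meant to reflect anything tldextract itself does.

```python
import unicodedata
from urllib.parse import urlsplit

# The example URL from this issue. Python decodes the \uXXXX escapes,
# so U+3002, U+FF61 and U+002E end up between the octets.
url = "http://127\u30020\uff610\u002e1/foo/bar"

# urlsplit() keeps the non-ASCII dots in the host as they are.
host = urlsplit(url).hostname

# Collect the Unicode name of every non-digit character in the host.
separators = [unicodedata.name(ch) for ch in host if not ch.isdigit()]

for name in separators:
    print(name)
# IDEOGRAPHIC FULL STOP
# HALFWIDTH IDEOGRAPHIC FULL STOP
# FULL STOP
```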

john-kurkowski commented 1 year ago

Interesting question! It's never come up, so I tentatively lean toward not supporting it. Are you encountering an issue?

On the other hand, implementing it wouldn't be a huge burden. It would be symmetrical with other dot-splitting code in the project.
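One reading of "symmetrical with other dot-splitting code" is to normalize every Unicode full stop to an ASCII `.` before the IP check, just as for domain names. The following is a minimal Python sketch. It assumes the non-ASCII separator set listed for IDNA in RFC 3490, section 3.1 (U+3002, U+FF0E, U+FF61) and uses the standard `ipaddress` module for validation. `to_ascii_dots` and `looks_like_ipv4` are hypothetical names, not tldextract's API.

```python
import ipaddress

# Non-ASCII label separators listed in RFC 3490, section 3.1 (IDNA).
# ASSUMPTION: this is the set to normalize; tldextract's own list may differ.
NON_ASCII_DOTS = ("\u3002", "\uff0e", "\uff61")


def to_ascii_dots(host: str) -> str:
    """Hypothetical helper: replace each Unicode full stop with an ASCII '.'."""
    for dot in NON_ASCII_DOTS:
        host = host.replace(dot, ".")
    return host


def looks_like_ipv4(host: str) -> bool:
    """Hypothetical helper: True if host is a dotted-quad IPv4 address
    once its dots have been normalized to ASCII."""
    try:
        ipaddress.IPv4Address(to_ascii_dots(host))
    except ValueError:  # ipaddress.AddressValueError subclasses ValueError
        return False
    return True


print(looks_like_ipv4("127\u30020\uff610.1"))  # True
print(looks_like_ipv4("127.0.0.1"))            # True
print(looks_like_ipv4("example\u3002com"))     # False
```

Note that `ipaddress.IPv4Address` only accepts strict dotted-quad notation. It rejects shorthand such as `127.1`, which browser URL parsers accept, so a real fix would still need to decide which IPv4 forms to allow.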

elliotwutingfeng commented 1 year ago

The Chrome and Firefox address bars both automatically convert Unicode dots to ASCII dots in IPv4 addresses.

On a side note, for IPv6 addresses with a trailing embedded IPv4 address, such as [aBcD:ef01:2345:6789:aBcD:ef01:127.0。0。1], Chrome automatically converts the dots, but Firefox does not.
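The Chrome/Firefox difference amounts to whether the dots in the embedded IPv4 tail are normalized before the bracketed IPv6 literal is parsed. The following Python sketch uses the standard `ipaddress` module. `parse_bracketed_ipv6` and its `normalize_dots` flag are hypothetical, and the browser labels in the comments are only an analogy to the behavior described above, not how either browser is implemented.

```python
import ipaddress
from typing import Optional

# Non-ASCII full stops listed in RFC 3490, section 3.1.
NON_ASCII_DOTS = ("\u3002", "\uff0e", "\uff61")


def parse_bracketed_ipv6(
    host: str, normalize_dots: bool = True
) -> Optional[ipaddress.IPv6Address]:
    """Hypothetical helper: parse a '[...]' IPv6 literal.

    With normalize_dots=True, Unicode full stops in the embedded IPv4 tail
    become ASCII dots first (Chrome-like). With False they are left alone,
    so parsing fails (Firefox-like). Returns None for an invalid host.
    """
    if not (host.startswith("[") and host.endswith("]")):
        return None
    inner = host[1:-1]
    if normalize_dots:
        for dot in NON_ASCII_DOTS:
            inner = inner.replace(dot, ".")
    try:
        return ipaddress.IPv6Address(inner)
    except ValueError:
        return None


host = "[aBcD:ef01:2345:6789:aBcD:ef01:127.0\u30020\u30021]"
print(parse_bracketed_ipv6(host))                        # abcd:ef01:2345:6789:abcd:ef01:7f00:1
print(parse_bracketed_ipv6(host, normalize_dots=False))  # None
```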

john-kurkowski commented 1 year ago

OK, I see the usability benefit. I'm open to a fix for this!