john-kurkowski / tldextract

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
BSD 3-Clause "New" or "Revised" License
1.83k stars 210 forks

Add some missing reserved TLDs #330

Open hfz1337 opened 5 months ago

hfz1337 commented 5 months ago

It looks like the following TLDs, which are commonly used for hosts on local networks, are missing from the list of TLDs (.tld_set_snapshot):

.local
.localdomain
.domain
.lan
.home
.corp

Edit: I just noticed the extra_suffixes kwarg in the TLDExtract class, so this may not actually be needed. Feel free to close this issue if you think the above TLDs shouldn't be included by default.
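To illustrate what adding extra suffixes changes, here is a minimal, self-contained model of suffix matching. It is not tldextract's implementation: `split_host`, `Split`, and the tiny `PSL_SAMPLE` set are hypothetical stand-ins for the library and the Public Suffix List, and the sketch ignores the wildcard and exception rules the real list uses. With tldextract itself, the equivalent is passing the names via the extra_suffixes kwarg, e.g. `tldextract.TLDExtract(extra_suffixes=["lan"])`.

```python
from typing import Iterable, NamedTuple

# Hypothetical stand-in for a few Public Suffix List entries (not the real list).
PSL_SAMPLE = frozenset({"com", "org", "co.uk"})


class Split(NamedTuple):
    subdomain: str
    domain: str
    suffix: str


def split_host(host: str, extra_suffixes: Iterable[str] = ()) -> Split:
    """Split host on the longest known suffix (simplified: exact-match rules only)."""
    suffixes = PSL_SAMPLE | {s.lower().lstrip(".") for s in extra_suffixes}
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate first; starting at i = 1 always leaves a domain label.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return Split(".".join(labels[: i - 1]), labels[i - 1], candidate)
    # No known suffix: the last label becomes the domain and the suffix is empty.
    return Split(".".join(labels[:-1]), labels[-1], "")


if __name__ == "__main__":
    print(split_host("printer.lan"))
    # Split(subdomain='printer', domain='lan', suffix='')
    print(split_host("printer.lan", extra_suffixes=["lan"]))
    # Split(subdomain='', domain='printer', suffix='lan')
```

Without the extra suffix, .lan is treated as an unknown TLD, so the host name itself is misread as the registrable domain; declaring "lan" moves it into the suffix position.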

john-kurkowski commented 5 months ago

I'll think about this.

On one hand, this library doesn't define any suffixes; it leaves those to the Public Suffix List, with the emphasis on "public." See this FAQ entry. As you found, the FAQ also mentions your extra_suffixes workaround.

On the other hand, the Public Suffix List is tracking a similar issue in https://github.com/publicsuffix/list/issues/1681, and the upstream list may one day include the recommended private TLDs. In the context of this library, those recommended private TLDs do look separate and reserved, as if they could be treated differently, which would distinguish them from an arbitrary private domain such as www.localhost.asdfghjkl.
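One hypothetical way to "treat them differently" is to keep reserved private names in their own set and tag which list a match came from, instead of merging them into the public list. This is only a sketch under stated assumptions: `extract`, `Result`, the `kind` field, `PUBLIC_SAMPLE`, and `RESERVED_PRIVATE` (filled with the names from this issue, not from any official registry) are all illustrative, not part of tldextract's API.

```python
from typing import NamedTuple

# Hypothetical: a tiny sample of public suffixes, not the real Public Suffix List.
PUBLIC_SAMPLE = frozenset({"com", "org", "net"})
# Hypothetical: the reserved/private names listed in this issue, kept separate.
RESERVED_PRIVATE = frozenset({"local", "localdomain", "domain", "lan", "home", "corp"})


class Result(NamedTuple):
    subdomain: str
    domain: str
    suffix: str
    kind: str  # "public", "reserved", or "unknown"


def extract(host: str) -> Result:
    """Split on a single-label suffix and record which set (if any) matched it."""
    labels = host.lower().rstrip(".").split(".")
    last = labels[-1]
    kind = (
        "public" if last in PUBLIC_SAMPLE
        else "reserved" if last in RESERVED_PRIVATE
        else "unknown"
    )
    if kind == "unknown" or len(labels) < 2:
        # No usable suffix: the last label is the domain and the suffix is empty.
        return Result(".".join(labels[:-1]), last, "", "unknown")
    return Result(".".join(labels[:-2]), labels[-2], last, kind)


if __name__ == "__main__":
    print(extract("nas.home"))
    # Result(subdomain='', domain='nas', suffix='home', kind='reserved')
    print(extract("www.localhost.asdfghjkl"))
    # Result(subdomain='www.localhost', domain='asdfghjkl', suffix='', kind='unknown')
```

Keeping the two sets apart preserves the distinction drawn above: a caller can tell that nas.home matched a reserved private name, while www.localhost.asdfghjkl still falls through as an arbitrary unknown TLD.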