john-kurkowski / tldextract

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
BSD 3-Clause "New" or "Revised" License
1.81k stars, 211 forks

in-addr.arpa not extracting as expected #241

Closed: smellyspice closed this issue 2 years ago

smellyspice commented 2 years ago

I've seen this in two different modules now, and I'm not sure why:

Code:

import tldextract

print(tldextract.extract('subdomain.domain.com'))
print(tldextract.extract('8.8.8.8.in-addr.arpa'))

Result:

ExtractResult(subdomain='subdomain', domain='domain', suffix='com')
ExtractResult(subdomain='8.8.8', domain='8', suffix='in-addr.arpa')

I would have expected the in-addr.arpa name to be split into something like:

ExtractResult(subdomain='8.8.8.8', domain='in-addr', suffix='arpa')

What am I doing wrong? :)

john-kurkowski commented 2 years ago

in-addr.arpa is indeed a public suffix according to the list, so tldextract reports it as the suffix and treats the single label to its left (8) as the domain. See also this project's FAQ.
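To see why the split comes out this way, here is a minimal sketch of the longest-matching-suffix idea behind Public Suffix List lookups. It assumes a hypothetical three-entry suffix set (`com`, `arpa`, `in-addr.arpa`) in place of the real list and ignores the PSL's wildcard (`*`) and exception (`!`) rules; `split_host` is a hypothetical helper, not tldextract's API or its actual implementation.

```python
# Hypothetical miniature suffix set: a tiny subset of the real PSL, for illustration only.
SUFFIXES = {"com", "arpa", "in-addr.arpa"}


def split_host(host, suffixes):
    """Split host into (subdomain, domain, suffix) using the longest matching suffix.

    Simplified sketch: no PSL wildcard or exception rules are handled.
    """
    labels = host.lower().split(".")
    # i is the index of the first label of the candidate suffix; starting at 0
    # means the longest candidate is tried first, so the longest match wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                # The whole host is itself a suffix, so there is no domain label.
                return "", "", candidate
            # The label just left of the suffix is the domain; the rest is subdomain.
            return ".".join(labels[:i - 1]), labels[i - 1], candidate
    # No suffix matched (an assumption of this sketch): treat the last label as the domain.
    return ".".join(labels[:-1]), labels[-1], ""


print(split_host("subdomain.domain.com", SUFFIXES))
# ('subdomain', 'domain', 'com')
print(split_host("8.8.8.8.in-addr.arpa", SUFFIXES))
# ('8.8.8', '8', 'in-addr.arpa')
print(split_host("8.8.8.8.in-addr.arpa", SUFFIXES - {"in-addr.arpa"}))
# ('8.8.8.8', 'in-addr', 'arpa')
```

Because `in-addr.arpa` is longer than `arpa`, it wins the match, which reproduces the result reported above; removing it from the set is what would produce the split the reporter expected.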

(Interesting suffix. It's near the top of the list, and according to the file's git blame it has been there for at least 15 years. An oldie!)