john-kurkowski / tldextract

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
BSD 3-Clause "New" or "Revised" License

Incorrect Extraction for .it.com Domains #328

Open adam-sav opened 7 months ago

adam-sav commented 7 months ago

I've noticed an issue when calling tldextract.extract on domains like test.it.com and test.ru.com: it treats the actual SLD, "test", as a subdomain, and returns "it" or "ru" as the SLD with a TLD of "com". Looking at the list tldextract draws from, however, I see that both TLDs (it.com and ru.com) are there. It would be helpful if a change were made so that TLDs like it.com, ru.com, sa.com, and za.com are interpreted correctly. Let me know if you have any questions about this issue. Thank you.

john-kurkowski commented 7 months ago

See this FAQ entry.

elliotwutingfeng commented 7 months ago

@john-kurkowski perhaps we should pin an issue linking to the "Public vs. private domains" section?

john-kurkowski commented 7 months ago

I like that idea! I think I've seen other projects do that, but never tried it myself.
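
For readers landing on this thread, here is a minimal sketch of why the reported result can occur. It assumes that it.com, ru.com, sa.com, and za.com sit in the PSL's private section rather than its ICANN section, and that private entries are ignored unless requested. The rule sets and the `split_host` helper below are hypothetical, hand-written for illustration; they are not tldextract's implementation, and they skip the PSL's wildcard and exception rules.

```python
# Illustrative subset of suffix rules only; the real PSL is far larger.
PUBLIC_SUFFIXES = {"com", "it", "ru"}                        # assumed ICANN section
PRIVATE_SUFFIXES = {"it.com", "ru.com", "sa.com", "za.com"}  # assumed private section


def split_host(host, include_private=False):
    """Split a hostname into (subdomain, domain, suffix).

    Uses the longest matching suffix from the active rule set.
    """
    rules = (PUBLIC_SUFFIXES | PRIVATE_SUFFIXES) if include_private else PUBLIC_SUFFIXES
    labels = host.lower().split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: i - 1]) if i > 1 else ""
            return subdomain, domain, candidate
    # No rule matched: treat the last label as the domain, with no suffix.
    return ".".join(labels[:-1]), labels[-1], ""


# Private rules ignored: "it" is taken as the domain, as reported above.
print(split_host("test.it.com"))
# Private rules honored: "it.com" is taken as the suffix.
print(split_host("test.it.com", include_private=True))
```

In tldextract itself, the corresponding switch is the `include_psl_private_domains` option, for example `tldextract.TLDExtract(include_psl_private_domains=True)`. Check the README for the behavior in your installed version.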