john-kurkowski / tldextract

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
BSD 3-Clause "New" or "Revised" License

Extract assigns wrong top level domain (private suffix value) to suffix parameter #319

Closed: nerses0 closed this issue 4 months ago

nerses0 commented 4 months ago

Hi, it seems that the extract method fails to assign the proper public suffix value for URLs/domains that use a public suffix from the list below:

// CentralNic : http://www.centralnic.com/names/domains
// Submitted by registry gavin.brown@centralnic.com
ae.org
br.com
cn.com
com.de
com.se
de.com
eu.com
gb.net
hu.net
jp.net
jpn.com
mex.com
ru.com
sa.com
se.net
uk.com
uk.net
us.com
za.bz
za.com

Here are a few examples:

extract('http://anysubdomain.anydomain.za.com')
ExtractResult(subdomain='anysubdomain.anydomain', domain='za', suffix='com', is_private=False)

extract('http://anysubdomain.anydomain.gb.net')
ExtractResult(subdomain='anysubdomain.anydomain', domain='gb', suffix='net', is_private=False)

extract('http://anysubdomain.anydomain.us.com')
ExtractResult(subdomain='anysubdomain.anydomain', domain='us', suffix='com', is_private=False)

extract('http://anysubdomain.anydomain.ru.com')
ExtractResult(subdomain='anysubdomain.anydomain', domain='ru', suffix='com', is_private=False)

The expected suffix for the last one, for example, is ru.com.

elliotwutingfeng commented 4 months ago

You need to set include_psl_private_domains=True.

extract('http://anysubdomain.anydomain.za.com', include_psl_private_domains=True)
ExtractResult(subdomain='anysubdomain', domain='anydomain', suffix='za.com', is_private=True)

See https://github.com/john-kurkowski/tldextract/blob/master/README.md#public-vs-private-domains
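For anyone wondering why the flag changes the split: the PSL has an ICANN section and a private section, and the CentralNic entries quoted above are in the private one. Here is a toy sketch of that idea using only the standard library. It is not tldextract's actual implementation; `split_host`, `Parts`, and the two tiny suffix sets are made up for illustration, and it takes a bare hostname rather than a full URL.

```python
from typing import NamedTuple

# Hand-picked stand-ins for the two PSL sections (assumption: the real list
# is far larger and also contains wildcard and exception rules).
ICANN_SUFFIXES = {"com", "net", "org"}
PRIVATE_SUFFIXES = {"za.com", "gb.net", "us.com", "ru.com"}


class Parts(NamedTuple):
    subdomain: str
    domain: str
    suffix: str
    is_private: bool


def split_host(host: str, include_private: bool = False) -> Parts:
    """Split a hostname by longest-suffix match against the toy lists."""
    labels = host.lower().strip(".").split(".")
    # Try the longest candidate first, so "za.com" can win over "com".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        # Private entries only count when the caller opts in.
        private = include_private and candidate in PRIVATE_SUFFIXES
        if private or candidate in ICANN_SUFFIXES:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: max(i - 1, 0)])
            return Parts(subdomain, domain, candidate, private)
    # No known suffix: treat the last label as the domain.
    return Parts(".".join(labels[:-1]), labels[-1], "", False)


if __name__ == "__main__":
    host = "anysubdomain.anydomain.za.com"
    print(split_host(host))
    print(split_host(host, include_private=True))
```

With the private set ignored, only `com` matches, so `za` ends up as the domain, which is the behavior reported in this issue. With it included, `za.com` matches first and `anydomain` becomes the domain. In tldextract itself, the README linked above also describes fixing this choice once by constructing `tldextract.TLDExtract(include_psl_private_domains=True)` and reusing that object instead of passing the flag on every call.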

nerses0 commented 4 months ago

Thanks! It works!