john-kurkowski / tldextract

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
BSD 3-Clause "New" or "Revised" License

FQDN extraction error in some domains #281

Closed: danruggi closed 1 year ago

danruggi commented 1 year ago

Extraction fails for com.de and com.se. Both are valid suffixes in the Public Suffix List, but the extractor does not recognize them.

Example:

>>> import tldextract
>>> tldextract.extract('http://forums.test123.com.de/')
ExtractResult(subdomain='forums.test123', domain='com', suffix='de')

A website that uses this suffix, e.g. https://herbalife(dot)com(dot)se/ (obfuscated to avoid backlinks)

john-kurkowski commented 1 year ago

See this FAQ entry and this explanation of public vs. private domains.
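
To illustrate the public vs. private distinction, here is a toy sketch of longest-suffix matching. It is not tldextract's implementation: the two rule sets are tiny hand-written stand-ins for the ICANN and private sections of the PSL, and `split_host` is a hypothetical helper that ignores wildcard and exception rules.

```python
# Toy rule sets (assumptions for illustration). The real PSL has thousands of
# entries plus wildcard and exception rules, which this sketch ignores.
ICANN_SUFFIXES = {"de", "se", "com"}
PRIVATE_SUFFIXES = {"com.de", "com.se"}


def split_host(host, include_private=False):
    """Return (subdomain, domain, suffix) using the longest matching suffix."""
    rules = (ICANN_SUFFIXES | PRIVATE_SUFFIXES) if include_private else ICANN_SUFFIXES
    labels = host.lower().rstrip(".").split(".")
    # Try candidate suffixes from longest to shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: i - 1]) if i > 1 else ""
            return subdomain, domain, candidate
    # No rule matched: this toy simply treats the whole host as the domain.
    return "", host, ""


print(split_host("forums.test123.com.de"))
print(split_host("forums.test123.com.de", include_private=True))
```

With only the ICANN-style rules, the longest match is "de", so "com" ends up as the domain, which is the result reported above. Once the private rules are included, "com.de" becomes the longest match and "test123" is the domain.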

danruggi commented 1 year ago

No issue, this is intended behaviour. com.de and com.se are private suffixes in the PSL, so they have to be enabled explicitly:

extractor = tldextract.TLDExtract(include_psl_private_domains=True)