Closed kevinmarsh closed 1 year ago
I think the trie suffix detection from #285, shipped in version 3.4.3, might have broken lookups of the uk.com private suffix (which is included in the bundled snapshot): https://github.com/john-kurkowski/tldextract/blob/6f45fed6c56f377e8a9a77ce43c50712281940d8/tldextract/.tld_set_snapshot#L10570
Comparing 3.4.2:
>>> import tldextract
>>> tldextract.__version__
'3.4.2'
>>> extractor = tldextract.TLDExtract(include_psl_private_domains=True)
>>> extractor("foo.uk.com")
ExtractResult(subdomain='', domain='foo', suffix='uk.com')
to 3.4.3:
>>> import tldextract
>>> tldextract.__version__
'3.4.3'
>>> extractor = tldextract.TLDExtract(include_psl_private_domains=True)
>>> extractor("foo.uk.com")
ExtractResult(subdomain='foo', domain='uk', suffix='com')
You can see that the uk.com suffix is no longer recognized; instead, the extractor thinks uk is the domain.
Weirdly, though, just using the tldextract.extract wrapper function gives the exact same (correct) result in both versions:
>>> import tldextract
>>> tldextract.extract("foo.uk.com", include_psl_private_domains=True)
ExtractResult(subdomain='', domain='foo', suffix='uk.com')
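For anyone following along, here is a rough sketch of how a trie-based longest-suffix match can produce either result depending on whether private suffixes take part in the lookup. This is a toy model, not tldextract's actual implementation: the SuffixTrie class, the split_host function, and the two-entry suffix list are all hypothetical names made up for illustration, and it is not meant as a diagnosis of the 3.4.3 bug.

```python
from typing import Dict, Tuple


class SuffixTrie:
    """Toy trie keyed by domain labels in reverse order (hypothetical, not tldextract's class)."""

    def __init__(self) -> None:
        self.children: Dict[str, "SuffixTrie"] = {}
        self.is_suffix = False   # a known suffix ends at this node
        self.is_private = False  # that suffix is from the PSL "private" section

    def add(self, suffix: str, private: bool = False) -> None:
        node = self
        # "uk.com" is stored along the path com -> uk
        for label in reversed(suffix.split(".")):
            node = node.children.setdefault(label, SuffixTrie())
        node.is_suffix = True
        node.is_private = private


def split_host(host: str, trie: SuffixTrie, include_private: bool) -> Tuple[str, str, str]:
    """Return (subdomain, domain, suffix) using the longest suffix accepted by the trie."""
    labels = host.lower().split(".")
    node = trie
    suffix_len = 0  # number of trailing labels in the longest accepted suffix
    for depth, label in enumerate(reversed(labels), start=1):
        node = node.children.get(label)
        if node is None:
            break
        # Private suffixes only count when the caller opted in.
        if node.is_suffix and (include_private or not node.is_private):
            suffix_len = depth
    cut = len(labels) - suffix_len
    suffix = ".".join(labels[cut:])
    domain = labels[cut - 1] if cut > 0 else ""
    subdomain = ".".join(labels[: cut - 1]) if cut > 1 else ""
    return subdomain, domain, suffix


# Tiny stand-in for the suffix list: one public suffix, one private suffix.
trie = SuffixTrie()
trie.add("com")
trie.add("uk.com", private=True)

with_private = split_host("foo.uk.com", trie, include_private=True)
without_private = split_host("foo.uk.com", trie, include_private=False)
print(with_private)
print(without_private)
```

In this sketch, the first call splits the host the way 3.4.2 does above, and the second splits it the way 3.4.3 does, which is the shape of result one would expect whenever private suffixes are left out of the match.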
I'm looking into this. /cc @elliotwutingfeng
Fixed in 3.4.4. Thanks for the detailed report! That really eased tracking down the bug.