john-kurkowski / tldextract

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
BSD 3-Clause "New" or "Revised" License

Incorrectly parsing this exact domain: veterinaire.fr #261

Closed by JonathanAquino-NextRoll 2 years ago

JonathanAquino-NextRoll commented 2 years ago

When I try to apply tldextract to veterinaire.fr, it says the domain is '' and the suffix is 'veterinaire.fr':

>>> import tldextract
>>> print(tldextract.extract('veterinaire.fr'))
ExtractResult(subdomain='', domain='', suffix='veterinaire.fr')

Oddly, if I remove the last letter from veterinaire, it works:

>>> print(tldextract.extract('veterinair.fr'))
ExtractResult(subdomain='', domain='veterinair', suffix='fr')

And if I append a 2 to veterinaire, it also works:

>>> print(tldextract.extract('veterinaire2.fr'))
ExtractResult(subdomain='', domain='veterinaire2', suffix='fr')

brycedrennan commented 2 years ago

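Yes, veterinaire.fr is itself a public suffix (not just fr) according to the Public Suffix List. Since the whole input matches a suffix rule, nothing is left over for the domain, so tldextract returns an empty domain. A name registered under it, such as example.veterinaire.fr, would get domain='example' and suffix='veterinaire.fr'.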
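
To illustrate why the three inputs above behave differently, here is a minimal sketch of longest-suffix matching against a tiny rule set. It is not tldextract's actual implementation: `RULES` and `split_host` are hypothetical names, the rule set is a hand-picked two-entry subset of the real list, and wildcard and exception rules are ignored.

```python
# Hypothetical, hand-picked subset of Public Suffix List rules.
# The real list is far larger and also contains wildcard ("*.") and
# exception ("!") rules, which this sketch does not handle.
RULES = {"fr", "veterinaire.fr"}


def split_host(host, rules=RULES):
    """Return (subdomain, domain, suffix) using the longest matching rule."""
    labels = host.lower().split(".")
    last = len(labels) - 1
    # Try the longest candidate suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        # If nothing matches, fall back to treating the last label as the
        # suffix (the PSL's implicit "*" default rule).
        if candidate in rules or i == last:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: i - 1]) if i > 1 else ""
            return subdomain, domain, candidate


print(split_host("veterinaire.fr"))         # ('', '', 'veterinaire.fr')
print(split_host("veterinair.fr"))          # ('', 'veterinair', 'fr')
print(split_host("veterinaire2.fr"))        # ('', 'veterinaire2', 'fr')
print(split_host("clinic.veterinaire.fr"))  # ('', 'clinic', 'veterinaire.fr')
```

Because the longest matching rule wins, the bare name veterinaire.fr is consumed entirely by the suffix, while veterinair.fr and veterinaire2.fr only match the shorter fr rule and keep a domain.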