Closed: rrr2rrr closed this 7 months ago
The *.amazonaws.com.cn entries are all in the private section of the PSL. See this FAQ entry. When I treat public and private suffixes the same, I get the result you want.
>>> import tldextract
>>> tldextract.extract("a.b.c.d.cn-north-1.airflow.amazonaws.com.cn", include_psl_private_domains=True)
ExtractResult(subdomain='a.b', domain='c', suffix='d.cn-north-1.airflow.amazonaws.com.cn', is_private=True)
Thanks!
According to https://publicsuffix.org/list/public_suffix_list.dat, the list contains wildcard entries.
Extraction works correctly for *.pg, but it doesn't take collisions into account:
Wildcard: *.cn-north-1.airflow.amazonaws.com.cn
Domain: a.b.c.d.cn-north-1.airflow.amazonaws.com.cn
Expected: ExtractResult(subdomain='a.b', domain='c', suffix='d.cn-north-1.airflow.amazonaws.com.cn')
Result: ExtractResult(subdomain='a.b.c.d.cn-north-1.airflow', domain='amazonaws', suffix='com.cn')
because com.cn is also present as a suffix. You should first check wildcards, then sort suffixes by length.
Here is my PostgreSQL implementation: https://stackoverflow.com/a/77544774/21920723
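For illustration, here is a minimal Python sketch of the matching order discussed in this thread: wildcard and exact rules are both checked, and the longest match wins. It is not tldextract's implementation. The function name split_host, the two tiny rule sets, and the include_private flag are hypothetical stand-ins (the plain cn rule is an assumption added for completeness), and exception rules (those starting with "!") are ignored.

```python
# Tiny hypothetical subsets of the Public Suffix List, for illustration only.
PUBLIC_RULES = {"cn", "com.cn", "*.pg"}
PRIVATE_RULES = {"*.cn-north-1.airflow.amazonaws.com.cn"}


def split_host(host, include_private=False):
    """Return (subdomain, domain, suffix) using longest-match suffix rules."""
    rules = (PUBLIC_RULES | PRIVATE_RULES) if include_private else PUBLIC_RULES
    labels = host.lower().rstrip(".").split(".")

    # Walk candidates from longest to shortest, so the first hit is the
    # longest matching rule. A wildcard rule matches any leftmost label.
    for start in range(len(labels)):
        candidate = labels[start:]
        exact = ".".join(candidate)
        wildcard = ".".join(["*"] + candidate[1:])
        if exact in rules or wildcard in rules:
            break
    else:
        # No rule matched: fall back to treating the last label as the suffix.
        start = len(labels) - 1

    suffix = ".".join(labels[start:])
    domain = labels[start - 1] if start > 0 else ""
    subdomain = ".".join(labels[: max(start - 1, 0)])
    return subdomain, domain, suffix


host = "a.b.c.d.cn-north-1.airflow.amazonaws.com.cn"
print(split_host(host))                        # public rules only
print(split_host(host, include_private=True))  # public and private rules
```

With only the public rules, com.cn is the longest match, which reproduces the reported result. Once the private wildcard rule is included, it is longer than com.cn and wins, which gives the expected result.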